Count is wrong if the data set is huge. #60
Comments
I wanted to write a fairly detailed explanation in case people don't understand what is happening.

You can get an estimated count of items in the table by using describe. This could be implemented in this library as a separate interface.

The only way to know the exact count of items in the DynamoDB table is to scan the table. This is a limitation of DynamoDB, not of this library.

From the Scan API reference (https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html):
"If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation."

"One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second."

See also:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.RetryAndBackoff
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DescribeTable.html
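To illustrate the describe-based estimate mentioned above, here is a minimal sketch using the DescribeTable call directly through a boto3-style client rather than through this library. The function name `estimated_item_count` is made up for this example; `describe_table` and the `ItemCount` field come from the DescribeTable API.

```python
from typing import Any


def estimated_item_count(client: Any, table_name: str) -> int:
    """Return DynamoDB's approximate item count for a table.

    `client` is assumed to behave like boto3.client("dynamodb").
    DynamoDB refreshes ItemCount roughly every six hours, so recent
    writes may not be reflected in the returned value.
    """
    response = client.describe_table(TableName=table_name)
    return response["Table"]["ItemCount"]


# Hypothetical usage (requires AWS credentials and an existing table):
#   import boto3
#   print(estimated_item_count(boto3.client("dynamodb"), "my-table"))
```

This costs no read capacity, but the value is only an estimate, so it suits dashboards better than anything that needs an exact number.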
I agree with @Gerst20051, this doesn't seem like something we can fix easily. The number you get is the number of items scanned in that query, not the number of items in the entire table. We could implement something like an estimated count.

If you really need the current value in real time, you could attach a stream to the table and keep a counter somewhere in a separate table. It's a little bit more work to set up, but if you use the table name as partition key, you can get the number of records in milliseconds, even for tables with millions of records.
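The stream-based counter described above could look roughly like the following sketch of a stream consumer (for example an AWS Lambda handler body). The counter table name and the attribute names `tableName` and `itemCount` are assumptions made for this example, not something this library provides. The `eventName` values and the `update_item` parameters are those of DynamoDB Streams and the boto3 low-level client.

```python
from typing import Any


def count_delta(event: dict) -> int:
    """Net change in item count described by one DynamoDB Streams batch."""
    delta = 0
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name == "INSERT":
            delta += 1
        elif name == "REMOVE":
            delta -= 1
        # MODIFY events do not change the number of items.
    return delta


def apply_delta(client: Any, counter_table: str, counted_table: str, delta: int) -> None:
    """Atomically add `delta` to the counter row keyed by the table name.

    `client` is assumed to behave like boto3.client("dynamodb").
    "tableName" and "itemCount" are hypothetical attribute names.
    """
    if delta == 0:
        return
    client.update_item(
        TableName=counter_table,
        Key={"tableName": {"S": counted_table}},
        UpdateExpression="ADD itemCount :d",
        ExpressionAttributeValues={":d": {"N": str(delta)}},
    )
```

Reading the count is then a single key lookup on the counter table. One caveat: if a batch is retried after a partial failure, the counter can drift, so an occasional reconciliation against a full scan may be worthwhile.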
I have around 300,000 records in my table.
I want to get the count of records in my table. Let's stick to no filters for now.
I tried three methods:
Method 1:
The result of this query is just a number.
Expectation: I expect to get the correct count, which is 300000.
Actual: I get a partial count as an integer value: 2683
Method 2:
Expectation: I would expect this to return the raw object along with the ScannedCount, Count, LastEvaluatedKey, ...
Actual: I get an integer value: 2683
Method 3:
As far as I can see, the only way of doing this is to fetch the raw result set as shown in the Pagination example, iterate through the whole data set, and sum up the Count attribute, like so:
Repeating this until there is no LastEvaluatedKey and summing up the counts will give me an accurate count. BUT this is not optimal, as it fetches the entirety of the data set when I only need a count.
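For reference, the summing loop can avoid transferring the items themselves by asking Scan for counts only. The sketch below uses a boto3-style client directly instead of this library's API, and the function name `exact_item_count` is made up for the example. `Select="COUNT"`, `ExclusiveStartKey`, `Count`, and `LastEvaluatedKey` are part of the Scan API.

```python
from typing import Any


def exact_item_count(client: Any, table_name: str) -> int:
    """Count all items by paging through a COUNT-only scan.

    `client` is assumed to behave like boto3.client("dynamodb").
    Each page covers at most 1 MB of scanned data, so the loop follows
    LastEvaluatedKey until DynamoDB stops returning one.
    """
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

Note that this still reads every item on the DynamoDB side and consumes read capacity accordingly. Only the response payload gets smaller, so for a large table it remains slow and can be throttled.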
Maybe fix either Method 1 or Method 2?