Smart Iteration Algorithms #17
Hi @adamchainz, I know it's a bit of an old ticket, but I'd be interested in working on this. Many thanks for your great work!

I think you really need a big table with realistic fragmentation to investigate this properly. Also look in the source code for

Are you having any problems with the current algorithm?

We ran into some inefficiency when dealing with queries that have sparse results over big tables. This was due to the query conditions, but also to removed rows (with ids earlier in the table being more sparse, for example).

Yeah, sparse results are difficult because there's no predicting when the density of results changes. What the current algorithm does is just react to this by decreasing the chunk size rapidly.
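To make the "react by changing the chunk size" idea concrete, here is a minimal sketch of rate-feedback chunk sizing. All names (`adjust_chunk_size`, `target_rows`) are hypothetical; this is not django-mysql's actual implementation, just an illustration of the feedback loop described above.

```python
def adjust_chunk_size(chunk_size, rows_found, target_rows,
                      min_size=1, max_size=100_000):
    """Resize the next primary-key-range chunk based on how many rows the
    last chunk actually matched, aiming for roughly ``target_rows`` matches
    per chunk.

    Hypothetical sketch, not django-mysql's real algorithm.
    """
    if rows_found == 0:
        # Nothing matched: grow aggressively to skip over sparse regions.
        new_size = chunk_size * 2
    else:
        # Scale proportionally toward the target density: a dense chunk
        # (many matches) shrinks the next chunk rapidly, a sparse one
        # grows it.
        new_size = int(chunk_size * target_rows / rows_found)
    return max(min_size, min(max_size, new_size))
```

The weakness the thread describes follows directly: the feedback only reacts *after* a chunk has been scanned, so a long sparse region still costs many near-empty queries before the chunk size catches up.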
The current algorithm used by `SmartChunkedIterator` is restricted to the one from `pt-online-schema-change`, with filtering added. This works badly for really sparse distributions of results.

There are other algorithms we can use, in fact listed in the `pt-table-sync` documentation. For example, the current algorithm is (approximately) there as 'chunk'. The 'nibble' strategy would work quite well for sparse queries - it's the same as `pt-archiver`, using `LIMIT` to determine how many rows get fetched.

`algorithm` could be another argument to `SmartChunkedIterator` - maybe even defaulting to 'auto' with some heuristic to pick between the algorithms.
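The 'nibble' strategy mentioned above can be sketched as keyset pagination: each query asks for the next `LIMIT` *matching* rows after the last seen primary key, so a sparse region costs one query rather than many empty chunks. In this sketch, `fetch_page` is an assumed callable (e.g. backed by `SELECT ... WHERE pk > %s AND <filter> ORDER BY pk LIMIT %s`); the names are hypothetical and this is not the `pt-archiver` or django-mysql implementation.

```python
def nibble(fetch_page, start_pk=0, limit=1000):
    """Iterate matching rows in fixed-size 'nibbles', pt-archiver style.

    ``fetch_page(after_pk, limit)`` is an assumed callable returning up to
    ``limit`` ``(pk, row)`` pairs with ``pk > after_pk``, ordered by pk.
    """
    last_pk = start_pk
    while True:
        page = fetch_page(last_pk, limit)
        if not page:
            return  # No more matching rows anywhere in the table.
        for pk, row in page:
            yield row
        # Resume strictly after the last primary key we saw, however
        # sparse the matches were in between.
        last_pk = page[-1][0]
```

Unlike the chunk algorithm, each iteration here returns a full batch of matches regardless of how the ids are distributed, which is why it suits the sparse-result case this issue describes.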