
Smart Iteration Algorithms #17

Open
adamchainz opened this issue Mar 21, 2015 · 4 comments

@adamchainz
Owner

The algorithm used by SmartChunkedIterator is currently restricted to the one from pt-online-schema-change, with filtering added. This performs badly when matching rows are sparsely distributed.

There are other algorithms we could use; several are listed in the pt-table-sync documentation. The current algorithm appears there (approximately) as 'chunk'. The 'nibble' strategy would work well for sparse queries - it's the same one pt-archiver uses, relying on LIMIT to determine how many rows get fetched.

The algorithm could be another argument to SmartChunkedIterator - maybe even defaulting to 'auto', with a heuristic to pick between the algorithms.
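For illustration, here's a minimal sketch of the two strategies over a Django queryset with an integer primary key. The function names and structure are hypothetical, not django-mysql's actual API:

```python
# Hypothetical sketch - not django-mysql's API. Both functions take a
# Django queryset whose model has an integer pk.

def iter_chunk_style(queryset, chunk_size):
    """'chunk' strategy (like pt-online-schema-change): walk fixed-width
    pk ranges. A mostly-empty range still costs a query that matches
    almost nothing."""
    max_pk = queryset.order_by("-pk").values_list("pk", flat=True).first()
    if max_pk is None:
        return
    low = 0
    while low <= max_pk:
        yield queryset.filter(pk__gte=low, pk__lt=low + chunk_size)
        low += chunk_size


def iter_nibble_style(queryset, chunk_size):
    """'nibble' strategy (like pt-archiver): use LIMIT to grab the next
    chunk_size matching rows, so sparse regions are crossed in one query."""
    last_pk = None
    while True:
        window = queryset.order_by("pk")
        if last_pk is not None:
            window = window.filter(pk__gt=last_pk)
        pks = list(window.values_list("pk", flat=True)[:chunk_size])
        if not pks:
            return
        yield queryset.filter(pk__in=pks)
        last_pk = pks[-1]
```

The trade-off: the chunk style issues one query per pk range regardless of how many rows match, while the nibble style's cost tracks the number of matching rows, at the price of an extra LIMIT query per chunk.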

@adamchainz adamchainz changed the title pt-archiver style iteration Smart Iteration Algorithms Dec 8, 2015
@Geekfish

Hi @adamchainz

I know it's a bit of an old ticket, but I'd be interested in working on this.
Would you have any suggestions for a setup to test the efficiency of different solutions outside of unit tests?

Many thanks for your great work!

@adamchainz
Owner Author

I think you really need a big table with realistic fragmentation to investigate this properly. Also look in the source code for pt-table-sync.

Are you having any problems with the current algorithm?

@Geekfish

We ran into some inefficiency with queries that return sparse results over big tables. This was partly due to the query conditions, but also due to deleted rows (for example, ids earlier in the table being more sparse).
We have found workarounds for our cases (e.g. not using chunks at all for queries with smaller counts, as sketched below), and the iterator still works nicely most of the time, but I was personally interested in a better solution for sparse distributions.
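For reference, a rough sketch of that workaround, assuming django-mysql's iter_smart_chunks() queryset method is available on the model; the threshold is illustrative:

```python
# Rough sketch of the workaround: skip chunking when the result set is
# small. The threshold is illustrative.
SMALL_RESULT_THRESHOLD = 10_000

def iter_maybe_chunked(queryset):
    if queryset.count() <= SMALL_RESULT_THRESHOLD:
        # Small result set: one unchunked pass is cheaper than probing
        # chunk boundaries across a mostly-empty pk range.
        yield queryset
    else:
        yield from queryset.iter_smart_chunks()
```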

@adamchainz
Owner Author

Yeah, sparse results are difficult because there's no predicting when the density of results will change. All the current algorithm does is react to this, by decreasing the chunk size rapidly.
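A simplified sketch of that kind of reaction, using matched rows per chunk as the feedback signal; the signal and constants here are illustrative, not django-mysql's actual tuning:

```python
# Simplified sketch of reactive chunk sizing: aim for a target number of
# matched rows per chunk. Shrinks sharply when a chunk comes back dense,
# grows when it comes back sparse. Illustrative only.
def next_chunk_size(current_size, rows_matched,
                    target_rows=1000, min_size=1, max_size=100_000):
    if rows_matched == 0:
        # Completely sparse region: widen the window to skip it faster.
        return min(current_size * 2, max_size)
    # Rescale toward target_rows matched rows per chunk; a dense chunk
    # (rows_matched >> target_rows) cuts the size sharply.
    scaled = int(current_size * target_rows / rows_matched)
    return max(min_size, min(scaled, max_size))
```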
