Restore improvement: batch #4055

Merged
merged 9 commits into from
Oct 7, 2024

Michal-Leszczynski
Collaborator

@Michal-Leszczynski Michal-Leszczynski commented Oct 2, 2024

This PR introduces the following changes to the batching mechanism:

Fixes #3979
Fixes #4059

This was the initial design, which was changed by mistake during implementation.
We should always send shard_cnt * --batch-size sstables in a single restore job,
even when the user wants to restore into a running cluster. In that case, setting
--parallel=1 and --batch-size=1 should be enough to get a slow-running restore.
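
The sizing rule in this commit message can be sketched in Go as below. This is only an illustration of the arithmetic: the helper name `sstablesPerJob` and its parameters are hypothetical and are not taken from `pkg/service/restore/batch.go`.

```go
package main

import "fmt"

// sstablesPerJob returns how many sstables are sent in a single restore job:
// the node's shard count multiplied by the --batch-size value.
// Hypothetical helper; the name and signature are assumptions for illustration.
func sstablesPerJob(shardCnt, batchSize int) int {
	return shardCnt * batchSize
}

func main() {
	// A node with 8 shards restored with --batch-size=2.
	fmt.Println(sstablesPerJob(8, 2))
	// Slow restore into a running cluster: --batch-size=1 (with --parallel=1).
	fmt.Println(sstablesPerJob(8, 1))
}
```

With `--batch-size=1` every job still carries one sstable per shard, so all shards of the node stay busy even in the slow setup.
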
@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/restore-batch branch 2 times, most recently from 0447b3d to 03856a9 Compare October 3, 2024 11:29
This commit allows setting --batch-size=0.
When this happens, batches are created so that each contains
about 5% of the expected node workload during restore.
This allows for creating big, yet evenly distributed batches
without the need to tune the --batch-size flag.
It should also work better when the backed-up cluster
had a different number of nodes than the restore destination
cluster.

Fixes #4059
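
A rough Go sketch of the `--batch-size=0` behavior described above. It assumes that the expected node workload is the total amount of data to restore divided by the number of destination nodes; that definition, as well as the names `targetBatchBytes` and `takeBatch`, is an assumption made for illustration and may differ from the merged code.

```go
package main

import "fmt"

// batchPercent is the share of the expected node workload that one batch
// should cover when --batch-size=0 (about 5%, per the commit message).
const batchPercent = 5

// targetBatchBytes estimates the byte size of one batch.
// Assumption: expected node workload = total bytes to restore / node count.
func targetBatchBytes(totalBytes int64, nodeCnt int) int64 {
	if nodeCnt <= 0 {
		return 0
	}
	expectedNodeWorkload := totalBytes / int64(nodeCnt)
	return expectedNodeWorkload * batchPercent / 100
}

// takeBatch takes sstable sizes from the front of the slice until their sum
// reaches target, and returns the batch together with the remaining sizes.
// It always takes at least one sstable when any are left.
func takeBatch(sizes []int64, target int64) (batch, rest []int64) {
	var sum int64
	i := 0
	for i < len(sizes) && (i == 0 || sum < target) {
		sum += sizes[i]
		i++
	}
	return sizes[:i], sizes[i:]
}

func main() {
	target := targetBatchBytes(6000, 3)
	batch, rest := takeBatch([]int64{40, 40, 40, 40}, target)
	fmt.Println(target, batch, rest)
}
```

Because the target is derived from the destination node count, the batch size adapts on its own when the backed-up cluster and the destination cluster differ in size.
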
This shouldn't be possible, but we can still validate that.
This results in creating batches of sstables of
more similar size.

Fixes #3979
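
One way to end up with batches of similar-sized sstables is to sort the sstables by size before cutting them into batches. The commit message above does not spell out the mechanism, so the Go sketch below is an assumption, and the `sstable` type and `batchBySimilarSize` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sstable is a hypothetical, minimal description of an sstable.
type sstable struct {
	ID   string
	Size int64
}

// batchBySimilarSize sorts sstables by size (largest first) and cuts the
// sorted slice into consecutive batches of at most n sstables, so that each
// batch holds sstables of comparable size.
func batchBySimilarSize(ssts []sstable, n int) [][]sstable {
	if n <= 0 {
		return nil
	}
	// Work on a copy so the caller's slice is not reordered.
	sorted := append([]sstable(nil), ssts...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Size > sorted[j].Size
	})
	var out [][]sstable
	for len(sorted) > 0 {
		k := n
		if k > len(sorted) {
			k = len(sorted)
		}
		out = append(out, sorted[:k])
		sorted = sorted[k:]
	}
	return out
}

func main() {
	ssts := []sstable{{"a", 10}, {"b", 1000}, {"c", 20}, {"d", 900}}
	for _, b := range batchBySimilarSize(ssts, 2) {
		fmt.Println(b)
	}
}
```

Sorting first keeps large and small sstables out of the same batch, which is what makes the per-shard work within a batch more even.
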
@Michal-Leszczynski Michal-Leszczynski marked this pull request as ready for review October 3, 2024 11:58
@Michal-Leszczynski
Collaborator Author

@karol-kokoszka This PR is ready for review!

pkg/service/restore/batch.go (review thread, resolved)
pkg/service/restore/batch.go (review thread, resolved)
This way we don't have to deal with small leftover batches
that are badly distributed over shards.
@Michal-Leszczynski
Collaborator Author

@karol-kokoszka I addressed the comments, could you take a look again?

@Michal-Leszczynski Michal-Leszczynski merged commit 78982d3 into master Oct 7, 2024
52 checks passed
@Michal-Leszczynski Michal-Leszczynski deleted the ml/restore-batch branch October 7, 2024 08:54
Development

Successfully merging this pull request may close these issues.

Batch sstables basing on percentage of the node's data
Batch sstables of similar size
2 participants