Use Scylla API for backup #4169
Open
Michal-Leszczynski wants to merge 9 commits into ml/scylla-api from ml/backup-scylla-api
+606 −52
abbfaed to b51c8b6
For Scylla to access object storage, it needs to be configured in the 'object_storage.yaml' config file.
A separate column for Scylla task ID is needed because:
- it has a different type from the agent job ID
- it makes it clear which API was used
Those methods consist of both:
- a direct Scylla backup API call
- helper Scylla Task Manager API calls
When working with Rclone, SM specifies just the provider name, and Rclone (with the agent config) resolves it internally to the correct endpoint. This meant that the user didn't need to specify the exact endpoint when running SM backup/restore tasks.
When working with Scylla, SM needs to specify the resolved host name on its own. This should be the same name as specified in 'object_storage.yaml' (see https://github.com/scylladb/scylladb/blob/92db2eca0b8ab0a4fa2571666a7fe2d2b07c697b/docs/dev/object_storage.md?plain=1#L29-L39).
To maximize compatibility and UX, we still want it to be possible to specify just the provider name when running backup/restore. In that case, SM sends the provider name as the "endpoint" query param, and the agent resolves it to the proper host name when forwarding the request to Scylla. Any other "endpoint" query param value is forwarded unresolved.
Note that resolving the "endpoint" query param in the proxy exists only for UX, so it might not work correctly in all cases. To ensure correctness, the SM user should specify "endpoint" directly so that no resolving is needed.
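The proxy-side resolution described in this commit message can be sketched roughly as below. This is only an illustration of the idea, not the agent's actual code: the `resolve_endpoint` function, the `PROVIDER_ENDPOINTS` mapping, the `s3.example.com` host, and the `bucket` query param are all hypothetical names picked for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical provider -> host mapping, standing in for whatever the agent
# would read from its own configuration.
PROVIDER_ENDPOINTS = {"s3": "s3.example.com"}


def resolve_endpoint(url, provider_endpoints):
    """Rewrite the "endpoint" query param if it names a known provider.

    Any other "endpoint" value (for example an explicit host name) is
    forwarded unchanged, as are all other query params.
    """
    parts = urlsplit(url)
    query = [
        (key, provider_endpoints.get(value, value)) if key == "endpoint" else (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With this shape, a request carrying `endpoint=s3` would be forwarded with the mapped host, while a request that already carries an explicit host passes through untouched, which matches the "specify it directly to be safe" advice above.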
c7a4ca0 to 7022961
Scylla backup API can be used when:
- the node exposes Scylla backup API
- S3 is the provider in use
- the backup won't create versioned files
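The three conditions above amount to a simple conjunction. A minimal sketch, assuming the three facts are already known for a given node and snapshot directory (the `BackupContext` type and its field names are hypothetical, not taken from the SM code base):

```python
from dataclasses import dataclass


@dataclass
class BackupContext:
    """Hypothetical summary of what is known before choosing an API."""

    node_exposes_backup_api: bool
    provider: str
    creates_versioned_files: bool


def can_use_scylla_backup_api(ctx):
    """Return True only when all three conditions from the commit message hold."""
    return (
        ctx.node_exposes_backup_api
        and ctx.provider == "s3"
        and not ctx.creates_versioned_files
    )
```

If any single condition fails, the sketch falls back to "not usable", which is the conservative choice when the alternative (Rclone) is always available.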
Some tests used interceptors on given paths in order to wait for, block, or check some API calls. Those interceptors were updated to also match Scylla backup API paths.
Using Scylla backup API does not result in changes to Rclone transfers, rate limiting, or CPU pinning, so they shouldn't be checked as a part of the restore test.
This is a simple test for checking whether the correct API is used during the backup.
7022961 to 6a37901
@karol-kokoszka @VAveryanov8 so the idea is that the |
This PR starts using Scylla backup API in SM backup task!
It is mostly complete and can be tested, but there are 3 issues that were discovered during development:
`scylla-manager-agent.yaml` and `object-storage.yaml`. It should be enough for default setups and tests, but we will be on the safe side when we fix this issue. Take a look at bd0cf1a for more info.
In terms of the general overview of this PR, the main objective was to replace the `/agent/rclone/sync/movedir` Rclone API with the `/storage_service/backup` Scylla API, nothing more.
Scylla API can be used when:
Checking whether Scylla API can be used is done separately per node/snapshot_dir.
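As a rough illustration of the per-node/snapshot_dir decision, the sketch below maps each directory to one of the two API paths named in this description. It assumes the eligibility check has already produced a boolean per directory; the `pick_upload_paths` function and the input tuple layout are hypothetical and only meant to show the shape of the choice.

```python
# API paths as named in the PR description.
RCLONE_MOVEDIR_PATH = "/agent/rclone/sync/movedir"
SCYLLA_BACKUP_PATH = "/storage_service/backup"


def pick_upload_paths(snapshot_dirs):
    """Choose an API path independently for every (node, snapshot_dir) pair.

    snapshot_dirs is an iterable of (node, snapshot_dir, scylla_api_usable)
    tuples, where the last element is the result of the eligibility check.
    """
    return {
        (node, snapshot_dir): SCYLLA_BACKUP_PATH if usable else RCLONE_MOVEDIR_PATH
        for node, snapshot_dir, usable in snapshot_dirs
    }
```

Because the decision is made per directory, a single backup run could in principle mix both APIs across nodes or even within one node.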
Luckily, things like pause/resume/progress do not seem to need additional work in the scope of this issue.
Also, for now, the Scylla versions that are supposed to support the Scylla backup/restore API are:
- master
- 6.3
- 2024.3
Fixes #4143
Fixes #4138
Fixes #4141