Use Scylla API for backup #4169

Open
wants to merge 9 commits into
base: ml/scylla-api
from

Conversation

Michal-Leszczynski
Collaborator

@Michal-Leszczynski Michal-Leszczynski commented Dec 16, 2024

This PR starts using the Scylla backup API in the SM backup task!
It is mostly complete and can be tested, but there are 3 issues that were discovered during development:

As a general overview of this PR: the main objective was to replace the /agent/rclone/sync/movedir Rclone API with the /storage_service/backup Scylla API, nothing more.
Scylla API can be used when:

  • node exposes Scylla backup API
  • s3 is the used provider
  • backup won't create versioned files

Checking whether Scylla API can be used is done separately per node/snapshot_dir.
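The three conditions above can be read as a single predicate evaluated for each node/snapshot_dir pair. The following is only a minimal sketch of that idea, not the code from this PR: the `snapshotDirCheck` type, its fields, and `useScyllaAPI` are hypothetical names chosen for illustration.

```go
package main

// snapshotDirCheck is a hypothetical summary of the facts SM would need
// about one node/snapshot_dir pair before choosing the upload API.
type snapshotDirCheck struct {
	NodeExposesBackupAPI  bool   // node exposes the Scylla backup API
	Provider              string // backup location provider, e.g. "s3"
	CreatesVersionedFiles bool   // uploading this dir would create versioned files
}

// useScyllaAPI reports whether the Scylla backup API can be used for this
// snapshot dir. If it returns false, the Rclone API would be used instead.
func useScyllaAPI(c snapshotDirCheck) bool {
	return c.NodeExposesBackupAPI &&
		c.Provider == "s3" &&
		!c.CreatesVersionedFiles
}
```

Because the check is made per node/snapshot_dir, a single backup run could in principle mix both APIs, for example when only some snapshot dirs would produce versioned files.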

Luckily, things like pause/resume/progress do not seem to need additional work in the scope of this issue.

Also, for now, the Scylla versions that are expected to support the Scylla backup/restore API are:

  • master
  • 6.3
  • 2024.3

Fixes #4143
Fixes #4138
Fixes #4141

@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/backup-scylla-api branch 3 times, most recently from abbfaed to b51c8b6 Compare December 17, 2024 11:12
For Scylla to access object storage, it needs to be configured
in the 'object_storage.yaml' config file.
A separate column for Scylla task ID is needed because:
- it has a different type from agent job ID
- it makes it clear which API was used
Those methods consist of both:
- direct Scylla backup API call
- helper Scylla Task Manager API calls
When working with Rclone, SM specifies just the provider name,
and Rclone (with agent config) resolves it internally to the correct endpoint.
This meant that the user didn't need to specify the exact endpoint when running SM backup/restore tasks.

When working with Scylla, SM needs to specify the resolved host name on its own.
This should be the same name as specified in 'object_storage.yaml'
(See https://github.com/scylladb/scylladb/blob/92db2eca0b8ab0a4fa2571666a7fe2d2b07c697b/docs/dev/object_storage.md?plain=1#L29-L39).

In order to maximize compatibility and UX, we still want it to be possible
to specify just the provider name when running backup/restore.
In such a case, SM sends the provider name as the "endpoint" query param,
which is resolved by the agent to the proper host name when forwarding the request to Scylla.
Any other "endpoint" query param value is forwarded unchanged.

Note that resolving "endpoint" query param in the proxy is just for the UX,
so it might not work correctly in all the cases.
In order to ensure correctness, "endpoint" should be specified directly by SM user
so that no resolving is needed.
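The resolution step described above could look roughly like the sketch below. It is an illustration under assumptions, not the agent's actual proxy code: `resolveEndpointParam` and the `providerHosts` map (provider name to host name, as the agent config would supply it) are hypothetical, and any host name passed in is just an example value.

```go
package main

import "net/url"

// resolveEndpointParam is a hypothetical helper. It rewrites the "endpoint"
// query param only when its value is a known provider name; any other value
// (for example an explicit host name) is passed through unchanged.
// It returns the re-encoded query string, with keys sorted by url.Values.Encode.
func resolveEndpointParam(rawQuery string, providerHosts map[string]string) (string, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	if host, ok := providerHosts[q.Get("endpoint")]; ok {
		q.Set("endpoint", host)
	}
	return q.Encode(), nil
}
```

Leaving unknown values untouched keeps the behavior predictable: a user who specifies the endpoint directly gets exactly that endpoint sent to Scylla.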
Scylla backup API can be used when:
- node exposes Scylla backup API
- s3 is the used provider
- backup won't create versioned files
This commit adds code for using Scylla backup API.
Luckily for us, handling pause/resume and progress
is analogous to the Rclone API handling.

Fixes #4143
Fixes #4138
Fixes #4141
Some tests used interceptor for given paths
in order to wait/block/check some API calls.
Those interceptors were updated to also look
for Scylla backup API paths.
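As a rough illustration of such an update, an interceptor's path check might be extended as in the sketch below. The names `backupUploadPaths` and `isBackupUploadPath` are hypothetical, and the two paths are the ones named in the PR description; the actual tests may match different paths.

```go
package main

import "strings"

// backupUploadPaths lists the API paths a test interceptor would treat as
// "backup upload" calls: the Rclone one and the Scylla backup API one.
var backupUploadPaths = []string{
	"/agent/rclone/sync/movedir",
	"/storage_service/backup",
}

// isBackupUploadPath reports whether a request path ends with one of the
// backup upload API paths, regardless of which API is in use.
func isBackupUploadPath(path string) bool {
	for _, p := range backupUploadPaths {
		if strings.HasSuffix(path, p) {
			return true
		}
	}
	return false
}
```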
Using Scylla backup API does not result in changes
to Rclone transfers, rate limiting or cpu pinning,
so it shouldn't be checked as a part of the restore test.
This is a simple test for checking whether the correct API
is used during the backup.
@Michal-Leszczynski Michal-Leszczynski marked this pull request as ready for review December 18, 2024 15:31
@Michal-Leszczynski Michal-Leszczynski changed the base branch from master to ml/scylla-api December 18, 2024 15:32
@Michal-Leszczynski
Collaborator Author

@karol-kokoszka @VAveryanov8 the idea is that ml/scylla-api will be the branch for the Scylla API milestone.
Please take a look!
