feat(docs): document restore speed and observability
This commit adds description of how to observe restore
speed, and how to configure restore parameters in order
to restore as fast as possible.

Fixes #3946
Michal-Leszczynski committed Oct 29, 2024
1 parent 2c9c18d commit 2912a7d
Showing 1 changed file with 31 additions and 10 deletions.
41 changes: 31 additions & 10 deletions docs/source/restore/index.rst
@@ -48,18 +48,39 @@ ScyllaDB Manager Restore command supports the following features:
* Progress tracking (:ref:`sctool progress <task-progress>`, Prometheus metrics, `Scylla Monitoring <https://monitoring.docs.scylladb.com>`_ Manager dashboard)
* :ref:`Pausing <task-stop>` and :ref:`resuming <task-start>` at any point of the process

Restore speed and granularity
=============================
Restore speed observability
===========================

| Restore speed can be checked with the :ref:`sctool progress <task-progress>` command.
| It displays the average per-shard download and load&stream bandwidths. When used with the ``--details`` flag, it also displays per-host bandwidths.

Restore speed can also be observed with Prometheus metrics:

* ``scylla_manager_restore_remaining_bytes``
* ``scylla_manager_restore_downloaded_bytes``
* ``scylla_manager_restore_download_duration``
* ``scylla_manager_restore_streamed_bytes``
* ``scylla_manager_restore_stream_duration``
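
The byte and duration metrics listed above can be combined into an average bandwidth. The following is a minimal sketch, not the Manager's own calculation: it parses a text-format Prometheus scrape and divides bytes by duration. It assumes that the duration metrics are reported in milliseconds, and the sample values, the ``cluster`` and ``host`` labels, and the helper names ``sum_metric`` and ``bandwidth_mib_per_s`` are all hypothetical.

```python
import re

# Hypothetical scrape output; the label names and all values are made up.
SAMPLE = """\
# HELP lines and other comments are skipped by the parser below.
scylla_manager_restore_downloaded_bytes{cluster="c1",host="192.168.100.11"} 4294967296
scylla_manager_restore_download_duration{cluster="c1",host="192.168.100.11"} 32000
scylla_manager_restore_streamed_bytes{cluster="c1",host="192.168.100.11"} 2147483648
scylla_manager_restore_stream_duration{cluster="c1",host="192.168.100.11"} 64000
"""

# Simplified sample-line pattern: metric name, optional {labels}, numeric value.
# It does not handle '}' inside label values, which is enough for a sketch.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)"
)


def sum_metric(text, name):
    """Sum the values of all samples of the given metric across label sets."""
    total = 0.0
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match and match.group("name") == name:
            total += float(match.group("value"))
    return total


def bandwidth_mib_per_s(text, bytes_metric, duration_metric,
                        duration_units_per_second=1000.0):
    """Return bytes / duration in MiB/s.

    duration_units_per_second=1000.0 encodes the ASSUMPTION that the
    duration metric is in milliseconds; adjust it if that is not the case.
    """
    seconds = sum_metric(text, duration_metric) / duration_units_per_second
    if seconds == 0:
        return 0.0
    return sum_metric(text, bytes_metric) / seconds / (1024 * 1024)


if __name__ == "__main__":
    download = bandwidth_mib_per_s(
        SAMPLE,
        "scylla_manager_restore_downloaded_bytes",
        "scylla_manager_restore_download_duration",
    )
    stream = bandwidth_mib_per_s(
        SAMPLE,
        "scylla_manager_restore_streamed_bytes",
        "scylla_manager_restore_stream_duration",
    )
    print(f"download: {download:.1f} MiB/s, load&stream: {stream:.1f} MiB/s")
```

With the made-up sample, 4 GiB downloaded over 32 seconds works out to 128 MiB/s, and 2 GiB streamed over 64 seconds to 32 MiB/s. In practice, a Prometheus query dividing the two series would likely be the more natural way to obtain the same figure.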

Restore speed control
=====================

.. _restore-speed-and-granularity:

Restore speed is controlled by two parameters: ``--parallel`` and ``--batch-size``.
Parallel specifies how many nodes can be used in restore procedure at the same time.
Batch size specifies how many SSTable bundles can be restored from backup location in a single job.
Note that increasing the default batch size might significantly increase restore performance,
as only one shard can work on restoring a single SSTable bundle.
Restore speed is controlled by several parameters (see the :ref:`sctool restore <sctool-restore>` documentation for details):

* ``--batch-size``
* ``--parallel``
* ``--transfers``
* ``--unpin-agent-cpu``
* ``--allow-compaction``

| Most of these parameters have default values chosen for restoring as fast as possible.
| You should only need to change them when you want to limit the impact of the restore on a cluster that is serving traffic for tables not currently being restored.
Those parameters can be set when you:
| For backward compatibility reasons, the default value of ``--batch-size`` is ``2``, but it should be changed to ``0`` when you want to maximize restore speed.
| Note that a bigger batch size comes with coarser granularity. This means that pausing and resuming a restore requires more work to be performed.
* Schedule a restore with :ref:`sctool restore <sctool-restore>`
* Update a restore specification with :ref:`sctool restore update <restore-update>`
The ``--unpin-agent-cpu`` flag is disabled by default, but if you observe low download
bandwidth, you can try to :ref:`pause <task-stop>` the restore task, :ref:`update <restore-update>` it with ``--unpin-agent-cpu``,
and :ref:`resume <task-start>` it.
