feat(docs): restore, add 3.3 desc schema docs

scylladb · Jul 1, 2024 · 55fd257 · 55fd257
1 parent ec07f94
commit 55fd257
Showing 9 changed files with 112 additions and 67 deletions.
diff --git a/docs/source/backup/specification.rst b/docs/source/backup/specification.rst
@@ -61,6 +61,8 @@ You may find that the directory contains manifest files with ``.tmp`` suffix.
 Those are manifests files of backups that are being uploaded to the backup location.
 They are are marked as temporary until all the backup files are fully uploaded.
 
+.. _backup-schema-spec:
+
 schema
 ......
 

diff --git a/docs/source/restore/compatibility-matrix.rst b/docs/source/restore/compatibility-matrix.rst
@@ -12,6 +12,9 @@ The following table shows which version of Scylla Manager restore task supports
    * - ScyllaDB Manager Version
      - ScyllaDB Open Source Version
      - ScyllaDB Enterprise Version
+   * - 3.3
+     - 5.4, 6.0
+     - 2023.1, 2024.1
    * - 3.2
      - 5.0, 5.1, 5.2, 5.4
      - 2022.1, 2022.2, 2023.1, 2024.1

diff --git a/docs/source/restore/index.rst b/docs/source/restore/index.rst
@@ -8,6 +8,7 @@ Restore
 
    restore-tables
    restore-schema
+   old-restore-schema
    examples
    compatibility-matrix
 
@@ -34,9 +35,6 @@ Restore task has to be one of two types:
 
   * :doc:`restore schema <restore-schema>` - restores the ScyllaDB cluster schema
 
-Each of those types has required prerequisites and follow-up actions.
-For more information, please read given restore type documentation.
-
 If both the schema and the content of the tables need to be restored, you must start with restoring the schema. Only after the schema is successfully restored can you proceed with restoring the content of the tables.
 
 Features
@@ -58,6 +56,9 @@ Restore speed and granularity
 Restore speed is controlled by two parameters: ``--parallel`` and ``--batch-size``.
 Parallel specifies how many nodes can be used in restore procedure at the same time.
 Batch size specifies how many SSTable bundles can be restored from backup location in a single job.
+Note that increasing the default batch size might significantly increase restore performance,
+as only one shard can work on restoring a single SSTable bundle.
+
 Those parameters can be set when you:
 
 * Schedule a restore with :ref:`sctool restore <sctool-restore>`

diff --git a/docs/source/restore/old-restore-schema.rst b/docs/source/restore/old-restore-schema.rst
@@ -0,0 +1,69 @@
+===============================================
+Restore schema for ScyllaDB 5.4/2024.1 or older
+===============================================
+
+.. note:: Currently, ScyllaDB Manager supports only entire schema restoration, so the ``--keyspace`` flag is not allowed.
+
+.. note:: Because schema files are small, a resumed schema restoration always starts from scratch.
+
+.. include:: _common/restore-raft-schema-warn.rst
+
+| To restore the ScyllaDB cluster schema, use :ref:`sctool restore <sctool-restore>` with the ``--restore-schema`` flag.
+| Please note that the term *schema* specifically refers to the data residing in the ``system_schema`` keyspace, such as keyspace and table definitions. All other data stored in keyspaces managed by ScyllaDB, such as authentication data in the ``system_auth`` keyspace, is restored as part of the :doc:`restore tables procedure <restore-tables>`.
+| The restore schema procedure works with any cluster size, so the backed-up cluster can have a different number of nodes per data center than the restore destination cluster. However, it is important that the restore destination cluster consists of at least all of the data centers present in the backed-up cluster.
+
+Prerequisites
+=============
+
+* ScyllaDB Manager with CQL credentials to restore destination cluster.
+
+* It is strongly advised to restore schema only into an empty cluster with no schema change history of the keyspace that is going to be restored.
+   Otherwise, the restored schema might be overwritten by the already existing one and cause unexpected errors.
+
+* All nodes in restore destination cluster should be in the ``UN`` state (See `nodetool status <https://docs.scylladb.com/stable/operating-scylla/nodetool-commands/status.html>`_ for details).
+
+Procedure
+=========
+
+This section contains a description of the restore-schema procedure performed by ScyllaDB Manager.
+
+Because the ``tombstone_gc`` option of schema tables cannot be altered, the restore procedure "simulates ad-hoc repair"
+by duplicating data from **each backed-up node into each node** in the restore destination cluster.
+However, the small size of schema files makes this overhead negligible.
+
+    * Validate that all nodes are in the ``UN`` state
+    * For each backup location:
+
+      * Find all ScyllaDB *nodes* with location access and use them for restoring schema from this location
+      * List backup manifests for specified snapshot tag
+    * For each manifest:
+
+        * Filter relevant tables from the manifest
+        * For each table:
+
+          * For each *node* (in ``--parallel``):
+
+            * Download all SSTables
+    * For all nodes in restore destination cluster:
+
+        * `nodetool refresh <https://docs.scylladb.com/stable/operating-scylla/nodetool-commands/refresh.html#nodetool-refresh>`_ on all downloaded schema tables (full parallel)
+
+Follow-up action
+================
+
+After a successful restore, it is important to perform the necessary follow-up action. After restoring the schema,
+perform a `rolling restart <https://docs.scylladb.com/stable/operating-scylla/procedures/config-change/rolling-restart.html>`_ of the entire cluster.
+Without the restart, the restored schema might not be visible, and querying it can return various errors.
+
+.. _restore-schema-workaround:
+
+Restoring schema into a cluster with ScyllaDB **5.4.X** or **2024.1.X** with **consistent_cluster_management**
+==============================================================================================================
+
+Restoring schema when using ScyllaDB **5.4.X** or **2024.1.X** with ``consistent_cluster_management: true`` in ``scylla.yaml``
+is not supported. In such a case, perform the following workaround:
+
+    * Create a fresh cluster with ``consistent_cluster_management: false`` configured in ``scylla.yaml`` and a desired ScyllaDB version.
+    * Restore schema via :ref:`sctool restore <sctool-restore>` with ``--restore-schema`` flag.
+    * Perform `rolling restart <https://docs.scylladb.com/stable/operating-scylla/procedures/config-change/rolling-restart.html>`_ of an entire cluster.
+    * Follow the steps of the `Enable Raft procedure <https://opensource.docs.scylladb.com/stable/architecture/raft.html#enabling-raft>`_.
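
Note on the procedure listed in ``old-restore-schema.rst`` above: the nested loops can be read as a plan of download steps followed by one refresh step. The following is a minimal sketch of that reading, not ScyllaDB Manager's implementation. The ``Manifest`` class, the ``plan_old_schema_restore`` function, the input dictionaries, and the step tuples are all hypothetical names introduced for illustration. The sketch also assumes that ``--parallel`` is a positive number of nodes downloading at the same time.

```python
from dataclasses import dataclass


@dataclass
class Manifest:
    """Hypothetical stand-in for one backup manifest."""
    node_id: str              # backed-up node that produced the manifest
    schema_tables: list[str]  # schema tables kept after filtering


def plan_old_schema_restore(node_states, locations, parallel):
    """Return the ordered steps of the legacy restore-schema flow.

    node_states: {node: state}, for example {"n1": "UN"}
    locations:   {location: {"nodes": [...], "manifests": [Manifest, ...]}}
    parallel:    assumed to be a positive group size (hypothetical reading)
    """
    if parallel < 1:
        raise ValueError("this sketch expects parallel >= 1")

    # Step 1: validate that all nodes are in the UN state.
    not_un = [n for n, s in node_states.items() if s != "UN"]
    if not_un:
        raise RuntimeError(f"nodes not in UN state: {not_un}")

    steps = []
    for location, info in locations.items():
        nodes = info["nodes"]  # nodes with access to this location
        for manifest in info["manifests"]:
            for table in manifest.schema_tables:
                # Every node downloads every table from every manifest,
                # at most `parallel` nodes at a time.
                for i in range(0, len(nodes), parallel):
                    group = tuple(nodes[i:i + parallel])
                    steps.append(
                        ("download", location, manifest.node_id, table, group)
                    )

    # Final step: nodetool refresh on all nodes of the destination cluster.
    steps.append(("refresh", tuple(sorted(node_states))))
    return steps
```

With two nodes, one manifest holding two schema tables, and ``parallel=1``, the plan contains four download steps and one refresh step, which illustrates why the data is duplicated from each backed-up node into each destination node.
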
diff --git a/docs/source/restore/restore-schema.rst b/docs/source/restore/restore-schema.rst
@@ -1,69 +1,37 @@
-==============
-Restore schema
-==============
+===============================================
+Restore schema for ScyllaDB 6.0/2024.2 or newer
+===============================================
 
-.. note:: Currently, Scylla Manager supports only entire schema restoration, so ``--keyspace`` flag is not allowed.
+.. note:: Currently, ScyllaDB Manager supports only entire schema restoration, so the ``--keyspace`` flag is not allowed.
 
-.. note:: Because of small size of schema files, resuming schema restoration always starts from scratch.
+.. note:: Currently, restoring a schema that contains `Alternator tables <https://opensource.docs.scylladb.com/stable/using-scylla/alternator/>`_ is not supported.
 
-.. include:: _common/restore-raft-schema-warn.rst
-
-| In order to restore Scylla cluster schema use :ref:`sctool restore <sctool-restore>` with ``--restore-schema`` flag.
-| Please note that the term *schema* specifically refers to the data residing in the ``system_schema keyspace``, such as keyspace and table definitions. All other data stored in keyspaces managed by ScyllaDB, such as authentication data in the ``system_auth`` keyspace, is restored as part of the :doc:`restore tables procedure <restore-tables>`.
-| The restore schema procedure works with any cluster size, so the backed-up cluster can have a different number of nodes per data center than the restore destination cluster. However, it is important that the restore destination cluster consists of at least all of the data centers present in the backed-up cluster.
+| To restore the ScyllaDB cluster schema, use :ref:`sctool restore <sctool-restore>` with the ``--restore-schema`` flag.
+| Please note that the term *schema* specifically refers to the data residing in the ``system_schema`` keyspace, such as keyspace and table definitions. All other data stored in keyspaces managed by ScyllaDB is restored as part of the :doc:`restore tables procedure <restore-tables>`.
+| The restore schema procedure works with any cluster size, so the backed-up cluster can have a different number of nodes than the restore destination cluster.
 
 Prerequisites
 =============
 
-* Scylla Manager with CQL credentials to restore destination cluster.
-
-* It is strongly advised to restore schema only into an empty cluster with no schema change history of the keyspace that is going to be restored.
-   Otherwise, the restored schema might be overwritten by the already existing one and cause unexpected errors.
+* ScyllaDB Manager requires CQL credentials with
 
-* All nodes in restore destination cluster should be in the ``UN`` state (See `nodetool status <https://docs.scylladb.com/stable/operating-scylla/nodetool-commands/status.html>`_ for details).
+    * `permission to create <https://opensource.docs.scylladb.com/stable/operating-scylla/security/authorization.html#permissions>`_ restored keyspaces.
 
-Follow-up action
-================
+* No overlapping schema in restore destination cluster (see the procedure below for more details)
 
-After successful restore it is important to perform necessary follow-up action. In case of restoring schema,
-you should make a `rolling restart <https://docs.scylladb.com/stable/operating-scylla/procedures/config-change/rolling-restart.html>`_ of an entire cluster.
-Without the restart, the restored schema might not be visible, and querying it can return various errors.
+* Restore destination cluster must consist of the same DCs as the backed-up cluster (see the procedure below for more details)
 
 Procedure
 =========
 
-This section contains a description of the restore-schema procedure performed by ScyllaDB Manager.
-
-Because of being unable to alter schema tables ``tombstone_gc`` option, restore procedure "simulates ad-hoc repair"
-by duplicating data from **each backed-up node into each node** in restore destination cluster.
-Fortunately, the small size of schema files makes this overhead negligible.
-
-    * Validate that all nodes are in the ``UN`` state
-    * For each backup location:
-
-      * Find all Scylla *nodes* with location access and use them for restoring schema from this location
-      * List backup manifests for specified snapshot tag
-    * For each manifest:
-
-        * Filter relevant tables from the manifest
-        * For each table:
-
-          * For each *node* (in ``--parallel``):
-
-            * Download all SSTables
-    * For all nodes in restore destination cluster:
-
-        * `nodetool refresh <https://docs.scylladb.com/stable/operating-scylla/nodetool-commands/refresh.html#nodetool-refresh>`_ on all downloaded schema tables (full parallel)
-
-.. _restore-schema-workaround:
+ScyllaDB Manager applies the backed-up output of ``DESCRIBE SCHEMA WITH INTERNALS`` via CQL.
 
-Restoring schema into a cluster with ScyllaDB **5.4.X** or **2024.1.X** with **consistent_cluster_management**
-==============================================================================================================
+For this reason, restoring the schema will fail when any restored CQL object (keyspace/table/type/...) is already present in the cluster.
+In such a case, drop the overlapping schema first and then proceed with the restore.
 
-Restoring schema when using ScyllaDB **5.4.X** or **2024.1.X** with ``consistent_cluster_management: true`` in ``scylla.yaml``
-is not supported. In such case, you should perform the following workaround:
+Another possible problem is a restored keyspace defined with ``NetworkTopologyStrategy`` that references DCs not present in the restore destination cluster.
+This results in a CQL error when Manager tries to create such a keyspace.
+In such a case, you should manually fetch the backed-up schema file (see :ref:`backup schema specification <backup-schema-spec>`),
+change the problematic DC names, and apply all CQL statements.
 
-    * Create a fresh cluster with ``consistent_cluster_management: false`` configured in ``scylla.yaml`` and a desired ScyllaDB version.
-    * Restore schema via :ref:`sctool restore <sctool-restore>` with ``--restore-schema`` flag.
-    * Perform `rolling restart <https://docs.scylladb.com/stable/operating-scylla/procedures/config-change/rolling-restart.html>`_ of an entire cluster.
-    * Follow the steps of the `Enable Raft procedure <https://opensource.docs.scylladb.com/stable/architecture/raft.html#enabling-raft>`_.
+In case of an error, Manager will try to roll back all applied schema changes.
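
Note on the manual DC rename described in ``restore-schema.rst`` above: the "change problematic DC names" step could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not part of ScyllaDB Manager. It assumes the CQL statements have already been extracted from the backed-up schema file into one plain string, and that each ``CREATE KEYSPACE`` statement ends at the first semicolon. The ``rename_dcs`` function and the ``dc_map`` parameter are hypothetical names.

```python
import re

# One CREATE KEYSPACE statement, up to its terminating semicolon.
_CREATE_KS = re.compile(r"CREATE\s+KEYSPACE\b.*?;", re.IGNORECASE | re.DOTALL)
# The replication map, e.g. {'class': 'NetworkTopologyStrategy', 'dc1': '3'}.
_REPLICATION = re.compile(
    r"(replication\s*=\s*)(\{.*?\})", re.IGNORECASE | re.DOTALL
)
# A quoted map key followed by a colon, e.g. 'dc1':
_MAP_KEY = re.compile(r"'([^']*)'(\s*:)")


def rename_dcs(schema_cql: str, dc_map: dict[str, str]) -> str:
    """Rename DCs in NetworkTopologyStrategy keyspaces (hypothetical helper).

    Only the replication map of CREATE KEYSPACE statements is touched.
    All keys are renamed in a single pass, so swapping names is safe.
    """

    def fix_replication(match: re.Match) -> str:
        body = match.group(2)
        if "NetworkTopologyStrategy" not in body:
            return match.group(0)  # leave other strategies untouched
        body = _MAP_KEY.sub(
            lambda k: f"'{dc_map.get(k.group(1), k.group(1))}'{k.group(2)}",
            body,
        )
        return match.group(1) + body

    def fix_statement(match: re.Match) -> str:
        return _REPLICATION.sub(fix_replication, match.group(0), count=1)

    return _CREATE_KS.sub(fix_statement, schema_cql)
```

For example, ``rename_dcs(cql, {"dc1": "us-east"})`` rewrites the ``'dc1'`` key in every ``NetworkTopologyStrategy`` replication map and leaves tables, types, and keyspaces with other strategies unchanged. Review the result before applying it to the restore destination cluster.
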
diff --git a/docs/source/restore/restore-tables.rst b/docs/source/restore/restore-tables.rst
@@ -4,6 +4,8 @@ Restore tables
 
 .. note:: Currently, Scylla Manager does not support restoring content of `CDC log tables <https://docs.scylladb.com/stable/using-scylla/cdc/cdc-log-table.html>`_.
 
+.. warning:: Restoring data related to *authentication* (``system_auth``) and *service levels* (``system_distributed.service_levels``) is not supported in ScyllaDB 6.0.
+
 | To restore the content of the tables (rows), use the :ref:`sctool restore <sctool-restore>` command with the ``--restore-tables`` flag.
 | The restore tables procedure works with any cluster topologies, so the backed-up cluster can have a different number of nodes or data centers than the restore destination cluster.
 

diff --git a/docs/source/sctool/partials/sctool_restore.yaml b/docs/source/sctool/partials/sctool_restore.yaml
@@ -11,7 +11,7 @@ options:
       default_value: "2"
       usage: |
         Number of SSTables per shard to process in one request by one node.
-        Increasing batch size increases job granularity.
+        Increasing the default batch size might significantly increase restore performance, as only one shard can work on restoring a single SSTable bundle.
     - name: cluster
       shorthand: c
       usage: |
@@ -96,10 +96,9 @@ options:
       default_value: "false"
       usage: |
         Specifies restore type (alternative to '--restore-tables' flag).
-        Restore will recreate schema by targeting 'system_schema.*' tables only ('--keyspace' flag mustn't be set).
+        Restore will recreate schema by applying the backed up output of DESCRIBE SCHEMA WITH INTERNALS via CQL.
         It requires that restored keyspaces aren't present in the cluster.
-        For the schema to be visible, restart of the entire cluster is required after completion.
-        See ScyllaDB cluster restart: https://docs.scylladb.com/stable/operating-scylla/procedures/config-change/rolling-restart.html for details.
+        For the full list of prerequisites, please see https://manager.docs.scylladb.com/stable/restore/restore-schema.html.
     - name: restore-tables
       default_value: "false"
       usage: |
@@ -108,6 +107,7 @@ options:
         It requires that correct schema of restored tables is already present in the cluster (schema can be restored using '--restore-schema' flag).
         Moreover, in order to prevent situation in which current tables' contents overlaps restored data,
         tables should be truncated before initializing restore.
+        For the full list of prerequisites, please see https://manager.docs.scylladb.com/stable/restore/restore-tables.html.
     - name: retry-wait
       default_value: 10m
       usage: |

diff --git a/docs/source/sctool/partials/sctool_restore_update.yaml b/docs/source/sctool/partials/sctool_restore_update.yaml
@@ -9,7 +9,7 @@ options:
       default_value: "2"
       usage: |
         Number of SSTables per shard to process in one request by one node.
-        Increasing batch size increases job granularity.
+        Increasing the default batch size might significantly increase restore performance, as only one shard can work on restoring a single SSTable bundle.
     - name: cluster
       shorthand: c
       usage: |
@@ -94,10 +94,9 @@ options:
       default_value: "false"
       usage: |
         Specifies restore type (alternative to '--restore-tables' flag).
-        Restore will recreate schema by targeting 'system_schema.*' tables only ('--keyspace' flag mustn't be set).
+        Restore will recreate schema by applying the backed up output of DESCRIBE SCHEMA WITH INTERNALS via CQL.
         It requires that restored keyspaces aren't present in the cluster.
-        For the schema to be visible, restart of the entire cluster is required after completion.
-        See ScyllaDB cluster restart: https://docs.scylladb.com/stable/operating-scylla/procedures/config-change/rolling-restart.html for details.
+        For the full list of prerequisites, please see https://manager.docs.scylladb.com/stable/restore/restore-schema.html.
     - name: restore-tables
       default_value: "false"
       usage: |
@@ -106,6 +105,7 @@ options:
         It requires that correct schema of restored tables is already present in the cluster (schema can be restored using '--restore-schema' flag).
         Moreover, in order to prevent situation in which current tables' contents overlaps restored data,
         tables should be truncated before initializing restore.
+        For the full list of prerequisites, please see https://manager.docs.scylladb.com/stable/restore/restore-tables.html.
     - name: retry-wait
       default_value: 10m
       usage: |

diff --git a/pkg/command/restore/res.yaml b/pkg/command/restore/res.yaml
@@ -26,25 +26,25 @@ snapshot-tag: |
 
 batch-size: |
   Number of SSTables per shard to process in one request by one node.
-  Increasing batch size increases job granularity.
+  Increasing the default batch size might significantly increase restore performance, as only one shard can work on restoring a single SSTable bundle.
 
 parallel: |
   The maximum number of Scylla restore jobs that can be run at the same time (on different SSTables).
   Each node can take part in at most one restore at any given moment.
 
 restore-schema: |
   Specifies restore type (alternative to '--restore-tables' flag).
-  Restore will recreate schema by targeting 'system_schema.*' tables only ('--keyspace' flag mustn't be set).
+  Restore will recreate schema by applying the backed up output of DESCRIBE SCHEMA WITH INTERNALS via CQL.
   It requires that restored keyspaces aren't present in the cluster.
-  For the schema to be visible, restart of the entire cluster is required after completion.
-  See ScyllaDB cluster restart: https://docs.scylladb.com/stable/operating-scylla/procedures/config-change/rolling-restart.html for details.
+  For the full list of prerequisites, please see https://manager.docs.scylladb.com/stable/restore/restore-schema.html.
 
 restore-tables: |
   Specifies restore type (alternative to '--restore-schema' flag).
   Restore will recreate contents of tables specified by '--keyspace' flag.
   It requires that correct schema of restored tables is already present in the cluster (schema can be restored using '--restore-schema' flag).
   Moreover, in order to prevent situation in which current tables' contents overlaps restored data,
   tables should be truncated before initializing restore.
+  For the full list of prerequisites, please see https://manager.docs.scylladb.com/stable/restore/restore-tables.html.
 
 dry-run: |
   Validates and displays restore information without actually running the restore.