diff --git a/en/TOC.md b/en/TOC.md index 55c6954b77..c8363bf1d4 100644 --- a/en/TOC.md +++ b/en/TOC.md @@ -76,12 +76,12 @@ - Persistent Volumes - [Back Up Data](backup-to-pv-using-br.md) - [Restore Data](restore-from-pv-using-br.md) - - Snapshot Backup and Restore - - [Architecture](volume-snapshot-backup-restore.md) - - [Back Up Data Using EBS Snapshots](backup-to-aws-s3-by-snapshot.md) - - [Restore Data from EBS Snapshots](restore-from-aws-s3-by-snapshot.md) - - [Backup and Restore Performance](backup-restore-snapshot-perf.md) - - [FAQs](backup-restore-faq.md) + - Snapshot Backup and Restore across Multiple Kubernetes + - [BR Federation Architecture](br-federation-architecture.md) + - [Deploy BR Federation](deploy-br-federation.md) + - [Back Up Data Using EBS Snapshots](backup-by-ebs-snapshot-across-multiple-kubernetes.md) + - [Restore Data from EBS Snapshots](restore-from-ebs-snapshot-across-multiple-kubernetes.md) + - [FAQs](backup-restore-by-ebs-snapshot-faq.md) - Maintain - [Restart a TiDB Cluster](restart-a-tidb-cluster.md) - [Destroy a TiDB Cluster](destroy-a-tidb-cluster.md) diff --git a/en/backup-by-ebs-snapshot-across-multiple-kubernetes.md b/en/backup-by-ebs-snapshot-across-multiple-kubernetes.md new file mode 100644 index 0000000000..271ba94db0 --- /dev/null +++ b/en/backup-by-ebs-snapshot-across-multiple-kubernetes.md @@ -0,0 +1,408 @@ +--- +title: Back Up a TiDB Cluster across Multiple Kubernetes Using EBS Volume Snapshots +summary: Learn how to back up TiDB cluster data across multiple Kubernetes to S3 based on EBS volume snapshots using BR Federation. +--- + +# Back Up a TiDB Cluster across Multiple Kubernetes Using EBS Volume Snapshots + +This document describes how to back up the data of a TiDB cluster deployed across multiple AWS Kubernetes clusters to AWS storage using EBS volume snapshots. + +The backup method described in this document is implemented based on CustomResourceDefinition (CRD) in [BR Federation](br-federation-architecture.md#br-federation-architecture-and-processes) and TiDB Operator. [BR](https://docs.pingcap.com/tidb/stable/backup-and-restore-overview) (Backup & Restore) is a command-line tool for distributed backup and recovery of the TiDB cluster data. For the underlying implementation, BR gets the backup data of the TiDB cluster, and then sends the data to the AWS storage. + +> **Note** +> +> Before you back up data, make sure that you have [deployed BR Federation](deploy-br-federation.md). + +## Usage scenarios + +If you have the following requirements when backing up TiDB cluster data, you can use TiDB Operator to back up the data using volume snapshots and metadata to Amazon S3: + +- Minimize the impact of backup, such as keeping the impact on QPS and transaction latency less than 5%, and not utilizing cluster CPU and memory. +- Back up and restore data in a short period of time. For example, completing a backup within 1 hour and restore it within 2 hours. + +If you have any other requirements, refer to [Backup and Restore Overview](backup-restore-overview.md) and select an appropriate backup method. + +## Prerequisites + +Storage blocks on volumes that were created from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume. 
+ +According to AWS documentation, the EBS volume restored from snapshots might have high latency before it is initialized. This can impact the performance of a restored TiDB cluster. See details in [Create a volume from a snapshot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-volume.html#ebs-create-volume-from-snapshot). + +To initialize the restored volume more efficiently, it is recommended to **separate WAL and raft log into a dedicated small volume apart from TiKV data**. By fully initializing the volume of WAL and raft log separately, we can enhance write performance for a restored TiDB cluster. + +## Limitations + +- Snapshot backup is applicable to TiDB Operator v1.5.2 or later versions, and TiDB v6.5.8 or later versions. +- For TiKV configuration, do not set `resolved-ts.enable` to `false`, and do not set `raftstore.report-min-resolved-ts-interval` to `"0s"`. Otherwise, it can lead to backup failure. +- For PD configuration, do not set `pd-server.min-resolved-ts-persistence-interval` to `"0s"`. Otherwise, it can lead to backup failure. +- To use this backup method, the TiDB cluster must be deployed on AWS EC2 and use AWS EBS volumes. +- This backup method is currently not supported for TiFlash, TiCDC, DM, and TiDB Binlog nodes. + +> **Note:** +> +> - To perform volume snapshot restore, ensure that the TiKV configuration during restore is consistent with the configuration used during backup. +> - To check consistency, download the `backupmeta` file from the backup file stored in Amazon S3, and check the `kubernetes.crd_tidb_cluster.spec` field. +> - If this field is inconsistent, you can modify the TiKV configuration by referring to [Configure a TiDB Cluster on Kubernetes](configure-a-tidb-cluster.md). +> - If [Encryption at Rest](https://docs.pingcap.com/tidb/stable/encryption-at-rest) is enabled for TiKV KMS, ensure that the master key is enabled for AWS KMS during restore. + +## Ad-hoc backup + +You can either fully or incrementally back up snapshots based on AWS EBS volumes. The initial backup of a node is full backup, while subsequent backups are incremental backup. + +Snapshot backup is defined in a customized `VolumeBackup` custom resource (CR) object. The BR Federation completes the backup task according to the specifications in this object. + +### Step 1. Set up the environment for EBS volume snapshot backup in every data plane + +**You must execute the following steps in every data plane**. + +1. Download the [`backup-rbac.yaml`](https://github.com/pingcap/tidb-operator/blob/master/manifests/backup/backup-rbac.yaml) file to the backup server. + +2. If you have deployed the TiDB cluster in `${namespace}`, create the RBAC-related resources required for the backup in this namespace by running the following command: + + ```shell + kubectl apply -f backup-rbac.yaml -n ${namespace} + ``` + +3. Grant permissions to access remote storage. + + To back up cluster data and save snapshot metadata to Amazon S3, you need to grant permissions to remote storage. Refer to [AWS account authorization](grant-permissions-to-remote-storage.md#aws-account-permissions) for the three available methods. + +### Step 2. Back up data to S3 storage + +**You must execute the following steps in the control plane**. + +Depending on the authorization method you choose in the previous step for granting remote storage access, you can back up data by EBS snapshots using any of the following methods accordingly: + + +
+ +If you grant permissions by accessKey and secretKey, you can create the `VolumeBackup` CR as follows: + +```shell +kubectl apply -f backup-fed.yaml +``` + +The `backup-fed.yaml` file has the following content: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeBackup +metadata: + name: ${backup-name} +spec: + clusters: + - k8sClusterName: ${k8s-name1} + tcName: ${tc-name1} + tcNamespace: ${tc-namespace1} + - k8sClusterName: ${k8s-name2} + tcName: ${tc-name2} + tcNamespace: ${tc-namespace2} + - ... # other clusters + template: + br: + sendCredToTikv: true + s3: + provider: aws + secretName: s3-secret + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path} + toolImage: ${br-image} + cleanPolicy: Delete + calcSizeLevel: {snapshot-size-calculation-level} +``` + +
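+
+If the `s3-secret` referenced by `secretName` does not exist yet, create it in each data plane, in the same `${namespace}` as the TiDB cluster, before applying the CR. The following is a minimal sketch that assumes the TiDB Operator convention of storing the credentials under the `access_key` and `secret_key` keys; replace the placeholder values with your own AWS credentials:
+
+```shell
+# Create the S3 credential secret referenced by `secretName: s3-secret`.
+# Run this in every data plane, in the same namespace as the TidbCluster.
+kubectl create secret generic s3-secret \
+  --from-literal=access_key=${access_key} \
+  --from-literal=secret_key=${secret_key} \
+  --namespace=${namespace}
+```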
+ +
+ +If you grant permissions by associating Pod with IAM, you can create the `VolumeBackup` CR as follows: + +```shell +kubectl apply -f backup-fed.yaml +``` + +The `backup-fed.yaml` file has the following content: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeBackup +metadata: + name: ${backup-name} + annotations: + iam.amazonaws.com/role: arn:aws:iam::123456789012:role/role-name +spec: + clusters: + - k8sClusterName: ${k8s-name1} + tcName: ${tc-name1} + tcNamespace: ${tc-namespace1} + - k8sClusterName: ${k8s-name2} + tcName: ${tc-name2} + tcNamespace: ${tc-namespace2} + - ... # other clusters + template: + br: + sendCredToTikv: false + s3: + provider: aws + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path} + toolImage: ${br-image} + cleanPolicy: Delete + calcSizeLevel: {snapshot-size-calculation-level} +``` + +
+ +
+ +If you grant permissions by associating ServiceAccount with IAM, you can create the `VolumeBackup` CR as follows: + +```shell +kubectl apply -f backup-fed.yaml +``` + +The `backup-fed.yaml` file has the following content: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeBackup +metadata: + name: ${backup-name} +spec: + clusters: + - k8sClusterName: ${k8s-name1} + tcName: ${tc-name1} + tcNamespace: ${tc-namespace1} + - k8sClusterName: ${k8s-name2} + tcName: ${tc-name2} + tcNamespace: ${tc-namespace2} + - ... # other clusters + template: + br: + sendCredToTikv: false + s3: + provider: aws + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path} + toolImage: ${br-image} + serviceAccount: tidb-backup-manager + cleanPolicy: Delete + calcSizeLevel: {snapshot-size-calculation-level} +``` + +
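+
+With this method, the `tidb-backup-manager` ServiceAccount in each data plane must be bound to an IAM role that has the snapshot and S3 permissions. The following sketch assumes EKS IAM Roles for Service Accounts (IRSA) and reuses the placeholder role ARN from the preceding examples; refer to [AWS account authorization](grant-permissions-to-remote-storage.md#aws-account-permissions) for the authoritative steps:
+
+```shell
+# Associate the IAM role with the tidb-backup-manager ServiceAccount via the IRSA annotation.
+kubectl annotate serviceaccount tidb-backup-manager \
+  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/role-name \
+  --namespace=${namespace}
+```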
+
+
+> **Note:**
+>
+> The value of the `spec.clusters.k8sClusterName` field in the `VolumeBackup` CR must be the same as the **context name** of the kubeconfig used by the br-federation-manager.
+
+### Step 3. View the backup status
+
+After creating the `VolumeBackup` CR, the BR Federation automatically starts the backup process in each data plane.
+
+To check the volume backup status, use the following command:
+
+```shell
+kubectl get vbk -n ${namespace} -o wide
+```
+
+Once the volume backup is complete, you can get the backup information of all the data planes from the `status.backups` field. This information can be used for volume restore.
+
+To obtain the information, use the following command:
+
+```shell
+kubectl get vbk ${backup-name} -n ${namespace} -o yaml
+```
+
+The information is as follows:
+
+```yaml
+status:
+  backups:
+  - backupName: fed-{backup-name}-{k8s-name1}
+    backupPath: s3://{bucket-name}/{backup-path}-{k8s-name1}
+    commitTs: "ts1"
+    k8sClusterName: {k8s-name1}
+    tcName: {tc-name1}
+    tcNamespace: {tc-namespace1}
+  - backupName: fed-{backup-name}-{k8s-name2}
+    backupPath: s3://{bucket-name}/{backup-path}-{k8s-name2}
+    commitTs: "ts2"
+    k8sClusterName: {k8s-name2}
+    tcName: {tc-name2}
+    tcNamespace: {tc-namespace2}
+  - ... # other backups
+```
+
+### Delete the `VolumeBackup` CR
+
+If you set `spec.template.cleanPolicy` to `Delete`, the BR Federation cleans up the backup files and the volume snapshots on AWS when you delete the `VolumeBackup` CR.
+
+To delete the `VolumeBackup` CR, run the following command:
+
+```shell
+kubectl delete vbk ${backup-name} -n ${namespace}
+```
+
+## Scheduled volume backup
+
+To ensure regular backups of the TiDB cluster and prevent an excessive number of backup items, you can set a backup policy and a retention policy.
+
+This can be done by creating a `VolumeBackupSchedule` CR object that describes the scheduled snapshot backup. Each backup time point triggers a volume backup. The underlying implementation is the ad-hoc volume backup.
+
+### Perform a scheduled volume backup
+
+**You must execute the following steps in the control plane**.
+
+Depending on the authorization method you chose in the previous step for granting remote storage access, perform a scheduled volume backup by doing one of the following:
+
+ +If you grant permissions by accessKey and secretKey, Create the `VolumeBackupSchedule` CR, and back up cluster data as described below: + +```shell +kubectl apply -f volume-backup-scheduler.yaml +``` + +The content of `volume-backup-scheduler.yaml` is as follows: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeBackupSchedule +metadata: + name: {scheduler-name} + namespace: {namespace-name} +spec: + #maxBackups: {number} + #pause: {bool} + maxReservedTime: {duration} + schedule: {cron-expression} + backupTemplate: + clusters: + - k8sClusterName: {k8s-name1} + tcName: {tc-name1} + tcNamespace: {tc-namespace1} + - k8sClusterName: {k8s-name2} + tcName: {tc-name2} + tcNamespace: {tc-namespace2} + - ... # other clusters + template: + br: + sendCredToTikv: true + s3: + provider: aws + secretName: s3-secret + region: {region-name} + bucket: {bucket-name} + prefix: {backup-path} + toolImage: {br-image} + cleanPolicy: Delete + calcSizeLevel: {snapshot-size-calculation-level} +``` + +
+ +
+ +If you grant permissions by associating Pod with IAM, Create the `VolumeBackupSchedule` CR, and back up cluster data as described below: + +```shell +kubectl apply -f volume-backup-scheduler.yaml +``` + +The content of `volume-backup-scheduler.yaml` is as follows: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeBackupSchedule +metadata: + name: {scheduler-name} + namespace: {namespace-name} + annotations: + iam.amazonaws.com/role: arn:aws:iam::123456789012:role/role-name +spec: + #maxBackups: {number} + #pause: {bool} + maxReservedTime: {duration} + schedule: {cron-expression} + backupTemplate: + clusters: + - k8sClusterName: {k8s-name1} + tcName: {tc-name1} + tcNamespace: {tc-namespace1} + - k8sClusterName: {k8s-name2} + tcName: {tc-name2} + tcNamespace: {tc-namespace2} + - ... # other clusters + template: + br: + sendCredToTikv: false + s3: + provider: aws + region: {region-name} + bucket: {bucket-name} + prefix: {backup-path} + toolImage: {br-image} + cleanPolicy: Delete + calcSizeLevel: {snapshot-size-calculation-level} +``` + +
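+
+The commented-out `pause` field shown above can also be toggled on an existing schedule without recreating it. The following is a sketch that patches the `VolumeBackupSchedule` object directly, using the placeholder names from the example above:
+
+```shell
+# Temporarily stop the schedule from triggering new volume backups.
+kubectl patch volumebackupschedule ${scheduler-name} -n ${namespace-name} \
+  --type merge -p '{"spec":{"pause":true}}'
+
+# Resume the schedule later.
+kubectl patch volumebackupschedule ${scheduler-name} -n ${namespace-name} \
+  --type merge -p '{"spec":{"pause":false}}'
+```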
+ +
+ +If you grant permissions by associating ServiceAccount with IAM, Create the `VolumeBackupSchedule` CR, and back up cluster data as described below: + +```shell +kubectl apply -f volume-backup-scheduler.yaml +``` + +The content of `volume-backup-scheduler.yaml` is as follows: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeBackupSchedule +metadata: + name: {scheduler-name} + namespace: {namespace-name} +spec: + #maxBackups: {number} + #pause: {bool} + maxReservedTime: {duration} + schedule: {cron-expression} + backupTemplate: + clusters: + - k8sClusterName: {k8s-name1} + tcName: {tc-name1} + tcNamespace: {tc-namespace1} + - k8sClusterName: {k8s-name2} + tcName: {tc-name2} + tcNamespace: {tc-namespace2} + - ... # other clusters + template: + br: + sendCredToTikv: false + s3: + provider: aws + region: {region-name} + bucket: {bucket-name} + prefix: {backup-path} + serviceAccount: tidb-backup-manager + toolImage: {br-image} + cleanPolicy: Delete + calcSizeLevel: {snapshot-size-calculation-level} +``` + +
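+
+Each time the schedule fires, it creates an ad-hoc `VolumeBackup`, and expired backups are cleaned up according to `maxReservedTime`. You can track the schedule and the backups it has created with the same commands used for ad-hoc backups, for example:
+
+```shell
+# List the scheduled backup objects and the volume backups they have triggered.
+kubectl get volumebackupschedule -n ${namespace-name} -o wide
+kubectl get vbk -n ${namespace-name} -o wide
+```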
+
diff --git a/en/backup-restore-by-ebs-snapshot-faq.md b/en/backup-restore-by-ebs-snapshot-faq.md
new file mode 100644
index 0000000000..24873e096c
--- /dev/null
+++ b/en/backup-restore-by-ebs-snapshot-faq.md
@@ -0,0 +1,45 @@
+---
+title: FAQs on EBS Snapshot Backup and Restore across Multiple Kubernetes
+summary: Learn about the common questions and solutions for EBS snapshot backup and restore across multiple Kubernetes.
+---
+
+# FAQs on EBS Snapshot Backup and Restore across Multiple Kubernetes
+
+This document addresses common questions and solutions related to EBS snapshot backup and restore across multiple Kubernetes environments.
+
+## New tags on snapshots and restored volumes
+
+**Symptom:** Some tags are automatically added to the generated snapshots and restored EBS volumes.
+
+**Explanation:** The new tags are added for traceability. Snapshots inherit all tags from the individual source EBS volumes, while restored EBS volumes inherit tags from the source snapshots but with the keys prefixed with `snapshot\`. Additionally, several new tags are added to the restored EBS volumes.
+
+## Backup initialization failed
+
+**Symptom:** You get an error that contains `GC safepoint 443455494791364608 exceed TS 0` when the backup is initializing.
+
+**Solution:** This issue might occur if you have disabled the resolved-ts feature in TiKV or PD. Check the configuration of TiKV and PD:
+
+- For TiKV, check whether you have set `resolved-ts.enable = false` or `raftstore.report-min-resolved-ts-interval = "0s"`. If so, remove these configurations.
+- For PD, check whether you have set `pd-server.min-resolved-ts-persistence-interval = "0s"`. If so, remove this configuration.
+
+## Backup failed due to execution twice
+
+**Issue:** [#5143](https://github.com/pingcap/tidb-operator/issues/5143)
+
+**Symptom:** You get an error that contains `backup meta file exists`, and the backup pod is scheduled twice.
+
+**Solution:** This issue might occur if the first backup pod is evicted by Kubernetes due to node resource pressure. You can configure `PriorityClass` and `ResourceRequirements` to reduce the possibility of eviction. For more details, refer to [this comment on the issue](https://github.com/pingcap/tidb-operator/issues/5143#issuecomment-1654916830).
+
+## Save time for backup by controlling the snapshot size calculation level
+
+**Symptom:** A scheduled backup cannot be completed within the expected window because of the time spent on snapshot size calculation.
+
+**Solution:** By default, both the full size and the incremental size are calculated by calling the AWS service, which might take several minutes. You can set `spec.template.calcSizeLevel` to `full` to skip the incremental size calculation, set it to `incremental` to skip the full size calculation, or set it to `none` to skip both calculations.
+
+## How to configure the TTL for the backup init job
+
+The backup init job handles backup preparations, including pausing GC, pausing certain PD schedulers, and suspending Lightning. By default, a TTL of 10 minutes is associated with the init job in case it gets stuck. You can change the TTL by setting the `spec.template.volumeBackupInitJobMaxActiveSeconds` field in the `VolumeBackup` spec.
+
+## How to apply flow control to snapshot deletion
+
+EBS snapshot backup GC is performed on one `VolumeBackup` at a time. For larger clusters with EBS snapshot backups, there might still be a significant number of snapshots for a single volume backup. Therefore, flow control is necessary for snapshot deletion.
You can manage the expected ratio in a single data plane by setting the `spec.template.snapshotsDeleteRatio` parameter of the backup schedule CRD. The default value is 1.0, which ensures no more than one snapshot deletion per second. diff --git a/en/backup-restore-faq.md b/en/backup-restore-faq.md index e8303e02dd..a14b979b59 100644 --- a/en/backup-restore-faq.md +++ b/en/backup-restore-faq.md @@ -3,6 +3,10 @@ title: FAQs on EBS Snapshot Backup and Restore summary: Learn about the common questions and solutions for EBS snapshot backup and restore. --- +> **Warning:** +> +> This document is deprecated. + # FAQs on EBS Snapshot Backup and Restore This document describes the common questions that occur during EBS snapshot backup and restore and the solutions. diff --git a/en/backup-restore-overview.md b/en/backup-restore-overview.md index 0805a462f0..4a68da17b5 100644 --- a/en/backup-restore-overview.md +++ b/en/backup-restore-overview.md @@ -29,7 +29,7 @@ Refer to the following documents for more information: - [Back up Data to GCS Using BR](backup-to-gcs-using-br.md) - [Back up Data to Azure Blob Storage Using BR](backup-to-azblob-using-br.md) - [Back up Data to PV Using BR](backup-to-pv-using-br.md) -- [Back up Data Using EBS Snapshots](backup-to-aws-s3-by-snapshot.md) +- [Back up Data Using EBS Snapshots across Multiple Kubernetes](backup-by-ebs-snapshot-across-multiple-kubernetes.md) If you have the following backup needs, you can use Dumpling to make a backup of the TiDB cluster data: @@ -50,7 +50,7 @@ To recover the SST files exported by BR to a TiDB cluster, use BR. Refer to the - [Restore Data from GCS Using BR](restore-from-gcs-using-br.md) - [Restore Data from Azure Blob Storage Using BR](restore-from-azblob-using-br.md) - [Restore Data from PV Using BR](restore-from-pv-using-br.md) -- [Restore Data Using EBS Snapshots](restore-from-aws-s3-by-snapshot.md) +- [Restore Data Using EBS Snapshots across Multiple Kubernetes](restore-from-ebs-snapshot-across-multiple-kubernetes.md) To restore data from SQL or CSV files exported by Dumpling or other compatible data sources to a TiDB cluster, use TiDB Lightning. Refer to the following documents for more information: diff --git a/en/backup-restore-snapshot-perf.md b/en/backup-restore-snapshot-perf.md index f996621240..650a1c7c73 100644 --- a/en/backup-restore-snapshot-perf.md +++ b/en/backup-restore-snapshot-perf.md @@ -3,6 +3,10 @@ title: Performance of EBS Snapshot Backup and Restore summary: Learn about the performance of EBS snapshot backup and restore. --- +> **Warning:** +> +> This document is deprecated. + # Performance of EBS Snapshot Backup and Restore This document describes the performance of EBS snapshot backup and restore, the factors that affect performance, and the performance test results. The performance metrics are based on the AWS region `us-west-2`. diff --git a/en/backup-to-aws-s3-by-snapshot.md b/en/backup-to-aws-s3-by-snapshot.md index 8d660d9ffa..d160e5cb12 100644 --- a/en/backup-to-aws-s3-by-snapshot.md +++ b/en/backup-to-aws-s3-by-snapshot.md @@ -3,6 +3,10 @@ title: Back Up a TiDB Cluster Using EBS Volume Snapshots summary: Learn how to back up TiDB cluster data to S3 based on EBS volume snapshots using TiDB Operator. --- +> **Warning:** +> +> This document is deprecated. If you need to back up your cluster data using EBS snapshots, refer to [Back Up a TiDB Cluster across Multiple Kubernetes Using EBS Volume Snapshots](backup-by-ebs-snapshot-across-multiple-kubernetes.md). 
+ # Back Up a TiDB Cluster Using EBS Volume Snapshots This document describes how to back up a TiDB cluster deployed on AWS Elastic Kubernetes Service (EKS) to S3. diff --git a/en/br-federation-architecture.md b/en/br-federation-architecture.md new file mode 100644 index 0000000000..2f1dd3110e --- /dev/null +++ b/en/br-federation-architecture.md @@ -0,0 +1,60 @@ +--- +title: BR Federation Architecture and Processes +summary: Learn the architecture of backup and restore based on EBS volume snapshots in TiDB cluster deployed across multiple Kubernetes. +--- + +# BR Federation Architecture and Processes + +BR Federation is a system designed to [back up and restore TiDB clusters deployed across multiple Kubernetes using EBS snapshots](deploy-tidb-cluster-across-multiple-kubernetes.md). + +Normally, TiDB Operator can only access the Kubernetes cluster where it is deployed. This means a TiDB Operator can only back up TiKV volumes' snapshots within its own Kubernetes cluster. However, to perform EBS snapshot backup and restore across multiple Kubernetes clusters, a coordinator role is required. This is where the BR Federation comes in. + +This document outlines the architecture of the BR Federation and the processes involved in backup and restoration. + +## BR Federation architecture + +BR Federation operates as the control plane, interacting with the data plane, which includes each Kubernetes cluster where TiDB components are deployed. The interaction is facilitated through the Kubernetes API Server. + +BR Federation coordinates `Backup` and `Restore` Custom Resources (CRs) in the data plane to accomplish backup and restoration across multiple Kubernetes clusters. + +![BR Federation architecture](/media/br-federation-architecture.png) + +## Backup process + +### Backup process in data plane + +The backup process in the data plane consists of three phases: + +1. **Phase One:** TiDB Operator schedules a backup pod to request PD to pause region scheduling and Garbage Collection (GC). As each TiKV instance might take snapshots at different times, pausing scheduling and GC can avoid data inconsistencies between TiKV instances during snapshot taking. Since the TiDB components are interconnected across multiple Kubernetes clusters, executing this operation in one Kubernetes cluster affects the entire TiDB cluster. + +2. **Phase Two:** TiDB Operator collects meta information such as `TidbCluster` CR and EBS volumes, and then schedules another backup pod to request AWS API to create EBS snapshots. This phase must be executed in each Kubernetes cluster. + +3. **Phase Three:** After EBS snapshots are completed, TiDB Operator deletes the first backup pod to resume region scheduling and GC for the TiDB cluster. This operation is required only in the Kubernetes cluster where Phase One was executed. + +![backup process in data plane](/media/volume-backup-process-data-plane.png) + +### Backup orchestration process + +The orchestration process of `Backup` from the control plane to the data plane is as follows: + +![backup orchestration process](/media/volume-backup-process-across-multiple-kubernetes-overall.png) + +## Restore process + +### Restore process in data plane + +The restore process in the data plane consists of three phases: + +1. **Phase One:** TiDB Operator schedules a restore pod to request the AWS API to restore the EBS volumes using EBS snapshots based on the backup information. The volumes are then mounted onto the TiKV nodes, and TiKV instances are started in recovery mode. 
This phase must be executed in each Kubernetes cluster.
+
+2. **Phase Two:** TiDB Operator schedules another restore pod to restore all raft logs and KV data in TiKV instances to a consistent state, and then instructs TiKV instances to exit recovery mode. As TiKV instances are interconnected across multiple Kubernetes clusters, this operation can restore all TiKV data and only needs to be executed in one Kubernetes cluster.
+
+3. **Phase Three:** TiDB Operator restarts all TiKV instances to run in normal mode, and finally starts TiDB. This phase must be executed in each Kubernetes cluster.
+
+![restore process in data plane](/media/volume-restore-process-data-plane.png)
+
+### Restore orchestration process
+
+The orchestration process of `Restore` from the control plane to the data plane is as follows:
+
+![restore orchestration process](/media/volume-restore-process-across-multiple-kubernetes-overall.png)
diff --git a/en/deploy-br-federation.md b/en/deploy-br-federation.md
new file mode 100644
index 0000000000..89475ada2d
--- /dev/null
+++ b/en/deploy-br-federation.md
@@ -0,0 +1,241 @@
+---
+title: Deploy BR Federation on Kubernetes
+summary: Learn how to deploy BR Federation on Kubernetes.
+---
+
+# Deploy BR Federation on Kubernetes
+
+This document describes how to deploy [BR Federation](br-federation-architecture.md#br-federation-architecture-and-processes) across multiple Kubernetes clusters.
+
+## Prerequisites
+
+Before deploying BR Federation on Kubernetes clusters, make sure you have met the following prerequisites:
+
+* Kubernetes version must be >= v1.12.
+* You must have multiple Kubernetes clusters.
+* You have deployed TiDB Operator for all the Kubernetes clusters that serve as data planes.
+
+## Step 1: Generate a kubeconfig file in data planes
+
+The BR Federation manages the Kubernetes clusters of data planes by accessing their API servers. To authenticate and authorize itself in the API servers, BR Federation requires a kubeconfig file. The users or service accounts in the kubeconfig file need to have at least all permissions on the **backups.pingcap.com** and **restores.pingcap.com** CRDs.
+
+You can get the kubeconfig file from the Kubernetes cluster administrator. However, if you have permission to access all the data planes, you can generate the kubeconfig file on your own.
+
+### Step 1.1: Create RBAC resources in data planes
+
+To enable the BR Federation to manipulate `Backup` and `Restore` CRs, you need to create the following resources in every data plane.
+ +```yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: br-federation-member + namespace: tidb-admin +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: br-federation-manager:br-federation-member +rules: +- apiGroups: + - pingcap.com + resources: + - backups + - restores + verbs: + - '*' +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: br-federation-manager:br-federation-member +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: br-federation-manager:br-federation-member +subjects: +- kind: ServiceAccount + name: br-federation-member + namespace: tidb-admin +``` + +For Kubernetes >= v1.24, to let external applications access the Kubernetes API server, you need to manually create a service account secret as follows: + +```yaml +apiVersion: v1 +kind: Secret +type: kubernetes.io/service-account-token +metadata: + name: br-federation-member-secret + namespace: tidb-admin + annotations: + kubernetes.io/service-account.name: "br-federation-member" +``` + +### Step 1.2: Generate kubeconfig files + +Execute the following script for every data plane. + +```shell +# for Kubernetes < 1.24 +export TOKEN_SECRET_NAME=$(kubectl -n tidb-admin get serviceaccount br-federation-member -o=jsonpath='{.secrets[0].name}') +# for Kubernetes >= 1.24, the service account secret should be created manually as above, so you should use its name as value of TOKEN_SECRET_NAME +# export TOKEN_SECRET_NAME=br-federation-member-secret +export USER_TOKEN_VALUE=$(kubectl -n tidb-admin get secret/${TOKEN_SECRET_NAME} -o=go-template='{{.data.token}}' | base64 --decode) +export CURRENT_CONTEXT=$(kubectl config current-context) +export CURRENT_CLUSTER=$(kubectl config view --raw -o=go-template='{{range .contexts}}{{if eq .name "'''${CURRENT_CONTEXT}'''"}}{{ index .context "cluster" }}{{end}}{{end}}') +export CLUSTER_CA=$(kubectl config view --raw -o=go-template='{{range .clusters}}{{if eq .name "'''${CURRENT_CLUSTER}'''"}}"{{with index .cluster "certificate-authority-data" }}{{.}}{{end}}"{{ end }}{{ end }}') +export CLUSTER_SERVER=$(kubectl config view --raw -o=go-template='{{range .clusters}}{{if eq .name "'''${CURRENT_CLUSTER}'''"}}{{ .cluster.server }}{{end}}{{ end }}') +# you should modify this value in different data plane +export DATA_PLANE_SYMBOL="a" + +cat << EOF > {k8s-name}-kubeconfig +apiVersion: v1 +kind: Config +current-context: ${DATA_PLANE_SYMBOL} +contexts: +- name: ${DATA_PLANE_SYMBOL} + context: + cluster: ${CURRENT_CLUSTER} + user: br-federation-member-${DATA_PLANE_SYMBOL} + namespace: kube-system +clusters: +- name: ${CURRENT_CLUSTER} + cluster: + certificate-authority-data: ${CLUSTER_CA} + server: ${CLUSTER_SERVER} +users: +- name: br-federation-member-${DATA_PLANE_SYMBOL} + user: + token: ${USER_TOKEN_VALUE} +EOF +``` + +The environment variable `$DATA_PLANE_SYMBOL` represents the name of the data plane cluster. Make sure that you provide a brief and unique name. In the preceding script, you use this variable as the context name for kubeconfig. The context name will be used as `k8sClusterName` in both the `VolumeBackup` and `VolumeRestore` CR. + +### Step 1.3: Merge multiple kubeconfig files into one + +After following the previous steps to generate kubeconfig, you now have multiple kubeconfig files. You need to merge them into a single kubeconfig file. + +Assume that you have 3 kubeconfig files with file paths: `kubeconfig-path1`, `kubeconfig-path2`, `kubeconfig-path3`. 
To merge these files into one kubeconfig file with file path `data-planes-kubeconfig`, execute the following command: + +```shell +KUBECONFIG=${kubeconfig-path1}:${kubeconfig-path2}:${kubeconfig-path3} kubectl config view --flatten > ${data-planes-kubeconfig} +``` + +## Step 2: Deploy BR Federation in the control plane + +To deploy the BR Federation, you need to select one Kubernetes cluster as the control plane. The following steps **must be executed on the control plane**. + +### Step 2.1: Create CRD + +The BR Federation uses [Custom Resource Definition (CRD)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions) to extend Kubernetes. Before using the BR Federation, you must create the CRD in your Kubernetes cluster. After using the BR Federation Manager, you only need to perform the operation once. + +```shell +kubectl create -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.2/manifests/federation-crd.yaml +``` + +### Step 2.2: Prepare the kubeconfig secret + +Now that you already have a kubeconfig file of data planes, you need to encode the kubeconfig file into a secret. Take the following steps: + +1. Encode the kubeconfig file: + + ```shell + base64 -i ${kubeconfig-path} + ``` + +2. Store the output from the previous step in a secret object. + + Note that the name of the secret and the data key of the kubeconfig field **must** match the following example: + + ```yaml + apiVersion: v1 + kind: Secret + metadata: + name: br-federation-kubeconfig + type: Opaque + data: + kubeconfig: ${encoded-kubeconfig} + ``` + +### Step 2.3: Install BR Federation + +This section describes how to install the BR Federation using [Helm 3](https://helm.sh/docs/intro/install/). + +- If you prefer to use the default configuration, follow the **Quick deployment** steps. +- If you prefer to use a custom configuration, follow the **Custom deployment** steps. + + +
+ +1. To create resources related to the BR Federation, create a namespace: + + ```shell + kubectl create ns br-fed-admin + ``` + +2. In the specified namespace, create a secret that contains all the encoded kubeconfig files: + + ```shell + kubectl create -f ${secret-path} -n br-fed-admin + ``` + +3. Add the PingCAP repository: + + ```shell + helm repo add pingcap https://charts.pingcap.org/ + ``` + +4. Install the BR Federation: + + ```shell + helm install --namespace br-fed-admin br-federation pingcap/br-federation --version v1.5.2 + ``` + +
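+
+After the installation completes, you can verify that the BR Federation controller Pod is running before moving on. The label selector below is the same one used in the custom deployment steps:
+
+```shell
+# Check that the BR Federation controller Pod is up.
+kubectl get pods -n br-fed-admin -l app.kubernetes.io/instance=br-federation
+```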
+
+ +1. To create resources related to the BR Federation, create a namespace: + + ```shell + kubectl create ns br-fed-admin + ``` + +2. In the specified namespace, create a secret that contains all the encoded kubeconfig files: + + ```shell + kubectl create -f ${secret-path} -n br-fed-admin + ``` + +3. Add the PingCAP repository: + + ```shell + helm repo add pingcap https://charts.pingcap.org/ + ``` + +4. Get the `values.yaml` file of the desired `br-federation` chart for deployment. + + ```shell + mkdir -p ${HOME}/br-federation && \ + helm inspect values pingcap/br-federation --version=v1.5.2 > ${HOME}/br-federation/values.yaml + ``` + +5. Configure the BR Federation by modifying fields such as `image`, `limits`, `requests`, and `replicas` according to your needs. + +6. Deploy the BR Federation. + + ```shell + helm install --namespace br-fed-admin br-federation pingcap/br-federation --version v1.5.2 -f ${HOME}/br-federation/values.yaml && \ + kubectl get po -n br-fed-admin -l app.kubernetes.io/instance=br-federation + ``` + +
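+
+If you change `values.yaml` later, you can apply the new configuration with a standard Helm upgrade, using the same release name and namespace as above:
+
+```shell
+# Roll out configuration changes to the existing BR Federation release.
+helm upgrade --namespace br-fed-admin br-federation pingcap/br-federation \
+  --version v1.5.2 -f ${HOME}/br-federation/values.yaml
+```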
+
+ +## What's next + +After deploying BR Federation, you can now perform the following tasks: + +- [Back Up a TiDB Cluster across Multiple Kubernetes Using EBS Volume Snapshots](backup-by-ebs-snapshot-across-multiple-kubernetes.md) +- [Restore a TiDB Cluster across Multiple Kubernetes from EBS Volume Snapshots](restore-from-ebs-snapshot-across-multiple-kubernetes.md) diff --git a/en/restore-from-aws-s3-by-snapshot.md b/en/restore-from-aws-s3-by-snapshot.md index 5054a93698..ace781b58f 100644 --- a/en/restore-from-aws-s3-by-snapshot.md +++ b/en/restore-from-aws-s3-by-snapshot.md @@ -3,6 +3,10 @@ title: Restore a TiDB Cluster from EBS Volume Snapshots summary: Learn how to restore backup metadata and EBS volume snapshots from S3 storage to a TiDB cluster. --- +> **Warning:** +> +> This document is deprecated. If you need to restore your cluster data from EBS snapshots, refer to [Restore a TiDB Cluster across Multiple Kubernetes from EBS Volume Snapshots](restore-from-ebs-snapshot-across-multiple-kubernetes.md). + # Restore a TiDB Cluster from EBS Volume Snapshots This document describes how to restore backup data in AWS EBS snapshots from S3 storage to a TiDB cluster. diff --git a/en/restore-from-ebs-snapshot-across-multiple-kubernetes.md b/en/restore-from-ebs-snapshot-across-multiple-kubernetes.md new file mode 100644 index 0000000000..7241868f54 --- /dev/null +++ b/en/restore-from-ebs-snapshot-across-multiple-kubernetes.md @@ -0,0 +1,243 @@ +--- +title: Restore a TiDB Cluster across Multiple Kubernetes from EBS Volume Snapshots +summary: Learn how to restore a TiDB cluster across multiple Kubernetes from EBS Volume Snapshots. +--- + +# Restore a TiDB Cluster across Multiple Kubernetes from EBS Volume Snapshots + +This document describes how to restore backup data in AWS EBS snapshots to a TiDB cluster across multiple Kubernetes clusters. + +The restore method described in this document is implemented based on CustomResourceDefinition (CRD) in [BR Federation](br-federation-architecture.md#br-federation-architecture-and-processes) and TiDB Operator. [BR](https://docs.pingcap.com/tidb/stable/backup-and-restore-overview) (Backup & Restore) is a command-line tool for distributed backup and recovery of the TiDB cluster data. For the underlying implementation, BR restores the data. + +> **Note** +> +> Before you restore data, make sure that you have [deployed BR Federation](deploy-br-federation.md). + +## Limitations + +- Snapshot restore is applicable to TiDB Operator v1.5.2 or later versions and TiDB v6.5.8 or later versions. +- You can use snapshot restore only to restore data to a cluster with the same number of TiKV nodes and volumes configuration. That is, the number of TiKV nodes and volume configurations of TiKV nodes are identical between the restore cluster and backup cluster. +- Snapshot restore is currently not supported for TiFlash, TiCDC, DM, and TiDB Binlog nodes. + +## Prerequisites + +Before restoring a TiDB cluster across multiple Kubernetes clusters from EBS volume snapshots, you need to complete the following preparations. + +- Complete the volume backup + + For detailed steps, refer to [Back Up a TiDB Cluster across Multiple Kubernetes Using EBS Volume Snapshots](backup-by-ebs-snapshot-across-multiple-kubernetes.md). + +- Prepare the restore cluster + + - Deploy a TiDB cluster across multiple Kubernetes clusters that you want to restore data to. 
For detailed steps, refer to [Deploy a TiDB Cluster across Multiple Kubernetes Clusters](deploy-tidb-cluster-across-multiple-kubernetes.md).
+    - When deploying the TiDB cluster, add the `recoveryMode: true` field to the spec of `TidbCluster`.
+
+> **Note:**
+>
+> The EBS volume restored from snapshots might have high latency before it is initialized. This can impact the performance of a restored TiDB cluster. See details in [Create a volume from a snapshot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-volume.html#ebs-create-volume-from-snapshot).
+>
+> It is recommended that you configure `spec.template.warmup: sync` to initialize TiKV volumes automatically during the restoration process.
+> You can specify the warmup strategy for TiKV data volumes. Set `spec.template.warmupStrategy: fio` to use the `fio` Linux command to warm up data volumes, set it to `spec.template.warmupStrategy: fsr` to enable Fast Snapshot Restore (FSR) for data volumes before starting TiKV, or set it to `spec.template.warmupStrategy: hybrid` to use a file scanner to warm up data volumes. `hybrid` is the default option.
+
+If you choose `fsr` as the warmup strategy, you need to grant the `ec2:EnableFastSnapshotRestores`, `ec2:DisableFastSnapshotRestores`, `ec2:DescribeFastSnapshotRestores`, and `cloudwatch:GetMetricStatistics` permissions to the IAM role. You also need to increase the EBS service quota of `Fast snapshot restore` to at least the number of TiKV nodes.
+
+## Restore process
+
+### Step 1. Set up the environment for EBS volume snapshot restore in every data plane
+
+**You must execute the following steps in every data plane**.
+
+1. Download the [`backup-rbac.yaml`](https://github.com/pingcap/tidb-operator/blob/master/manifests/backup/backup-rbac.yaml) file to the restore server.
+
+2. Create the RBAC-related resources required for the restore by running the following command. Note that the RBAC-related resources must be put in the same `${namespace}` as the TiDB cluster.
+
+    ```shell
+    kubectl apply -f backup-rbac.yaml -n ${namespace}
+    ```
+
+3. Grant permissions to access remote storage.
+
+    To restore data from EBS snapshots, you need to grant permissions to remote storage. Refer to [AWS account authorization](grant-permissions-to-remote-storage.md#aws-account-permissions) for the three available methods.
+
+### Step 2. Restore data to the TiDB cluster
+
+**You must execute the following steps in the control plane**.
+
+Depending on the authorization method you chose in the previous step for granting remote storage access, you can restore data to TiDB using any of the following methods accordingly:
+
+> **Note:**
+>
+> Snapshot restore creates volumes with the default configuration (3000 IOPS/125 MB/s) of GP3. To perform restore using other configurations, you can specify the volume type or configuration, such as `--volume-type=gp3`, `--volume-iops=7000`, or `--volume-throughput=400`, as shown in the following examples.
+
+ +If you grant permissions by accessKey and secretKey, you can create the `VolumeRestore` CR as follows: + +```shell +kubectl apply -f restore-fed.yaml +``` + +The `restore-fed.yaml` file has the following content: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeRestore +metadata: + name: ${restore-name} +spec: + clusters: + - k8sClusterName: ${k8s-name1} + tcName: ${tc-name1} + tcNamespace: ${tc-namespace1} + backup: + s3: + provider: aws + secretName: s3-secret + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path1} + - k8sClusterName: ${k8s-name2} + tcName: ${tc-name2} + tcNamespace: ${tc-namespace2} + backup: + s3: + provider: aws + secretName: s3-secret + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path2} + - ... # other clusters + template: + br: + sendCredToTikv: true + options: + - --volume-type=gp3 + - --volume-iops=7000 + - --volume-throughput=400 + toolImage: ${br-image} + warmup: sync + warmupImage: ${wamrup-image} + warmupStrategy: fio +``` + +
+ +
+ +If you grant permissions by associating Pod with IAM, you can create the `VolumeRestore` CR as follows: + +```shell +kubectl apply -f restore-fed.yaml +``` + +The `restore-fed.yaml` file has the following content: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeRestore +metadata: + name: ${restore-name} + annotations: + iam.amazonaws.com/role: arn:aws:iam::123456789012:role/role-name +spec: + clusters: + - k8sClusterName: ${k8s-name1} + tcName: ${tc-name1} + tcNamespace: ${tc-namespace1} + backup: + s3: + provider: aws + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path1} + - k8sClusterName: ${k8s-name2} + tcName: ${tc-name2} + tcNamespace: ${tc-namespace2} + backup: + s3: + provider: aws + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path2} + - ... # other clusters + template: + br: + sendCredToTikv: false + options: + - --volume-type=gp3 + - --volume-iops=7000 + - --volume-throughput=400 + toolImage: ${br-image} + warmup: sync + warmupImage: ${wamrup-image} + warmupStrategy: fsr +``` + +
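+
+Because this example uses `warmupStrategy: fsr`, you can optionally confirm from the AWS side that fast snapshot restore is taking effect while the restore runs. The following AWS CLI check is illustrative and not part of the restore procedure itself:
+
+```shell
+# List fast snapshot restores that are being enabled or are already enabled in the region.
+aws ec2 describe-fast-snapshot-restores \
+  --filters Name=state,Values=enabling,optimizing,enabled \
+  --region ${region-name}
+```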
+ +
+ +If you grant permissions by associating ServiceAccount with IAM, you can create the `VolumeRestore` CR as follows: + +```shell +kubectl apply -f restore-fed.yaml +``` + +The `restore-fed.yaml` file has the following content: + +```yaml +--- +apiVersion: federation.pingcap.com/v1alpha1 +kind: VolumeRestore +metadata: + name: ${restore-name} +spec: + clusters: + - k8sClusterName: ${k8s-name1} + tcName: ${tc-name1} + tcNamespace: ${tc-namespace1} + backup: + s3: + provider: aws + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path1} + - k8sClusterName: ${k8s-name2} + tcName: ${tc-name2} + tcNamespace: ${tc-namespace2} + backup: + s3: + provider: aws + region: ${region-name} + bucket: ${bucket-name} + prefix: ${backup-path2} + - ... # other clusters + template: + br: + sendCredToTikv: false + options: + - --volume-type=gp3 + - --volume-iops=7000 + - --volume-throughput=400 + toolImage: ${br-image} + serviceAccount: tidb-backup-manager + warmup: sync + warmupImage: ${warmup-image} + warmupStrategy: hybrid +``` + +
+
+
+### Step 3. View the restore status
+
+After creating the `VolumeRestore` CR, the restore process automatically starts.
+
+To check the restore status, use the following command:
+
+```shell
+kubectl get vrt -n ${namespace} -o wide
+```
diff --git a/en/volume-snapshot-backup-restore.md b/en/volume-snapshot-backup-restore.md
index 5217113c97..0ce27b518a 100644
--- a/en/volume-snapshot-backup-restore.md
+++ b/en/volume-snapshot-backup-restore.md
@@ -3,6 +3,10 @@ title: Architecture of Backup and Restore Based on EBS Volume Snapshots
 summary: Learn the architecture of backup and restore based on EBS volume snapshots in TiDB.
 ---
 
+> **Warning:**
+>
+> This document is deprecated. Refer to [BR Federation](br-federation-architecture.md).
+
 # Architecture of Backup and Restore Based on EBS Volume Snapshots
 
 Backup and restore based on EBS volume snapshots is provided in TiDB Operator. This document describes the architecture and process of this feature by exemplifying backup and restore using TiDB Operator.
diff --git a/media/br-federation-architecture.png b/media/br-federation-architecture.png
new file mode 100644
index 0000000000..111c3a3c28
Binary files /dev/null and b/media/br-federation-architecture.png differ
diff --git a/media/volume-backup-process-across-multiple-kubernetes-overall.png b/media/volume-backup-process-across-multiple-kubernetes-overall.png
new file mode 100644
index 0000000000..21e6742d7f
Binary files /dev/null and b/media/volume-backup-process-across-multiple-kubernetes-overall.png differ
diff --git a/media/volume-backup-process-data-plane.png b/media/volume-backup-process-data-plane.png
new file mode 100644
index 0000000000..63ae66ab71
Binary files /dev/null and b/media/volume-backup-process-data-plane.png differ
diff --git a/media/volume-restore-process-across-multiple-kubernetes-overall.png b/media/volume-restore-process-across-multiple-kubernetes-overall.png
new file mode 100644
index 0000000000..f4f1f47903
Binary files /dev/null and b/media/volume-restore-process-across-multiple-kubernetes-overall.png differ
diff --git a/media/volume-restore-process-data-plane.png b/media/volume-restore-process-data-plane.png
new file mode 100644
index 0000000000..35b3aa192c
Binary files /dev/null and b/media/volume-restore-process-data-plane.png differ