From 5abdaf6b55afe3732338fc32f2d34f614a7d7de3 Mon Sep 17 00:00:00 2001
From: Ti Chi Robot
Date: Wed, 15 May 2024 16:44:42 +0800
Subject: [PATCH] ticdc: enhance storage sink uri config (#17490) (#17504)

---
 ticdc/ticdc-sink-to-cloud-storage.md | 59 ++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/ticdc/ticdc-sink-to-cloud-storage.md b/ticdc/ticdc-sink-to-cloud-storage.md
index 15ca22c11cf9a..658211877fbe4 100644
--- a/ticdc/ticdc-sink-to-cloud-storage.md
+++ b/ticdc/ticdc-sink-to-cloud-storage.md
@@ -59,24 +59,83 @@ For `[query_parameters]` in the URI, the following parameters can be configured:
 
 ### Configure sink URI for external storage
 
+When storing data in a cloud storage system, you need to set different authentication parameters depending on the cloud service provider. This section describes the authentication methods when using Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage, and how to configure accounts to access the corresponding storage services.
+
+
+
+
 The following is an example configuration for Amazon S3:
 
 ```shell
 --sink-uri="s3://bucket/prefix?protocol=canal-json"
 ```
 
+Before replicating data, you need to set appropriate access permissions for the directory in Amazon S3:
+
+- Minimum permissions required by TiCDC: `s3:ListBucket`, `s3:PutObject`, and `s3:GetObject`.
+- If the changefeed configuration item `sink.cloud-storage-config.flush-concurrency` is greater than 1, which means parallel uploading of single files is enabled, you need to additionally add permissions related to [ListParts](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html):
+    - `s3:AbortMultipartUpload`
+    - `s3:ListMultipartUploadParts`
+    - `s3:ListBucketMultipartUploads`
+
+If you have not yet created a directory for storing the replicated data, refer to [Create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) to create an S3 bucket in the specified region. If necessary, you can also create a folder in the bucket by referring to [Organize objects in the Amazon S3 console by using folders](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-folder.html).
+
+You can configure an account to access Amazon S3 in the following ways:
+
+- Method 1: Specify the access key
+
+    If you specify an access key and a secret access key, TiCDC uses them for authentication. In addition to specifying the key in the URI, the following methods are supported:
+
+    - TiCDC reads the `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` environment variables.
+    - TiCDC reads the `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY` environment variables.
+    - TiCDC reads the shared credentials file in the path specified by the `$AWS_SHARED_CREDENTIALS_FILE` environment variable.
+    - TiCDC reads the shared credentials file in the `~/.aws/credentials` path.
+
+- Method 2: Access based on an IAM role
+
+    Associate an [IAM role with configured permissions to access Amazon S3](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html) with the EC2 instance running the TiCDC server. After the setup succeeds, TiCDC can directly access the corresponding directories in Amazon S3 without additional settings.
+
+
+
+
 The following is an example configuration for GCS:
 
 ```shell
 --sink-uri="gcs://bucket/prefix?protocol=canal-json"
 ```
 
+You can configure the account used to access GCS by specifying an access key. TiCDC performs authentication according to the specified `credentials-file`. In addition to specifying the key in the URI, the following methods are supported:
+
+- TiCDC reads the file in the path specified by the `$GOOGLE_APPLICATION_CREDENTIALS` environment variable.
+- TiCDC reads the file `~/.config/gcloud/application_default_credentials.json`.
+- TiCDC obtains credentials from the metadata server when the cluster is running in GCE or GAE.
+
+
+
+
 The following is an example configuration for Azure Blob Storage:
 
 ```shell
 --sink-uri="azure://bucket/prefix?protocol=canal-json"
 ```
 
+You can configure an account to access Azure Blob Storage in the following ways:
+
+- Method 1: Specify a shared access signature
+
+    If you configure `account-name` and `sas-token` in the URI, the storage account name and shared access signature token specified by these parameters are used. Because the shared access signature token contains the `&` character, you need to encode it as `%26` before adding it to the URI. You can also directly encode the entire `sas-token` using percent-encoding.
+
+- Method 2: Specify the access key
+
+    If you configure `account-name` and `account-key` in the URI, the storage account name and key specified by these parameters are used. In addition to specifying the key in the URI, TiCDC can also read the key from the environment variable `$AZURE_STORAGE_KEY`.
+
+- Method 3: Use Azure AD to access Azure Blob Storage
+
+    Configure the environment variables `$AZURE_CLIENT_ID`, `$AZURE_TENANT_ID`, and `$AZURE_CLIENT_SECRET`.
+
+
+
+
 > **Tip:**
 >
 > For more information about the URI parameters of Amazon S3, GCS, and Azure Blob Storage in TiCDC, see [URI Formats of External Storage Services](/external-storage-uri.md).
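
The percent-encoding step described for the Azure `sas-token` parameter can be sketched in shell as follows. This is a minimal illustration, not part of the patch: the `urlencode` helper, the account name `myaccount`, and the token value are hypothetical placeholders, and the helper assumes an ASCII-only token. It encodes the entire token, so `&` becomes `%26` and any `%` already present becomes `%25`.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode every character that is not an
# RFC 3986 unreserved character. Runs in a subshell so variables do not leak.
urlencode() (
  rest=$1
  out=
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}   # first character of the remaining input
    rest=${rest#?}          # drop that character
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s' "$out"
)

# Placeholder values for illustration only, not real credentials.
ACCOUNT_NAME='myaccount'
SAS_TOKEN='sv=2022-11-02&sp=rw&sig=abc%2Bdef'

# Build the sink URI with the encoded token in the `sas-token` parameter.
SINK_URI="azure://bucket/prefix?protocol=canal-json&account-name=${ACCOUNT_NAME}&sas-token=$(urlencode "$SAS_TOKEN")"
printf -- '--sink-uri="%s"\n' "$SINK_URI"
```

Encoding the whole token, rather than only the `&` characters, keeps each query parameter of the token from being read as a separate sink URI parameter.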