Architecture

https://www.51yomo.net/static/doc/tidb2msk2redshift/tidb2msk2redshift_001.png
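As the diagram shows, TiCDC captures row changes from the TiDB cluster and publishes them to an AWS MSK topic in Avro format; an MSK Connect worker running the Confluent Redshift sink plugin consumes the topic and writes the rows into the Redshift cluster.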

Environment Setup

  • workstation - the host from which the deployment commands are run
  • Redshift - target database
  • AWS MSK - message queue
  • AWS MSK Connect - sink connector runtime
OhMyTiUP$ more /tmp/tidb2msk2redshift.yaml
 workstation:
   cidr: 172.82.0.0/16
   imageid: ami-07d02ee1eeb0c996c                  # Workstation EC2 instance
   keyname: public key in the aws                  # Public key for workstation instance deployment
   keyfile: private key                            # Private key to access the workstation
   instance_type: c5.2xlarge
   volumeSize: 100                                 # Disk size (GB) of the workstation
   username: admin
 msk:
   cidr: 172.83.0.0/16                             # The cidr for the VPC
   instance_type: kafka.t3.small                   # Default instance type
   excluded_az:                                  # AZs to be excluded from the subnets
     - us-east-1e
 msk_connect_plugin:
   name: redshift-sink
   url: https://d1i4a15mxbxib1.cloudfront.net/api/plugins/confluentinc/kafka-connect-aws-redshift/versions/1.2.2/confluentinc-kafka-connect-aws-redshift-1.2.2.zip
   s3bucket: ossinsight-data
   s3folder: kafka/plugin
 aws_topo_configs:
   general:
     # Debian OS
     imageid: ami-07d02ee1eeb0c996c                # Default image id for EC2
     keyname: public key in the aws                  # Public key for TiDB cluster instance deployment
     keyfile: private key                            # Private key to access the TiDB cluster instances
     cidr: 182.83.0.0/16                           # The cidr for the VPC
     instance_type: m5.2xlarge                     # Default instance type
     tidb_version: v7.0.0                          # TiDB version
      excluded_az:                                  # AZs to be excluded from the subnets
       - us-east-1e
     network_type: nat
   pd:
     instance_type: t2.small                       # Instance type for PD component
      count: 1                                      # Number of PD nodes to be deployed
   tidb:
      instance_type: t2.small                       # Instance type of TiDB
      count: 1                                      # Number of TiDB nodes to be deployed
   tikv:
     -
       instance_type: t2.small                     # Instance type of TiKV
       count: 1                                    # Number of TiKV nodes to be deployed
       volumeSize: 50                              # Storage size for TiKV nodes (GB)
        volumeType: io2                             # Storage type, e.g. gp2 (default), gp3, io2
        iops: 3000                                  # Storage IOPS (only for gp3, io2)
   ticdc:
      instance_type: t2.small                       # Instance type of TiCDC
      count: 1                                      # Number of TiCDC nodes to be deployed
 redshift:
   cidr: 172.90.0.0/16
   instance_type: dc2.large
   admin_user: awsadmin
   password: 1234Abcd
   cluster_type: single-node

Deploy the AWS resources with the following commands

OhMyTiUP$ ./bin/aws tidb2msk2redshift deploy tidb2redshift /tmp/tidb2msk2redshift.yaml
... ...
OhMyTiUP$ ./bin/aws tidb2msk2redshift list tidb2redshift
... ...
Resource Type:      EC2
Component Name  Component Cluster  State    Instance ID          Instance Type  Private IP     Public IP     Image ID
--------------  -----------------  -----    -----------          -------------  ----------     ---------     --------
pd              tidb               running  i-0ed20c1de7a41151e  t2.small       182.83.1.148                 ami-07d02ee1eeb0c996c
ticdc           tidb               running  i-0c8e3e99e492026e5  t2.small       182.83.4.91                  ami-07d02ee1eeb0c996c
tidb            tidb               running  i-003f9b93336df948b  t2.small       182.83.4.114                 ami-07d02ee1eeb0c996c
tikv            tidb               running  i-0de3062547a73c2e7  t2.small       182.83.1.254                 ami-07d02ee1eeb0c996c
workstation     workstation        running  i-01efe1dea86af4605  c5.2xlarge     172.82.31.111  34.239.94.86  ami-07d02ee1eeb0c996c

Resource Type:      REDSHIFT
Endpoint                                                     Port  DB Name  Master User  State      Node Type
--------                                                     ----  -------  -----------  -----      ---------
tidb2redshift.clm8j1rapquw.us-east-1.redshift.amazonaws.com  5439  dev      awsadmin     Available  dc2.large

Resource Type:      MSK
Cluster Name   State   Cluster Type  Kafka Version  Number of Broker Nodes  Endpoints
------------   -----   ------------  -------------  ----------------------  ---------
tidb2redshift  ACTIVE  PROVISIONED   3.3.2          3                       172.83.5.82 , 172.83.4.22 , 172.83.3.153
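
As a quick sanity check (a hypothetical command, assuming the Kafka CLI tools are installed on the workstation, the workstation can reach the MSK VPC, and the brokers listen on the default plaintext port 9092), list the topics against one of the broker endpoints shown above:

OhMyTiUP$ kafka-topics.sh --bootstrap-server 172.83.5.82:9092 --list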

Issues

Every time the connector configuration needs to be updated, the connector has to be recreated, so even a minor change takes a long time to complete.

For this reason, it is not convenient to test all of the data type mappings for the migration from TiDB Cloud to the Redshift database. See https://repost.aws/questions/QUi4UW_uFpTdSeDYXR7MKr2w/is-there-a-way-to-update-connector-configuration-using-msk-connect-api
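
As a workaround, the connector can be dropped and recreated through the AWS CLI. A minimal sketch (the account ID and connector ARN below are placeholders, and create-connector additionally requires the full cluster, capacity, plugin, and IAM role arguments):

# Look up the ARN of the existing connector by name
aws kafkaconnect list-connectors \
    --query "connectors[?connectorName=='tidb2redshift'].connectorArn" --output text
# Delete it, then recreate it with the modified configuration via create-connector
aws kafkaconnect delete-connector \
    --connector-arn arn:aws:kafkaconnect:us-east-1:123456789012:connector/tidb2redshift/xxxx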

Issue filed against TiUP

pingcap/tiup#2155

Key points

Sink connector configuration

connector.class=io.confluent.connect.aws.redshift.RedshiftSinkConnector
tasks.max=1
confluent.topic.bootstrap.servers=172.83.1.57:9092,172.83.3.70:9092,172.83.2.226:9092
name=tidb2redshift
topics=test_test02
aws.redshift.domain=tidb2redshift.clm8j1rapquw.us-east-1.redshift.amazonaws.com
aws.redshift.port=5439
aws.redshift.database=test
aws.redshift.user=awsadmin
aws.redshift.password=1234Abcd
table.name.format=test02
insert.mode=insert
delete.enabled=true
pk.mode=record_key
auto.create=true

key.converter=com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter
key.converter.schemas.enable=false
key.converter.region=us-east-1
key.converter.schemaAutoRegistrationEnabled=true
key.converter.avroRecordType=GENERIC_RECORD
key.converter.registry.name=tidb2es

value.converter=com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter
value.converter.schemas.enable=false
value.converter.region=us-east-1
value.converter.schemaAutoRegistrationEnabled=true
value.converter.avroRecordType=GENERIC_RECORD
value.converter.registry.name=tidb2es
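
Once the connector is running, the replicated rows can be verified directly against Redshift from the workstation. A minimal sketch, assuming psql is installed and using the endpoint, database, and table from the configuration above (the password is prompted for, or can be supplied via PGPASSWORD):

OhMyTiUP$ psql -h tidb2redshift.clm8j1rapquw.us-east-1.redshift.amazonaws.com -p 5439 -U awsadmin -d test -c 'SELECT * FROM test02 LIMIT 10;'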

Note that the Confluent BigQuery sink connector requires a commercial license.

The environment variables below need to be added to cdc-8300.service:

[Unit]
Description=cdc service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
Environment="AWS_DEFAULT_REGION=us-east-1"
Environment="AWS_ACCESS_KEY_ID=XXXXXXXXXXXX"
Environment="AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
LimitNOFILE=1000000
LimitSTACK=10485760
User=admin
ExecStart=/bin/bash -c '/home/admin/tidb/tidb-deploy/cdc-8300/scripts/run_cdc.sh'
Restart=always

RestartSec=15s

[Install]
WantedBy=multi-user.target
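
After editing the unit file on the TiCDC node, reload systemd and restart the service so that the new environment variables take effect:

sudo systemctl daemon-reload
sudo systemctl restart cdc-8300
sudo systemctl status cdc-8300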