PoC to test data sync from TiDB to SQL Server through Aurora. So far there is no way to sync data from TiDB to SQL Server directly, so to achieve near-real-time sync, Aurora and DMS are placed between the two databases.
![TiDBCloud2MS](./png/TiDBCloud2MS.png)
$ aws tidb2ms template tidbcloud2ms-prod > /tmp/tidbcloud2ms-prod.yaml
global:
  user: tidb                              <- User to log in to the server that runs the aws script
  db_host: localhost
workstation:
  cidr: 172.81.0.0/16
  instance_type: m5.4xlarge
  keyname: key-name                       <- Public key name to log in to the workstation server
  keyfile: /home/tidb/.ssh/key-name.pem   <- Private key file used to log in to the workstation server
  username: admin                         <- User name to log in
  imageid: ami-0c7ea5497c02abcaf
aurora:
  cidr: 172.83.0.0/16
  instance_type: db.m5.8xlarge
sqlserver:
  cidr: 172.85.0.0/16
  instance_type: m5.2xlarge
  keyname: jay-west
  imageid: ami-00dcbd817ce3d7a77
dms:
  cidr: 172.86.0.0/16
  instance_type: m5.2xlarge
It takes about half an hour to prepare all of the following resources.
Resource Name | # Of resources |
---|---|
VPC | 4 |
Subnets | 8 |
Route table | 4 |
Security Group | 4 |
DB Cluster Parameter Group | 1 |
DB Parameter Group | 1 |
DB Subnet group | 1 |
Aurora cluster | 1 |
Aurora Instance | 1 |
DMS subnet group | 1 |
DMS endpoints | 2 |
DMS Replication instance | 1 |
DMS Replication task | 1 |
SQLServer | 1 |
Routes | 13 |
Transit Gateway | 1 |
Transit Gateway Attachment | 4 |
Internet Gateway | 1 |
$ aws tidb2ms deploy tidbcloud2ms /tmp/tidbcloud2ms-prod.yaml
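Once the deploy command returns, a quick sanity check confirms the resources were actually created. A minimal sketch; the tag filter below is an assumption, so adjust it to whatever tags the script actually applies:

# List the VPCs and transit gateways created for this PoC
$ aws ec2 describe-vpcs --filters "Name=tag:Name,Values=tidbcloud2ms*" --query 'Vpcs[].{Id:VpcId,Cidr:CidrBlock}'
$ aws ec2 describe-transit-gateways --query 'TransitGateways[].State'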
- docker
$ cd /tmp
$ wget https://download.docker.com/linux/static/stable/x86_64/docker-20.10.8.tgz
$ tar xvf docker-20.10.8.tgz
$ sudo cp docker/* /usr/bin/
$ sudo groupadd -g 10001 docker
$ more /etc/docker/daemon.json
{
  "experimental": false
}
$ rpm -qa | grep iptables
iptables-1.8.4-17.el8.x86_64
iptables-libs-1.8.4-17.el8.x86_64
$ more /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=1048576
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
$ more /etc/systemd/system/docker.socket
[Unit]
Description=Docker Socket for the API

[Socket]
ListenStream=/var/run/docker.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target
$ more /etc/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket containerd.service

[Service]
Type=notify
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
KillMode=process
OOMScoreAdjust=-500

[Install]
WantedBy=multi-user.target
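The unit files above only define the services; on a systemd host they still need to be loaded and started before any docker command will work. A minimal sketch, assuming the files were installed exactly as shown:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now containerd docker.socket docker
# Sanity check: the daemon should answer
$ sudo docker info | head -5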
- docker image
- debezium/connect
$docker pull debezium/connect
- debezium/kafka
$docker pull debezium/kafka
- debezium/zookeeper
$docker pull debezium/zookeeper
- mysql:5.7
$docker pull mysql:5.7
- mcr.microsoft.com/mssql/server
$docker pull mcr.microsoft.com/mssql/server
- jdbc-sink-connect
$ wget https://d1i4a15mxbxib1.cloudfront.net/api/plugins/confluentinc/kafka-connect-jdbc/versions/10.2.2/confluentinc-kafka-connect-jdbc-10.2.2.zip
$ sudo unzip confluentinc-kafka-connect-jdbc-10.2.2.zip
$ mkdir -p external-connector/confluentinc-kafka-connect-jdbc-10.2.2
$ cp confluentinc-kafka-connect-jdbc-10.2.2/lib/*.jar external-connector/confluentinc-kafka-connect-jdbc-10.2.2/
- mysql driver
$ wget https://download.jar-download.com/cache_jars/mysql/mysql-connector-java/8.0.26/jar_files.zip
$ unzip jar_files.zip -d mysql_jdbc
$ cp mysql_jdbc/*.jar external-connector/confluentinc-kafka-connect-jdbc-10.2.2/
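Kafka Connect loads each subdirectory on its plugin path as one plugin, so the MySQL driver jar has to sit next to the JDBC connector jars for the sink to find it. A quick check that the layout is what the connect container will see:

# Expect both kafka-connect-jdbc-*.jar and mysql-connector-java-*.jar here
$ ls external-connector/confluentinc-kafka-connect-jdbc-10.2.2/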
The container setup below follows https://github.com/debezium/debezium-examples/blob/master/tutorial/docker-compose-sqlserver.yaml
- preparation
$ sudo docker network create --driver=bridge cdcnetwork
- mssqlserver
- reference
  - https://docs.microsoft.com/en-us/sql/relational-databases/databases/create-a-database?view=sql-server-ver15
  - https://www.bookstack.cn/read/tidb-in-action/session4-chapter5-from-sqlserver-to-tidb.md#ciariy
- Preparation
$ docker run --name ms2ti-ms --network cdcnetwork -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=Passwd@1234" -e "MSSQL_PID=Standard" -e "MSSQL_AGENT_ENABLED=true" -p 1433:1433 -d mcr.microsoft.com/mssql/server
$ docker exec -it ms2ti-ms /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P Passwd@1234
1> use master;
2> CREATE DATABASE cdcdb ON
3> (NAME = cdcdb_dat, FILENAME = '/tmp/MSSQL/DATA/cdcdbdat.mdf', SIZE = 1, MAXSIZE = 10, FILEGROWTH = 5)
4> LOG ON
5> (NAME = cdcdb_log, FILENAME = '/tmp/MSSQL/DATA/cdcdblog.ldf', SIZE = 5MB, MAXSIZE = 25MB, FILEGROWTH = 5MB);
6> go
1> use cdcdb
2> go
Changed database context to 'cdcdb'.
1> sys.sp_cdc_enable_db;
2> go
1> create table cdc_src_table(col01 int primary key, col02 varchar(32), col03 date);
2> go
1> select * from cdc_src_table;
2> go
col01       col02                            col03
----------- -------------------------------- ----------
(0 rows affected)
1> EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'cdc_src_table', @role_name = NULL;
2> go
Job 'cdc.cdcdb_capture' started successfully.
Job 'cdc.cdcdb_cleanup' started successfully.
1> SELECT name, is_cdc_enabled from sys.databases where is_cdc_enabled = 1;
2> go
name       is_cdc_enabled
---------- --------------
cdcdb      1
(1 rows affected)
1> insert into cdc_src_table values(1, 'cdc test text', '2020-01-01');
2> go
(1 rows affected)
1> select * from cdc_src_table;
2> go
col01       col02                            col03
----------- -------------------------------- ----------
          1 cdc test text                    2020-01-01
(1 rows affected)
1> select * from cdc.dbo_cdc_src_table_ct;
2> go
__$start_lsn           __$end_lsn __$seqval              __$operation __$update_mask col01 col02          col03      __$command_id
---------------------- ---------- ---------------------- ------------ -------------- ----- -------------- ---------- -------------
0x0000002600000318001C NULL       0x0000002600000318001B 2            0x07           1     cdc test text  2020-01-01 1
(1 rows affected)
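Before wiring up Debezium, it helps to double-check that CDC is enabled at the table level, not only the database level. A non-interactive check using the same container and credentials as above:

# Lists every capture instance in cdcdb; dbo_cdc_src_table should be present
$ docker exec -it ms2ti-ms /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P 'Passwd@1234' \
    -d cdcdb -Q "EXEC sys.sp_cdc_help_change_data_capture"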
- zookeeper
$docker run --name ms2ti-zk --network cdcnetwork -d -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper
- kafka
$docker run --name ms2ti-kf --network cdcnetwork -d -p 9092:9092 -e ZOOKEEPER_CONNECT=ms2ti-zk:2181 debezium/kafka
- connect
$docker run --name ms2ti-connect --network cdcnetwork -d -p 8083:8083 -e BOOTSTRAP_SERVERS=ms2ti-kf:9092 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses -v $(pwd)/external-connector/confluentinc-kafka-connect-jdbc-10.2.2:/kafka/connect/confluentinc-kafka-connect-jdbc debezium/connect
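The -v mount above is what makes the Confluent JDBC sink visible to the Debezium connect image. Before registering any connectors, confirm that both plugins were picked up via the Connect REST API:

# io.debezium.connector.sqlserver.SqlServerConnector and
# io.confluent.connect.jdbc.JdbcSinkConnector should both appear
$ curl -s http://localhost:8083/connector-plugins | jq '.[].class'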
- mysql
$ docker run -d --name mysql --network cdcnetwork -p 3306:3306 -e MYSQL_ROOT_PASSWORD=passwd1234 -e MYSQL_DATABASE=cdctest mysql:5.7
$ docker exec -it mysql mysql -h mysql -u root -p
mysql> create database ms2ti;
mysql> use ms2ti;
mysql> create table cdc_src_table(col01 int primary key, col02 varchar(32), col03 date);
- source connector
$ more source.config
{
  "name": "ms2ti-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "ms2ti-ms",
    "database.port": "1433",
    "database.user": "sa",
    "database.password": "Passwd@1234",
    "database.dbname": "cdcdb",
    "database.server.name": "cdcms2ti",
    "time.precision.mode": "connect",
    "table.include.list": "dbo.cdc_src_table",
    "database.history.kafka.bootstrap.servers": "ms2ti-kf:9092",
    "database.history.kafka.topic": "dbhistory.cdcms2ti"
  }
}
$ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d @source.config
$ curl http://localhost:8083/connectors | jq
[
  "ms2ti-connector"
]
$ curl http://localhost:8083/connectors/ms2ti-connector/status | jq
{
  "name": "ms2ti-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.18.0.5:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "172.18.0.5:8083"
    }
  ],
  "type": "source"
}
$ docker exec -it ms2ti-kf bash
$ /kafka/bin/kafka-topics.sh --list --zookeeper ms2ti-zk:2181
__consumer_offsets
cdcms2ti
cdcms2ti.dbo.cdc_src_table
dbhistory.cdcms2ti
my_connect_configs
my_connect_offsets
my_connect_statuses
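To watch the change events as they flow, consume the table topic directly; every insert into cdc_src_table should appear as a JSON message (script path as shipped in the debezium/kafka image):

# Tail the Debezium topic for the captured table
$ docker exec -it ms2ti-kf /kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server ms2ti-kf:9092 \
    --topic cdcms2ti.dbo.cdc_src_table --from-beginning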
- sink connector
$ more sink.config
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics.regex": "cdcms2ti.dbo.(.*)",
    "connection.url": "jdbc:mysql://mysql:3306/ms2ti?user=root&password=passwd1234",
    "transforms": "dropPrefix, unwrap",
    "transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.dropPrefix.regex": "cdcms2ti.dbo.(.*)",
    "transforms.dropPrefix.replacement": "$1",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "auto.create": "false",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "pk.fields": "col01",
    "pk.mode": "record_key"
  }
}
$ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d @sink.config
- Prepare an application that keeps inserting data into mssql
- Prepare an application that counts rows in the source and destination DBs
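As a quick end-to-end check before building those helper applications, insert one more row on the SQL Server side and confirm it shows up in MySQL a few seconds later (the row values are arbitrary):

# Insert on the source (SQL Server)...
$ docker exec -it ms2ti-ms /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P 'Passwd@1234' \
    -d cdcdb -Q "insert into cdc_src_table values(2, 'e2e check', '2021-01-01')"
# ...then read it back on the sink (MySQL)
$ docker exec -it mysql mysql -uroot -ppasswd1234 -e "select * from ms2ti.cdc_src_table"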
File name | Done |
---|---|
vpc.go | 12/18 |
network | |
route_table | |
security_group | |
ec2 | |
db_subnet | |
rds | |
db_parameter | |
db_cluster_parameter | |
dms_subnet | |
dms_endpoints | |
dms_replication_instance | |
dms_replication_task | |
transit_gateway | |
transit_gateway_attachment | |
vpc_peering |
$ aws ec2 describe-images --filters "Name=name,Values=debian-10-amd64*" "Name=architecture,Values=x86_64" "Name=hypervisor,Values=xen" "Name=image-type,Values=machine" "Name=is-public,Values=true" "Name=block-device-mapping.volume-type,Values=gp2" --query 'reverse(sort_by(Images, &CreationDate))[:1]'