Real-time change data capture (CDC) using Apache Kafka and Aiven's JDBC Sink Connector for Apache Kafka® to insert data into StarRocks #25338
As of right now, this article is a stopgap. For performance, you should pick Stream Load over SQL INSERT.
One of the most common use cases for our users is the ability to perform real-time CDC from a source database into StarRocks. One way is to use Apache Kafka. Once the data is in a Kafka topic, you can use a Kafka connector to sink it into your target database. For this tutorial we will use Aiven's open source JDBC Sink Connector for Apache Kafka (https://github.com/aiven/jdbc-connector-for-apache-kafka) to sink the data into StarRocks.
Note
Sept 2023 update: StarRocks released a StarRocks Kafka Connector. See https://docs.starrocks.io/en-us/latest/loading/Kafka-connector-starrocks for details.
Follow the steps in the Apache Kafka quickstart at https://kafka.apache.org/quickstart to set up the environment. Start ZooKeeper and the Kafka server. Stop when you get to the Kafka Connect step and then follow the steps below.
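If you are following the ZooKeeper-based setup from the quickstart, the two start commands (run from the Kafka installation directory, each in its own terminal) look like this:

```shell
# Terminal 1: start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

# Terminal 2: start the Kafka broker
bin/kafka-server-start.sh config/server.properties
```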
Create the file config/connect-jdbc-starrocks-sink.properties:
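Here is a minimal sketch of the sink configuration. The topic name example_tbl, the database name kafka_sink_example, and the root user with an empty password are assumptions for this walkthrough; adjust them to your environment. StarRocks speaks the MySQL protocol, so the connection uses a MySQL JDBC URL against the FE query port (9030 by default).

```properties
name=jdbc-starrocks-sink
connector.class=io.aiven.connect.jdbc.JdbcSinkConnector
tasks.max=1
# Topic to read from; the target table name comes from table.name.format
topics=example_tbl
# StarRocks is MySQL-protocol compatible; 9030 is the default FE query port
connection.url=jdbc:mysql://127.0.0.1:9030/kafka_sink_example
connection.user=root
connection.password=
# Plain INSERT statements; we create the table up front, so no auto-create
insert.mode=insert
auto.create=false
table.name.format=example_tbl
```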
Edit the file config/connect-standalone.properties:
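The important edits are the converter settings and the plugin path. This sketch assumes the jars live in a plugins directory under the Kafka installation; the JDBC sink needs schema-carrying JSON, so schemas.enable stays on:

```properties
# JsonConverter with schemas enabled: the JDBC sink needs the schema
# to map JSON fields to table columns
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true

# Directory holding the JDBC connector jar and the MySQL driver jar;
# adjust to your own layout
plugin.path=./plugins
```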
Compile or download the JDBC connector jar (https://github.com/aiven/jdbc-connector-for-apache-kafka/releases/tag/v6.8.0) and the MySQL driver jar (https://downloads.mysql.com/archives/c-j/). Here is an example directory listing of the plugins directory:
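The exact jar names depend on the versions you download; an illustrative listing might look like this:

```shell
$ ls plugins/jdbc-connector-for-apache-kafka/
jdbc-connector-for-apache-kafka-6.8.0.jar
mysql-connector-j-8.0.33.jar
```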
Running Kafka Connect
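In standalone mode, Kafka Connect takes the worker config followed by one or more connector configs:

```shell
bin/connect-standalone.sh config/connect-standalone.properties config/connect-jdbc-starrocks-sink.properties
```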
This is the JSON we will be using. It's the same as the JSON example in the "Loading JSON and AVRO data from Confluent Cloud Kafka into StarRocks" tutorial at #22791.
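Since the original inline example is not reproduced here, the rest of this walkthrough uses a small stand-in record with hypothetical fields id and city, wrapped in the schema/payload envelope the JsonConverter expects, all on one line:

```json
{"schema":{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"city"}],"optional":false,"name":"example_tbl"},"payload":{"id":1,"city":"Beijing"}}
```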
Database and table create commands for StarRocks:
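A minimal sketch, using the hypothetical kafka_sink_example database and an example_tbl table that mirrors the JSON fields above (run against the StarRocks FE with a MySQL client):

```sql
CREATE DATABASE kafka_sink_example;

-- Columns mirror the fields in the JSON schema above
CREATE TABLE kafka_sink_example.example_tbl (
    id   BIGINT NOT NULL,
    city STRING
)
DUPLICATE KEY (id)
DISTRIBUTED BY HASH (id) BUCKETS 1;
```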
Here's how the JSON looks expanded. When creating the schema, cross-check the object types at https://kafka.apache.org/0100/javadoc/org/apache/kafka/connect/data/Schema.Type.html. There is also good documentation at https://github.com/aiven/jdbc-connector-for-apache-kafka/blob/master/docs/sink-connector.md.
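Pretty-printed, the same stand-in record reads:

```json
{
  "schema": {
    "type": "struct",
    "fields": [
      { "type": "int64",  "optional": false, "field": "id" },
      { "type": "string", "optional": true,  "field": "city" }
    ],
    "optional": false,
    "name": "example_tbl"
  },
  "payload": {
    "id": 1,
    "city": "Beijing"
  }
}
```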
Submit the JSON via the Kafka console producer.
Note: You have to submit the JSON as a single line or else you'll get a parse error from Kafka (see Aiven-Open/jdbc-connector-for-apache-kafka#240).
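With the broker running locally on the default port, start the console producer and paste the one-line JSON at its prompt (the example_tbl topic name is the assumption from the sink config above):

```shell
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic example_tbl
```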
The Kafka console consumer should echo the same one-line JSON back.
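The consumer command below assumes the same local broker and the hypothetical example_tbl topic:

```shell
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic example_tbl --from-beginning
```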
The Kafka Connect console output should show the sink task picking up the record and writing it to StarRocks without errors.
Now check that the data has been inserted:
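Because StarRocks is MySQL-protocol compatible, any MySQL client pointed at the FE query port works; assuming the hypothetical names from above:

```shell
mysql -h 127.0.0.1 -P 9030 -u root kafka_sink_example -e "SELECT * FROM example_tbl;"
```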