Exploring Data Lakes, CDC, Apache Kafka, and Apache Hudi on AWS

Source code for the following series of blog posts:

Hydrating a Data Lake using Query-based CDC with Apache Kafka Connect and Kubernetes on AWS
Hydrating a Data Lake using Log-based Change Data Capture (CDC) with Debezium, Apicurio, and Kafka Connect on AWS
Getting Started with Spark Structured Streaming and Kafka on AWS using Amazon MSK and Amazon EMR
Stream Processing with Apache Spark, Kafka, Avro, and Apicurio Registry on Amazon EMR and Amazon MSK
Working with Apache Avro files in Amazon S3
Building Open Data Lakes: Debezium, Apache Kafka, Hudi, Spark, and Hive on AWS
The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and Debezium

The contents of this repository represent my viewpoints and not those of my past or current employers, including Amazon Web Services (AWS). All third-party libraries, modules, plugins, and SDKs are the property of their respective owners.

kennyschuoler/kafka-connect-msk-demo