Source code for the following series of blog posts:
- Hydrating a Data Lake using Query-based CDC with Apache Kafka Connect and Kubernetes on AWS
- Hydrating a Data Lake using Log-based Change Data Capture (CDC) with Debezium, Apicurio, and Kafka Connect on AWS
- Getting Started with Spark Structured Streaming and Kafka on AWS using Amazon MSK and Amazon EMR
- Stream Processing with Apache Spark, Kafka, Avro, and Apicurio Registry on Amazon EMR and Amazon MSK
- Working with Apache Avro files in Amazon S3
- Building Open Data Lakes: Debezium, Apache Kafka, Hudi, Spark, and Hive on AWS
- The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and Debezium

The contents of this repository represent my viewpoints and not those of my past or current employers, including Amazon Web Services (AWS). All third-party libraries, modules, plugins, and SDKs are the property of their respective owners.