Personal Data Engineering Projects
Updated Feb 8, 2023 - Jupyter Notebook
Scan databases and data warehouses for PII. Tag tables and columns in data catalogs like Amundsen and DataHub.
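As a rough illustration of what tagging columns for PII can look like, here is a minimal sketch that matches column names against a few regular expressions. The function name `tag_pii_columns`, the pattern set, and the sample schema are all hypothetical and are not taken from the listed project, which may well inspect data values and talk to a catalog API instead.

```python
import re

# Hypothetical name-based heuristics; a real scanner would likely sample values too.
PII_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.I),
    "phone": re.compile(r"phone|mobile", re.I),
    "ssn": re.compile(r"ssn|social[-_]?security", re.I),
}

def tag_pii_columns(schema):
    """Return {(table, column): label} for columns whose names look like PII."""
    tags = {}
    for table, columns in schema.items():
        for col in columns:
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(col):
                    tags[(table, col)] = label
                    break  # first matching label wins
    return tags

# Made-up schema, for illustration only.
schema = {
    "users": ["id", "email_address", "phone_number"],
    "orders": ["id", "total"],
}
tags = tag_pii_columns(schema)
```

Name matching alone is cheap but misses PII stored under neutral column names, so it is best treated as a first pass.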
Redshift Python Connector. It supports Python Database API Specification v2.0.
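To show what DB-API 2.0 (PEP 249) compliance means in practice, the sketch below uses only the calls the specification defines: `cursor()`, `execute()`, `executemany()`, `fetchall()`, `commit()` and `close()`. It runs against the standard library's `sqlite3` driver purely as a stand-in, so no cluster is needed; the table and data are made up. With a Redshift driver the connection call and the placeholder style would differ (each driver reports its style in the module-level `paramstyle` attribute), but the cursor workflow should look the same.

```python
import sqlite3  # stdlib DB-API 2.0 driver, used here only as a stand-in

def load_and_query(conn):
    """Create a small table, insert rows, and read some back via PEP 249 calls."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE rides (id INTEGER, fare REAL)")
    # sqlite3 uses the "qmark" paramstyle; other drivers may use %s or named styles.
    cur.executemany(
        "INSERT INTO rides VALUES (?, ?)",
        [(1, 12.5), (2, 7.0), (3, 20.0)],
    )
    conn.commit()
    cur.execute("SELECT id, fare FROM rides WHERE fare > ? ORDER BY id", (10.0,))
    rows = cur.fetchall()
    cur.close()
    return rows

conn = sqlite3.connect(":memory:")
rows = load_and_query(conn)
conn.close()
```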
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift and Power BI.
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation and loading of data from S3 -> Redshift -> S3
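The S3 -> Redshift leg of a pipeline like this is commonly done with Redshift's `COPY` command. The sketch below only assembles the SQL text that a load step might execute; the helper `build_copy_sql`, the table name, the bucket path and the role ARN are hypothetical placeholders, not details from the listed project.

```python
def build_copy_sql(table, s3_uri, iam_role_arn, ignore_header=1):
    """Assemble a Redshift COPY statement for CSV files under an S3 prefix.

    Inputs are interpolated as-is, so they must come from trusted config.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER {ignore_header};"
    )

# Placeholder values, for illustration only.
sql = build_copy_sql(
    table="staging_events",
    s3_uri="s3://example-bucket/events/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
)
```

Keeping the statement builder separate from the code that executes it makes the load step easy to unit test without a cluster.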
Build clickstream analytics on AWS for your mobile and web applications
Udacity Data Engineering Nanodegree Program
🔄 🏃 EtLT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow
This project grew out of an interest in data engineering and ETL pipelines. It was also a good opportunity to build skills and experience with a range of tools. As a result, the project is more complex than strictly required, using dbt, Airflow, Docker and cloud-based storage.
An example system that captures a large stream of product usage data, or events, and provides both real-time data visualization and SQL-based data analytics.
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
Sample Spring Boot Data JPA integration with AWS Redshift
This project provides valuable customer sentiment insights for Zomato by tracking and analyzing tweets related to their brand and services.
Use AWS EMR and AWS Redshift to analyse the US adult census dataset
Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark
A simple command-line tool to copy tables from Amazon Redshift to Amazon RDS (PostgreSQL).
rdapp - Redshift Data API Postgres Proxy