Learn to deploy end-to-end Serverless Data Engineering Pipelines on GCP via the most comprehensive and FREE online course.
This repo contains the code for the TuraLabs Data Engineering Bootcamp (DEB), including the code scaffolds and starting data mentioned in the DEB lessons.
This course starts with mid- to high-level topics right away, so we strongly recommend that learners have some experience with Python and SQL. For a more in-depth explanation of the expected prerequisites and a list of resources to bring you up to speed, please visit our blog post on Helpful Resources to Prep for this Course.
The DEB course uses Python and Google Cloud Platform (GCP) tools. Please follow the instructions in our Getting Started Guide to make sure your dev environment is properly set up and compatible with our course. If you have any issues getting your dev environment up, please visit our Discord Channel to talk to us.
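As a quick sanity check before starting the lessons, something like the sketch below can flag obvious setup gaps. It is only an illustration and does not replace the Getting Started Guide. The Python 3.7 minimum and the tool names (`pip3`, `virtualenv`, `gcloud`) are assumptions drawn from this repo's changelog and the course's use of GCP, and `find_problems` is a hypothetical helper, not part of the course code.

```python
import shutil
import sys

# Assumption: the changelog mentions python3.7; the Getting Started Guide is authoritative.
MIN_PYTHON = (3, 7)
# Assumption: illustrative list of command-line tools the setup is expected to provide.
REQUIRED_TOOLS = ("pip3", "virtualenv", "gcloud")


def find_problems(python_version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    if python_version is None:
        python_version = sys.version_info[:2]
    major, minor = python_version[0], python_version[1]
    problems = []
    # Compare (major, minor) tuples so that e.g. 3.10 is correctly treated as newer than 3.7.
    if (major, minor) < MIN_PYTHON:
        problems.append(
            f"Python {major}.{minor} found; "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer expected"
        )
    # shutil.which returns None when an executable is not on PATH.
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in find_problems():
        print("WARNING:", problem)
```

Running the script prints one warning per detected problem and nothing when the checks pass. If a tool is reported missing, the Getting Started Guide remains the place to look for the supported install steps.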
We're here to help! If you have any questions, please connect with us on our Discord Channel and one of us would be happy to help you!
If you have any suggestions for the course or website, please feel free to open a GitHub Issue within this repo. We also welcome suggestions in the suggestion channel on our Discord Server.
Chapter 3
Chapter 1
- fixed broken links
- fixed typos
- reworded for clarity
Chapter 3
- Add Chapter 3 Episode 2: Getting Started with Data Proc
- Add Chapter 3 Episode 3: Expand and Lookup Passenger Data
Chapter 1
- Fixed typos in chapter 1 overview (thank you Senad)
Chapter 2
- Added GCS source file download instructions to chapter 2 episode 2 (thank you Jason)
- Fixed API registration link in chapter 2 episode 4
- Updated Portman API documentation links in chapter 2 episode 4
- Enhanced chapter 2 episode 5 webapp
Chapter 3
Blog
- Added Spark Explained blog post
General
- Fixed broken link this GitHub issue
- Fixed typos
Website
- Add Curriculum Overview page (/deb-info)
- Add Registration page (/register)
- Add About Us page (/about)
Blog
- Add “What is Data Engineering? How is it different from Data Science?” Blog post
General
- Switching Slack to Discord
- Fixed broken links to GCP Console and external documents
Ch1
- Add new Ch1Ep5 lesson on advanced pandas use, replacing the aircraft dataset with the latest FAA records
Website
General
- fix broken links
Ch1
- add introductory episode on Pandas and Jupyter Notebook for beginners https://dev-dot-turalabs-site.uc.r.appspot.com/docs/ch1/c1e5
- add note to clarify continuous use of the same GCS bucket throughout chapter
Ch 2
- add note in Episode 5 pointing to the Slack channel for questions about running a React app
Blog
- add Pandas and Jupyter Notebook episode as a standalone blog post
DEB Repo
Ch 2
- updated API and webapp for end of chapter to fix CORS issue
- updated flights API request based on new query syntax
- updated webapp READMEs to refer to Getting Started docs to acquire service account key
General
- fix typos
- fix broken links
- WSL User Setup
- point to windows WSL2 initial setup
- WSL vs WSL2: https://docs.microsoft.com/en-us/windows/wsl/compare-versions
- mention installing Ubuntu 20.04 from the Microsoft Store
- fix using python3.7
- add instructions to install python3.7 from the deadsnakes repos
- add instructions for adding pip to PATH: add /home/{username}/.local/bin to PATH in Ubuntu to get access to pip
- fix installing and setting up pip3 and virtualenv
- fix creating a new virtualenv
- remove --no-site-packages from virtualenv instructions
- add instructions for deactivate
Chapter 1
- change paths in code examples to reflect location of data in provided repo