This is a collection of books and courses I can recommend personally. They are great for every data engineering learner.
I have either used or owned these books during my professional work.
I also looked into every online course personally.
If you want to buy a book or course and support my work, please use one of my links below. They are all affiliate marketing links that help me fund this passion.
Of course, all this comes at no additional expense to you, but it helps me a lot.
You can find even more interesting books and my whole podcast equipment on my Amazon store:
PS: Don't just get a book and expect to learn everything
- Course certificates alone won't help you
- Have a purpose in mind, like a small project
- Great for use at work
Learning Java: A Bestselling Hands-On Java Tutorial
Programming Scala: Scalability = Functional Programming + Objects
Learning Swift: Building Apps for macOS, iOS, and Beyond
Learning Spark: Lightning-Fast Big Data Analysis
Kafka Streams in Action: Real-time apps and microservices with the Kafka Streams API
Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale
HBase: The Definitive Guide: Random Access to Your Planet-Size Data
Zero to One: Notes on Startups, or How to Build the Future
Crossing the Chasm, 3rd Edition (Collins Business Essentials)
Crush It!: Why Now Is The Time To Cash In On Your Passion
"In my opinion, the knowledge contained in this book differentiates a data engineer from a software engineer or a developer. The book strikes a good balance between breadth and depth of discussion on data engineering topics, as well as the tradeoffs we must make due to working with massive amounts of data." -- David Lee on LinkedIn
Course name | Course description | Course URL |
---|---|---|
The Bits and Bytes of Computer Networking | This course is designed to provide a full overview of computer networking. We’ll cover everything from the fundamentals of modern networking technologies and protocols to an overview of the cloud to practical applications and network troubleshooting. | https://www.coursera.org/learn/computer-networking |
Learn SQL (Codecademy) | In this SQL course, you'll learn how to manage large datasets and analyze real data using the standard data management language. | https://www.codecademy.com/learn/learn-sql |
Learn Python 3 (Codecademy) | Learn the basics of Python 3, one of the most powerful, versatile, and in-demand programming languages today. | https://www.codecademy.com/learn/learn-python-3 |
Course name | Course description | Course URL |
---|---|---|
1. Data Engineering Basics | ||
Introduction to Data Engineering | Introduction to Data Engineering with over 1 hour of videos including my journey here. | https://learndataengineering.com/p/introduction-to-data-engineering |
Computer Science Fundamentals | A complete guide of topics and resources you should know as a Data Engineer. | https://learndataengineering.com/p/data-engineering-fundamentals |
Introduction to Python | Learn all the fundamentals of Python to start coding quickly | https://learndataengineering.com/p/introduction-to-python |
Python for Data Engineers | Learn all the Python topics a Data Engineer needs even if you don't have a coding background | https://learndataengineering.com/p/python-for-data-engineers |
Docker Fundamentals | Learn all the fundamental Docker concepts with hands-on examples | https://learndataengineering.com/p/docker-fundamentals |
Successful Job Application | Everything you need to get your dream job in Data Engineering. | https://learndataengineering.com/p/successful-job-application |
Data Preparation & Cleaning for ML | All you need for preparing data to enable Machine Learning. | https://learndataengineering.com/p/data-preparation-and-cleaning-for-ml |
2. Platform & Pipeline Design Fundamentals | ||
Data Platform And Pipeline Design | Learn how to build data pipelines with templates and examples for Azure, GCP and Hadoop. | https://learndataengineering.com/p/data-pipeline-design |
Platform & Pipelines Security | Learn the important security fundamentals for Data Engineering | https://learndataengineering.com/p/platform-pipeline-security |
Choosing Data Stores | Learn the different types of data stores and when to use which. | https://learndataengineering.com/p/choosing-data-stores |
Schema Design Data Stores | Learn how to design schemas for SQL, NoSQL and Data Warehouses. | https://learndataengineering.com/p/data-modeling |
3. Fundamental Tools | ||
Building APIs with FastAPI | Learn the fundamentals of designing, creating and deploying APIs with FastAPI and Docker | https://learndataengineering.com/p/apis-with-fastapi-course |
Apache Kafka Fundamentals | Learn the fundamentals of Apache Kafka | https://learndataengineering.com/p/apache-kafka-fundamentals |
Apache Spark Fundamentals | Apache Spark quick start course in Python with Jupyter notebooks, DataFrames, SparkSQL and RDDs. | https://learndataengineering.com/p/learning-apache-spark-fundamentals |
Data Engineering on Databricks | Everything you need to get started with Databricks. From setup to building ETL pipelines & warehousing. | https://learndataengineering.com/p/data-engineering-on-databricks |
MongoDB Fundamentals | Learn how to use MongoDB | https://learndataengineering.com/p/mongodb-fundamentals-course |
Log Analysis with Elasticsearch | Learn how to monitor and debug your data pipelines | https://learndataengineering.com/p/log-analysis-with-elasticsearch |
Airflow Workflow Orchestration | Learn how to orchestrate your data pipelines with Apache Airflow | https://learndataengineering.com/p/learn-apache-airflow |
Snowflake for Data Engineers | Everything you need to get started with Snowflake | https://learndataengineering.com/p/snowflake-for-data-engineers |
dbt for Data Engineers | Everything you need to work with dbt and Snowflake | https://learndataengineering.com/p/dbt-for-data-engineers |
4. Full Hands-On Example Projects | ||
Data Engineering on AWS | Full 5-hour course with a complete example project. Building stream and batch processing pipelines on AWS. | https://learndataengineering.com/p/data-engineering-on-aws |
Data Engineering on Azure | Ingest, Store, Process, Serve and Visualize Streams of Data by Building Streaming Data Pipelines in Azure. | https://learndataengineering.com/p/build-streaming-data-pipelines-in-azure |
Data Engineering on GCP | Everything you need to start with Google Cloud. | https://learndataengineering.com/p/data-engineering-on-gcp |
Modern Data Warehouses & Data Lakes | How to integrate a Data Lake with a Data Warehouse and query data directly from files | https://learndataengineering.com/p/modern-data-warehouses |
Machine Learning & Containerization On AWS | Build an app that analyzes the sentiment of tweets and visualizes them on a user interface hosted as a container | https://learndataengineering.com/p/ml-on-aws |
Contact Tracing with Elasticsearch | Track 100,000 users in San Francisco using Elasticsearch and an interactive Streamlit user interface | https://learndataengineering.com/p/contact-tracing-with-elasticsearch |
Document Streaming Project | Document Streaming with FastAPI, Kafka, Spark Streaming, MongoDB and Streamlit | https://learndataengineering.com/p/document-streaming |
Storing & Visualizing Time Series Data with InfluxDB and Grafana | Learn how to use InfluxDB to store time series data and visualize interactive dashboards with Grafana | https://learndataengineering.com/p/time-series-influxdb-grafana |
Data Engineering with Hadoop | Hadoop Project with HDFS, YARN, MapReduce, Hive and Sqoop! | https://learndataengineering.com/p/data-engineering-with-hadoop |
Dockerized ETL | Learn how to quickly set up a simple ETL script with AWS, TDengine & Grafana | https://learndataengineering.com/p/timeseries-etl-with-aws-tdengine-grafana |
Here's a list of great certifications that you can do on AWS and Azure. We left out GCP here because the adoption of AWS and Azure is a lot higher, which is why I recommend starting with one of these. The listed prices are usually the fees for taking the certification exams. We also added the level and prerequisites to make it easier for you to decide which one fits you.
Platform | Certification Name | Price (USD) | Level | Prerequisite Experience | URL |
---|---|---|---|---|---|
AWS | AWS Certified Cloud Practitioner (maybe) | 100 | Beginner | Familiarity with the AWS platform is recommended but not required. | Link |
AWS | AWS Certified Solutions Architect - Professional | 300 | Expert | AWS Certified Solutions Architect - Professional is intended for individuals with two or more years of hands-on experience designing and deploying cloud architecture on AWS. | Link |
AWS | AWS Certified Solutions Architect - Associate | 150 | Intermediate | This is an ideal starting point for candidates with AWS Cloud or strong on-premises IT experience. This exam does not require deep hands-on coding experience, although familiarity with basic programming concepts would be an advantage. | Link |
AWS | AWS Certified Data Engineer | 150 | Intermediate | The ideal candidate for this exam has the equivalent of 2-3 years of experience in data engineering or data architecture and a minimum of 1-2 years of hands-on experience with AWS services. | Link |
Azure | Microsoft Certified: Azure Cosmos DB Developer Specialty | 165 | Intermediate | | Link |
Azure | Microsoft Certified: Azure Data Engineer Associate - DP 203 | 165 | Intermediate | | Link |
Azure | Microsoft Certified: Azure Data Fundamentals | 99 | Beginner | | Link |
Azure | Microsoft Certified: Azure Database Administrator Associate | 165 | Intermediate | | Link |
Azure | Microsoft Certified: Azure Developer Associate | 165 | Intermediate | | Link |
Azure | Microsoft Certified: Azure Fundamentals | 99 | Beginner | | Link |
Azure | Microsoft Certified: Azure Solutions Architect Expert | 165 | Expert | Microsoft Certified: Azure Administrator Associate certification | Link |
Azure | Microsoft Certified: Fabric Analytics Engineer Associate | 165 | Intermediate | | Link |
Azure | Microsoft Certified: Fabric Data Engineer Associate | 165 | Intermediate | | Link |
Azure | Microsoft Certified: Power BI Data Analyst Associate | 165 | Intermediate | | Link |
Top five podcasts by the number of episodes created.