Recommended Books, Courses, and Podcasts

About Books, Courses, and Podcasts

This is a collection of books and courses I can recommend personally. They are great for every data engineering learner.

I have either used or owned each of these books during my professional work.

I also looked into every online course personally.

If you want to buy a book or course and support my work, please use one of my links below. They are all affiliate marketing links that help me fund this passion.

Of course all this comes at no additional expense to you, but it helps me a lot.

You can find even more interesting books and my whole podcast equipment on my Amazon store:

Go to the Amazon store

PS: Don't just get a book and expect to learn everything

  • Course certificates alone won't help you much
  • Have a purpose in mind, like a small project
  • Great for use at work

Books

Languages

Java

Learning Java: A Bestselling Hands-On Java Tutorial

Python

Learning Python, 5th Edition

Scala

Programming Scala: Scalability = Functional Programming + Objects

Swift

Learning Swift: Building Apps for macOS, iOS, and Beyond

Data Science Tools

Apache Spark

Learning Spark: Lightning-Fast Big Data Analysis

Apache Kafka

Kafka Streams in Action: Real-time apps and microservices with the Kafka Streams API

Apache Hadoop

Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale

Apache HBase

HBase: The Definitive Guide: Random Access to Your Planet-Size Data

Business

The Lean Startup

The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses

Zero to One

Zero to One: Notes on Startups, or How to Build the Future

The Innovator's Dilemma

The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail (Management of Innovation and Change)

Crossing the Chasm

Crossing the Chasm, 3rd Edition (Collins Business Essentials)

Crush It!

Crush It!: Why Now Is The Time To Cash In On Your Passion

Community Recommendations

Designing Data-Intensive Applications

"In my opinion, the knowledge contained in this book differentiates a data engineer from a software engineer or a developer. The book strikes a good balance between breadth and depth of discussion on data engineering topics, as well as the tradeoffs we must make due to working with massive amounts of data." -- David Lee on LinkedIn

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Online Courses

Preparation courses

| Course name | Course description | Course URL |
|---|---|---|
| The Bits and Bytes of Computer Networking | This course is designed to provide a full overview of computer networking. We’ll cover everything from the fundamentals of modern networking technologies and protocols to an overview of the cloud to practical applications and network troubleshooting. | https://www.coursera.org/learn/computer-networking |
| Learn SQL \| Codecademy | In this SQL course, you'll learn how to manage large datasets and analyze real data using the standard data management language. | https://www.codecademy.com/learn/learn-sql |
| Learn Python 3 \| Codecademy | Learn the basics of Python 3, one of the most powerful, versatile, and in-demand programming languages today. | https://www.codecademy.com/learn/learn-python-3 |

Data engineering courses

| Course name | Course description | Course URL |
|---|---|---|
| **1. Data Engineering Basics** | | |
| Introduction to Data Engineering | Introduction to Data Engineering with over 1 hour of videos including my journey here. | https://learndataengineering.com/p/introduction-to-data-engineering |
| Computer Science Fundamentals | A complete guide of topics and resources you should know as a Data Engineer. | https://learndataengineering.com/p/data-engineering-fundamentals |
| Introduction to Python | Learn all the fundamentals of Python to start coding quickly. | https://learndataengineering.com/p/introduction-to-python |
| Python for Data Engineers | Learn all the Python topics a Data Engineer needs, even if you don't have a coding background. | https://learndataengineering.com/p/python-for-data-engineers |
| Docker Fundamentals | Learn all the fundamental Docker concepts with hands-on examples. | https://learndataengineering.com/p/docker-fundamentals |
| Successful Job Application | Everything you need to get your dream job in Data Engineering. | https://learndataengineering.com/p/successful-job-application |
| Data Preparation & Cleaning for ML | All you need for preparing data to enable Machine Learning. | https://learndataengineering.com/p/data-preparation-and-cleaning-for-ml |
| **2. Platform & Pipeline Design Fundamentals** | | |
| Data Platform And Pipeline Design | Learn how to build data pipelines with templates and examples for Azure, GCP and Hadoop. | https://learndataengineering.com/p/data-pipeline-design |
| Platform & Pipelines Security | Learn the important security fundamentals for Data Engineering. | https://learndataengineering.com/p/platform-pipeline-security |
| Choosing Data Stores | Learn the different types of data stores and when to use which. | https://learndataengineering.com/p/choosing-data-stores |
| Schema Design Data Stores | Learn how to design schemas for SQL, NoSQL and Data Warehouses. | https://learndataengineering.com/p/data-modeling |
| **3. Fundamental Tools** | | |
| Building APIs with FastAPI | Learn the fundamentals of designing, creating and deploying APIs with FastAPI and Docker. | https://learndataengineering.com/p/apis-with-fastapi-course |
| Apache Kafka Fundamentals | Learn the fundamentals of Apache Kafka. | https://learndataengineering.com/p/apache-kafka-fundamentals |
| Apache Spark Fundamentals | Apache Spark quick start course in Python with Jupyter notebooks, DataFrames, SparkSQL and RDDs. | https://learndataengineering.com/p/learning-apache-spark-fundamentals |
| Data Engineering on Databricks | Everything you need to get started with Databricks. From setup to building ETL pipelines & warehousing. | https://learndataengineering.com/p/data-engineering-on-databricks |
| MongoDB Fundamentals | Learn how to use MongoDB. | https://learndataengineering.com/p/mongodb-fundamentals-course |
| Log Analysis with Elasticsearch | Learn how to monitor and debug your data pipelines. | https://learndataengineering.com/p/log-analysis-with-elasticsearch |
| Airflow Workflow Orchestration | Learn how to orchestrate your data pipelines with Apache Airflow. | https://learndataengineering.com/p/learn-apache-airflow |
| Snowflake for Data Engineers | Everything you need to get started with Snowflake. | https://learndataengineering.com/p/snowflake-for-data-engineers |
| dbt for Data Engineers | Everything you need to work with dbt and Snowflake. | https://learndataengineering.com/p/dbt-for-data-engineers |
| **4. Full Hands-On Example Projects** | | |
| Data Engineering on AWS | Full 5-hour course with a complete example project. Building stream and batch processing pipelines on AWS. | https://learndataengineering.com/p/data-engineering-on-aws |
| Data Engineering on Azure | Ingest, Store, Process, Serve and Visualize Streams of Data by Building Streaming Data Pipelines in Azure. | https://learndataengineering.com/p/build-streaming-data-pipelines-in-azure |
| Data Engineering on GCP | Everything you need to start with Google Cloud. | https://learndataengineering.com/p/data-engineering-on-gcp |
| Modern Data Warehouses & Data Lakes | How to integrate a Data Lake with a Data Warehouse and query data directly from files. | https://learndataengineering.com/p/modern-data-warehouses |
| Machine Learning & Containerization On AWS | Build an app that analyzes the sentiment of tweets and visualizes them on a user interface hosted as a container. | https://learndataengineering.com/p/ml-on-aws |
| Contact Tracing with Elasticsearch | Track 100,000 users in San Francisco using Elasticsearch and an interactive Streamlit user interface. | https://learndataengineering.com/p/contact-tracing-with-elasticsearch |
| Document Streaming Project | Document Streaming with FastAPI, Kafka, Spark Streaming, MongoDB and Streamlit. | https://learndataengineering.com/p/document-streaming |
| Storing & Visualizing Time Series Data with InfluxDB and Grafana | Learn how to use InfluxDB to store time series data and visualize interactive dashboards with Grafana. | https://learndataengineering.com/p/time-series-influxdb-grafana |
| Data Engineering with Hadoop | Hadoop Project with HDFS, YARN, MapReduce, Hive and Sqoop! | https://learndataengineering.com/p/data-engineering-with-hadoop |
| Dockerized ETL | Learn how to quickly set up a simple ETL script with AWS TDengine & Grafana. | https://learndataengineering.com/p/timeseries-etl-with-aws-tdengine-grafana |

Certifications

Here's a list of great certifications that you can do on AWS and Azure. I left out GCP here because AWS and Azure are much more widely adopted, which is why I recommend starting with one of these. The prices listed are usually the fees for taking the certification exams. I also added the level and prerequisites to make it easier for you to decide which one fits you.

| Platform | Certification Name | Price (USD) | Level | Prerequisite Experience | URL |
|---|---|---|---|---|---|
| AWS | AWS Certified Cloud Practitioner (maybe) | 100 | Beginner | Familiarity with the AWS platform is recommended but not required. | Link |
| AWS | AWS Certified Solutions Architect - Professional | 300 | Expert | AWS Certified Solutions Architect - Professional is intended for individuals with two or more years of hands-on experience designing and deploying cloud architecture on AWS. | Link |
| AWS | AWS Certified Solutions Architect - Associate | 150 | Intermediate | This is an ideal starting point for candidates with AWS Cloud or strong on-premises IT experience. This exam does not require deep hands-on coding experience, although familiarity with basic programming concepts would be an advantage. | Link |
| AWS | AWS Certified Data Engineer | 150 | Intermediate | The ideal candidate for this exam has the equivalent of 2-3 years of experience in data engineering or data architecture and a minimum of 1-2 years of hands-on experience with AWS services. | Link |
| Azure | Microsoft Certified: Azure Cosmos DB Developer Specialty | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Data Engineer Associate - DP 203 | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Data Fundamentals | 99 | Beginner | | Link |
| Azure | Microsoft Certified: Azure Database Administrator Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Developer Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Fundamentals | 99 | Beginner | | Link |
| Azure | Microsoft Certified: Azure Solutions Architect Expert | 165 | Expert | Microsoft Certified: Azure Administrator Associate certification | Link |
| Azure | Microsoft Certified: Fabric Analytics Engineer Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Fabric Data Engineer Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Power BI Data Analyst Associate | 165 | Intermediate | | Link |

Podcasts

Top five podcasts by the number of episodes created.

Super Data Science

The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on the Super Data Science Podcast.

Data Skeptic

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Roaring Elephant - Bite-Sized Big Tech

A weekly community podcast about Big Technology with a focus on Open Source, Advanced Analytics and other modern magic.

SQL Data Partners Podcast

Hosted by Carlos L Chacon, the SQL Data Partners Podcast focuses on Microsoft data platform related topics mixed with a sprinkling of professional development. Carlos and guests discuss new and familiar features and ideas and how you might apply them in your environments.

Complete list

| Host name | Podcast name | Access podcast |
|---|---|---|
| Jon Krohn | Super Data Science | https://www.superdatascience.com/podcast |
| Kyle Polich | Data Skeptic | https://dataskeptic.com/ |
| Tobias Macey | Data Engineering Podcast | https://www.dataengineeringpodcast.com/ |
| Dave Russell | Roaring Elephant - Bite-Sized Big Tech | https://roaringelephant.org/ |
| Carlos L Chacon | SQL Data Partners Podcast | https://sqldatapartners.com/podcast/ |
| Jason Himmelstein | BIFocal - Clarifying Business Intelligence | https://bifocal.show/ |
| Scott Hirleman | Data Mesh Radio | https://daappod.com/data-mesh-radio/ |
| Jonathan Schwabish | PolicyViz | https://policyviz.com/podcast/ |
| Al Martin | Making Data Simple | https://www.ibm.com/blogs/journey-to-ai/2021/02/making-data-simple-this-week-we-continue-our-discussion-on-data-framework-and-what-is-meant-by-data-framework/ |
| John David Ariansen | How to Get an Analytics Job | https://www.silvertoneanalytics.com/how-to-get-an-analytics-job/ |
| Moritz Stefaner | Data Stories | https://datastori.es/ |
| Hilary Parker | Not So Standard Deviations | https://nssdeviations.com/ |
| Ben Lorica | The Data Exchange with Ben Lorica | https://thedataexchange.media/author/bglorica/ |
| Juan Sequeda | Catalog & Cocktails | https://data.world/resources/podcasts/ |
| Wayne Eckerson | Secrets of Data Analytics Leaders | https://www.eckerson.com/podcasts/secrets-of-data-analytics-leaders |
| Guy Glantser | SQL Server Radio | https://www.sqlserverradio.com/ |
| Eitan Blumin | SQL Server Radio | https://www.sqlserverradio.com/ |
| Jason Tan | The Analytics Show | https://ddalabs.ai/the-analytics-show/ |
| Hugo Bowne-Anderson | DataFramed | https://www.datacamp.com/podcast |
| Kostas Pardalis | The Data Stack Show | https://datastackshow.com/ |
| Eric Dodds | The Data Stack Show | https://datastackshow.com/ |
| Catherine King | The Business of Data Podcast | https://podcasts.apple.com/gb/podcast/the-business-of-data-podcast/id1528796448 |
| | The Business of Data | https://business-of-data.com/podcasts/ |
| James Le | Datacast | https://datacast.simplecast.com/ |
| Mike Delgado | DataTalk | https://podcasts.apple.com/us/podcast/datatalk/id1398548129 |
| Matt Housley | Monday Morning Data Chat | https://podcasts.apple.com/us/podcast/monday-morning-data-chat/id1565154727 |
| Francesco Gadaleta | Data Science at Home | https://datascienceathome.com/ |
| Alli Torban | Data Viz Today | https://dataviztoday.com/ |
| Steve Jones | Voice of the DBA | https://voiceofthedba.com/ |
| Lea Pica | The Present Beyond Measure Show: Data Storytelling, Presentation & Visualization | https://leapica.com/podcast/ |
| Samir Sharma | The Data Strategy Show | https://podcasts.apple.com/us/podcast/the-data-strategy-show/id1515194422 |
| Cindi Howson | The Data Chief | https://www.thoughtspot.com/data-chief/podcast |
| Cole Nussbaumer Knaflic | storytelling with data podcast | https://storytellingwithdata.libsyn.com/ |
| Margot Gerritsen | Women in Data Science | https://www.widsconference.org/podcast.html |
| Jonas Christensen | Leaders of Analytics | https://www.leadersofanalytics.com/episode/the-future-of-analytics-leadership-with-john-thompson |
| Matt Brady | ZUMA: Data For Good | https://www.youtube.com/@zuma-dataforgood |
| Julia Schottenstein | The Analytics Engineering Podcast | https://roundup.getdbt.com/s/the-analytics-engineering-podcast |
| | Data Unlocked | https://dataunlocked.buzzsprout.com/ |
| Boris Jabes | The Sequel Show | https://www.thesequelshow.com/ |
| | Data Radicals | https://www.alation.com/podcast/ |
| Nicola Askham | The Data Governance | https://www.nicolaaskham.com/podcast |
| Boaz Farkash | The Data Engineering Show | https://www.dataengineeringshow.com/ |
| Bob Haffner | The Engineering Side of Data | https://podcasts.apple.com/us/podcast/the-engineering-side-of-data/id1566999533 |
| Dan Linstedt | Data Vault Alliance | https://datavaultalliance.com/category/news/podcasts/ |
| Dustin Schimek | Data Ideas | https://podcasts.apple.com/us/podcast/data-ideas/id1650322207 |
| Alex Merced | The datanation | https://podcasts.apple.com/be/podcast/the-datanation-podcast-podcast-for-data-engineers/id1608638822 |
| Thomas Bustos | Let's Talk AI | https://www.youtube.com/@lets-talk-ai |
| Jahanvee Narang | Decoding Data Analytics | https://www.youtube.com/@decodingdataanalytics/videos |