These notebooks introduce data privacy and security topics as they relate to data engineering. They are included as a starting point to help you along your journey in learning Data Engineering as part of the Data Derp exercises!
When do we need to worry about privacy as we move data across an organization? How can we enable private-by-design pipelines? When should we get in touch with security peers and team members so they can advise on the data we are moving, transforming, and storing?
This notebook and video aim to start answering these questions as you learn data engineering basics. Unfortunately, there is no cookie-cutter answer, and you will ask many of these questions again and again as you deepen your knowledge and experience in data work. Let this section of your training be an open invitation to think about these principles and get to know them better through your work!
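As one small illustration of what a private-by-design step can look like, the sketch below replaces a direct identifier with a keyed hash before records move downstream. This is an assumption made for illustration, not necessarily what the notebooks cover: the function names, the `email` field, and the key are all hypothetical. Keyed hashing is pseudonymization, not anonymization, so the output should still be treated as personal data.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of `value`.

    The same value and key always give the same digest, so joins on the
    pseudonymized column still work, but the raw value is not exposed.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, field, secret_key):
    """Return new records with `field` replaced by its keyed digest.

    The input records are left unmodified.
    """
    return [{**record, field: pseudonymize(record[field], secret_key)} for record in records]


if __name__ == "__main__":
    # Hypothetical key for demonstration only; a real pipeline would load
    # this from a secrets manager rather than hard-coding it.
    key = b"example-key-do-not-use-in-production"
    rows = [
        {"email": "a@example.com", "amount": 10},
        {"email": "b@example.com", "amount": 25},
    ]
    for row in pseudonymize_records(rows, "email", key):
        print(row)
```

Whether a step like this is sufficient depends on who holds the key and what other columns travel with the data, which is exactly the kind of question to raise with your security peers.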
- Security and Privacy in Data Engineering (follow the Quickstart guide and follow along with the video)
- Use the example Databricks notebook to get started, then try to recreate the same steps in Spark. How would you do it?
- If you get stuck or want to compare notes, please look at the hints in the solutions folder. There is a notebook with one way to solve the same steps (see: Security and Privacy in Data Engineering - Spark Version).
- Generating Example Data notebook (you should not need to run this one; it has additional software dependencies)
- Set up a Databricks Account if you don't already have one
- Create a cluster if you don't already have one
- In your User's workspace, click import
- Import the `Security and Privacy in Data Engineering.dbc` notebook using the URL method: https://github.com/data-derp/exercise-data-security/blob/e5da49ac302dc7ed25107f786f982d53ff192db0/Security%20and%20Privacy%20in%20Data%20Engineering.dbc?raw=true
- Follow along with the video.
- Clone your notebook and convert the steps to Spark!
- Practical Data Privacy and accompanying repository
- Introduction to Anonymization via Differential Privacy
- OpenMined: Foundations of Private Computation
Questions about getting set up or the content covered in the notebooks or book? Feel free to reach out via email at: katharine (at) kjamistan (dot) com