data-derp/exercise-data-security

Data Security and Privacy in Data Engineering

These notebooks introduce data privacy and security topics as they relate to data engineering. They are a starting point for learning Data Engineering as part of the Data Derp exercises!

Motivation

When do we need to worry about privacy as we move data across an organization? How can we enable private-by-design pipelines? When should we get in touch with security peers and team members for advice on the data we are moving, transforming, and storing?

This notebook and its accompanying video begin to address these questions as you learn data engineering basics. Unfortunately, there is no cookie-cutter answer, and you will ask many of these questions again and again as you deepen your knowledge and experience in data work. Let this section of your training be an open invitation to reflect on these principles and get to know them better through your work!

Outline

Agenda

  • Security and Privacy in Data Engineering (follow the Quickstart guide and follow along with the video)
  • Use the example Databricks notebook to get started, then try to recreate the same steps in Spark. How would you do it?
  • If you get stuck or want to compare notes, please look at the hints in the solutions folder. There is a notebook with one way to solve the same steps (see: Security and Privacy in Data Engineering - Spark Version).

Extra Materials

  • Generating Example Data notebook (optional: you should not need it for the exercise, and it has additional software dependencies)

Quickstart

  1. Set up a Databricks Account if you don't already have one

  2. Create a cluster if you don't already have one

  3. In your User's workspace, click import

  4. Import the Security and Privacy in Data Engineering.dbc notebook using the URL method: https://github.com/data-derp/exercise-data-security/blob/e5da49ac302dc7ed25107f786f982d53ff192db0/Security%20and%20Privacy%20in%20Data%20Engineering.dbc?raw=true

  5. Select your cluster.

  6. Follow along with the video.

  7. Clone your notebook and convert the steps to Spark!
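
The import URL in step 4 pins a specific commit, percent-encodes the spaces in the notebook's file name, and ends with `?raw=true` so that the file itself is served rather than the GitHub page. As a minimal sketch of how such a URL is put together (the helper name `build_import_url` is hypothetical and not part of this repository), the following reproduces the URL from step 4:

```python
from urllib.parse import quote


def build_import_url(repo: str, commit: str, filename: str) -> str:
    """Assemble a commit-pinned GitHub URL that serves the raw file.

    Hypothetical helper for illustration only; it is not part of the exercise.
    """
    # quote() percent-encodes spaces as %20 and leaves letters and dots as-is.
    return f"https://github.com/{repo}/blob/{commit}/{quote(filename)}?raw=true"


# Values taken from step 4 of the Quickstart.
url = build_import_url(
    "data-derp/exercise-data-security",
    "e5da49ac302dc7ed25107f786f982d53ff192db0",
    "Security and Privacy in Data Engineering.dbc",
)
print(url)
```

Pinning the commit hash means the import keeps pointing at the same version of the notebook even if the repository changes later.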

Recommended Reading & Further Study

Questions?

Questions about getting set up or the content covered in the notebooks or book? Feel free to reach out via email at: katharine (at) kjamistan (dot) com
