[Content] Add Delta Lake Optimisations #40

Open
12 tasks
kelseymok opened this issue Jul 24, 2023 · 3 comments

kelseymok commented Jul 24, 2023

There are two new amazing notebooks from Databricks which will fit in very well here. The first one is similar to our demo that already exists, and the second is a new notebook which can be used as bonus material.

This issue pertains to the second notebook (01-Deltalake...). We'll create a new Delta Lake Optimisations exercise (meant to be run after Delta Lake Walkthrough).

  • Download the two notebooks here: Archive.zip

  • Import the 01-Deltalake notebook to Databricks

  • Remove all Databricks-demo specific text that doesn't pertain to our content (e.g. "a cluster has been created for you...")

  • Add our per-user workspace selector and stream helpers: https://github.com/data-derp/small-exercises/blob/master/delta-lake-walkthrough/delta-lake-walkthrough.py#L31-L150

  • Add at the top of the notebook "This notebook is adapted from the Delta Lake Demo provided by Databricks".

  • Export the notebook as Python source

  • Upload to a new dir in the small-exercises repo called "delta-lake-optimisations"

  • Create a new README, using the Delta Lake Walkthrough README as a template (make sure you change the URLs)

  • Create a new section in data-derp called "Exercise: Delta Lake Optimisations (Bonus)" just after the Delta Lake Walkthrough. NOTE: don't put this in the existing delta lake exercise section, create a NEW section

  • Check that the notebook can be imported according to the README instructions

  • Run through both notebooks to check that there are no bugs and that they run on a fresh cluster.

  • Record a short video walkthrough explaining the optimisations (modifications made at the metastore level)
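
A minimal sketch of how the attribution and clean-up steps above could be scripted once the notebook has been exported as Python source. It assumes the exported file starts with a `# Databricks notebook source` line, separates cells with `# COMMAND ----------`, and marks markdown cells with `# MAGIC %md`; check a real export before relying on it. The function names `add_attribution` and `find_demo_text` are hypothetical helpers, not part of any Databricks tooling.

```python
# Assumed markers of the Databricks Python source export format.
NOTEBOOK_HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"
ATTRIBUTION = "This notebook is adapted from the Delta Lake Demo provided by Databricks"


def add_attribution(source: str, text: str = ATTRIBUTION) -> str:
    """Insert a markdown cell containing `text` as the first cell of the notebook."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != NOTEBOOK_HEADER:
        raise ValueError("not a Databricks source-format notebook")
    # Everything after the header line is the existing notebook body.
    body = "\n".join(lines[1:]).lstrip("\n")
    cell = f"# MAGIC %md\n# MAGIC {text}"
    return f"{NOTEBOOK_HEADER}\n{cell}\n\n{CELL_SEPARATOR}\n\n{body}\n"


def find_demo_text(source: str, phrases: list[str]) -> list[tuple[int, str]]:
    """Return (line number, line) pairs containing any phrase, case-insensitively."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        lowered = line.lower()
        if any(phrase.lower() in lowered for phrase in phrases):
            hits.append((number, line))
    return hits
```

Running `find_demo_text` with phrases such as "a cluster has been created" gives a quick list of leftover demo-specific lines to review by hand; it only flags lines and does not remove anything.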

@kelseymok

From @syed-tw: this requires Unity Catalog (UC) to be added to the workspaces, which is extra work for us at the moment. Until we sort that out, let's skip this.

@kelseymok

@syed-tw UC has been added to the NEW workspace

syed-tw commented Jul 31, 2023

Yep, it works now @kelseymok. Thanks for enabling it!
