Add workflow example for dask databricks #300

Merged 18 commits on Jan 24, 2024

skirui-source
Contributor

Fixes #298

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@skirui-source skirui-source marked this pull request as ready for review December 2, 2023 07:33
Member

@jacobtomlinson jacobtomlinson left a comment

Thanks @skirui-source.

I've made a few specific comments in ReviewNB, but could you also address the following high-level things:

  • Can you add tags to the notebook?
  • The build is unhappy. I think it doesn't like the notebook cell metadata for some reason. Perhaps because it was saved in Databricks. Could you open an issue to track this and start looking into a solution?
  • Could you give it another spelling/grammar pass? I noticed a few things, but it's hard to suggest edits on GitHub for notebooks.
  • Can you check the consistency of the spelling of things like Dask/RAPIDS, cuDF, etc.?

Thanks!

Member

@jacobtomlinson jacobtomlinson left a comment

It looks like one comment was marked as resolved but not actually completed. Could you look at adding some more links throughout? Generally, whenever we mention a technology like Dask, XGBoost, etc. for the first time on a page, we should link to it.

Also, it looks like the build is still failing:

/home/runner/work/deployment/deployment/source/examples/xgboost-dask-databricks/notebook.ipynb:40002: WARNING: skipping unknown output mime type: application/vnd.databricks.v1+bamboolib_hint [mystnb.unknown_mime_type]

Could you fix that up? Thanks!

Member

@jacobtomlinson jacobtomlinson left a comment

I removed the metadata that Sphinx was complaining about.
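
The thread does not show the exact change. As a minimal sketch, assuming the offending entries sit in the `data` (and matching `metadata`) of code-cell outputs, something like the following could strip them using only the Python standard library. `strip_mime_type` and `clean_notebook_file` are hypothetical helpers for illustration, not part of the repository.

```python
import json
from pathlib import Path

# MIME type named in the build warning quoted earlier in this thread.
UNKNOWN_MIME = "application/vnd.databricks.v1+bamboolib_hint"


def strip_mime_type(notebook: dict, mime_type: str = UNKNOWN_MIME) -> int:
    """Remove `mime_type` entries from every code-cell output, in place.

    Returns the number of entries removed.
    """
    removed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        kept_outputs = []
        for output in cell.get("outputs", []):
            data = output.get("data")
            if isinstance(data, dict) and mime_type in data:
                del data[mime_type]
                output.get("metadata", {}).pop(mime_type, None)
                removed += 1
                # Drop outputs that carried nothing but the unknown MIME type.
                if not data:
                    continue
            kept_outputs.append(output)
        cell["outputs"] = kept_outputs
    return removed


def clean_notebook_file(path: str) -> int:
    """Hypothetical helper: rewrite the notebook at `path` without the MIME type."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    removed = strip_mime_type(notebook)
    if removed:
        nb_path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return removed
```

Calling `clean_notebook_file("source/examples/xgboost-dask-databricks/notebook.ipynb")` would then rewrite the notebook only if something was removed. If the outputs should stay, an alternative may be Sphinx's `suppress_warnings` option with the `mystnb.unknown_mime_type` type shown in the warning.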

@jacobtomlinson jacobtomlinson merged commit 6fc6f2e into main Jan 24, 2024
4 checks passed
@jacobtomlinson jacobtomlinson deleted the databricks-worklflow branch January 24, 2024 10:45
Successfully merging this pull request may close these issues.

Add a workflow example that uses multi-node Databricks and Dask (ideally also using dask-deltatable)