Thank you for this fantastic daskhub chart. @saschahofmann and I have just set this up on Google Kubernetes Engine. I thought it would be helpful to share our experience and gotchas, and potentially suggest improvements to the docs.
Making the decision between charts
We started with the simpler "dask" helm chart.
However, it didn't really suit our needs, for a few reasons:
The jupyter notebook has no persistent disk storage
We wanted the ability to control the number of workers, without making updates via helm
It would have been helpful to have these limitations listed in the README, especially the persistence one, which I'd imagine most people would expect (though I guess #78 will fix it).
Setting up daskhub
Setting up the initial "daskhub" worked fine out of the box. The one part that is a bit flaky for us is the "launch cluster" button on the left sidebar. Sometimes it launches a local cluster rather than one via dask-gateway. We haven't worked out why yet.
Small thing - the quickstart does not quite work for us:
from dask_gateway import Gateway
# This line is missing from the quickstart
gateway = Gateway()
cluster = gateway.new_cluster()
client = cluster.get_client()
Customising the image 🏗️
For our use case, we needed to build a custom image.
The default tag for daskhub is pangeo/base-notebook:{XXX}. Unfortunately, googling that led us to the pangeo stacks GitHub repo and webpage, and we spent time trying to get the "ONBUILD" images to work.
Thankfully, we eventually stumbled on this comment, which led us to the correct repo. The "ONBUILD" trick is neat; it just took us time to get our heads around it.
For anyone else stuck doing the same thing, here is what worked for us:
Create a local copy of the pangeo/base-notebook directory. At minimum you need the Dockerfile, apt.txt, environment.yml, postBuild and start.
Modify apt.txt with your system packages, and environment.yml with your extra conda packages.
Build the image with Docker and push it to your registry.
Set jupyterhub.singleuser.image.{name, tag, pullPolicy} to point at this image.
For the custom dask image, we started from daskgateway/dask-gateway and just did:
# Dockerfile
FROM daskgateway/dask-gateway:0.9.0
COPY ./environment.yml ./environment.yml
RUN conda env update -f environment.yml
# environment.yml
name: base
dependencies:
- # extra packages...
Then set dask-gateway.gateway.backend.image.{name, tag, pullPolicy} to point at this image.
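The two image settings above can be collected into a single values override for the chart. This is a minimal sketch, not something from our actual setup: the image names (`myrepo/custom-notebook`, `myrepo/custom-dask-gateway`), the `1.0.0` tags and the `IfNotPresent` pull policy are all hypothetical placeholders. Only the key paths come from the steps above. It prints JSON because JSON is also valid YAML, so no extra library is needed.

```python
import json


def image_values(notebook_image, notebook_tag, worker_image, worker_tag,
                 pull_policy="IfNotPresent"):
    """Build a daskhub values override that pins both custom images."""
    return {
        "jupyterhub": {
            "singleuser": {
                "image": {
                    "name": notebook_image,
                    "tag": notebook_tag,
                    "pullPolicy": pull_policy,
                }
            }
        },
        "dask-gateway": {
            "gateway": {
                "backend": {
                    "image": {
                        "name": worker_image,
                        "tag": worker_tag,
                        "pullPolicy": pull_policy,
                    }
                }
            }
        },
    }


# Hypothetical image names and tags, for illustration only.
values = image_values("myrepo/custom-notebook", "1.0.0",
                      "myrepo/custom-dask-gateway", "1.0.0")
print(json.dumps(values, indent=2))
```

Saving the printed output to a file and passing it to helm with `-f` alongside your other values should apply both images in one upgrade, with explicit tags on each.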
Making sure to tag specific versions 🤦
A few days ago, we found that, inexplicably, we could no longer build new custom images that worked on the cluster. It turned out we hadn't pinned our images, and dask-gateway 0.9.0 had been released.
Obviously this was completely our fault, but we wanted to note that it's essential to have consistent versions for:
The helm chart
The pangeo/base-notebook image in your jupyterhub.singleuser Dockerfile
The daskgateway/dask-gateway image in your dask-gateway Dockerfile
Otherwise things will break in hard-to-debug ways. A particularly nasty example was using dask dataframe, where we had pandas 1.1.* on the client and 1.0.* on the workers.
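A check along these lines would have caught the pandas mismatch early. This is a minimal sketch under assumptions: the helper names (`minor_version`, `find_mismatches`) and the version reports are hypothetical, and it compares only the major.minor part of each version. On a live cluster, `Client.get_versions(check=True)` from distributed performs a broader comparison of client, scheduler and worker versions.

```python
def minor_version(version):
    """Reduce a version string such as '1.1.3' to its 'major.minor' prefix."""
    return ".".join(version.split(".")[:2])


def find_mismatches(client_packages, worker_packages):
    """Return {package: {worker: (client_version, worker_version)}} for
    packages whose major.minor version differs from the client's."""
    mismatches = {}
    for worker, packages in worker_packages.items():
        for name, client_version in client_packages.items():
            worker_version = packages.get(name)
            if worker_version is None:
                continue  # package not reported by this worker
            if minor_version(worker_version) != minor_version(client_version):
                mismatches.setdefault(name, {})[worker] = (
                    client_version,
                    worker_version,
                )
    return mismatches


# Hypothetical version reports, mirroring the pandas mismatch described above.
client_packages = {"pandas": "1.1.2", "dask": "2.30.0"}
worker_packages = {
    "worker-0": {"pandas": "1.0.5", "dask": "2.30.0"},
    "worker-1": {"pandas": "1.0.5", "dask": "2.30.0"},
}
print(find_mismatches(client_packages, worker_packages))
```

Here only pandas is reported, for both workers, since dask matches everywhere.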
We'd be happy to submit a PR to amend the docs, or maybe write a separate guide if that's useful. Please let us know.
Once again, thanks for building this!
Thanks for taking the time to raise this! This feedback is much appreciated.
Making the decision between charts
Your comparison is interesting to see, as neither of those is an intentional difference. As you say, we aim to add persistence in #78, and the Jupyter session created by the dask chart has a service role to enable scaling the workers. This can be done if kubectl or helm are in the image, or via the HelmCluster cluster manager object in Python.
One feature we don't quite have yet is the ability to scale the cluster via the Jupyter sidebar, but that is in the works too.
The primary difference here is the dask chart is intended for use by a single person, whereas the daskhub chart is intended for use by a team or org.
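For the HelmCluster route mentioned above, a minimal sketch might look like the following. The helper name `scale_dask_release`, its `cluster_factory` parameter and the release name `myrelease` are hypothetical. The import path assumes a dask-kubernetes version that exposes `HelmCluster` at the top level, which may differ between releases.

```python
def scale_dask_release(release_name, n_workers, cluster_factory=None):
    """Scale the workers of an existing dask helm release.

    cluster_factory is a hypothetical hook so the function can be exercised
    without a cluster; by default it uses dask_kubernetes.HelmCluster.
    """
    if cluster_factory is None:
        # Imported lazily so the sketch loads without dask-kubernetes installed.
        from dask_kubernetes import HelmCluster
        cluster_factory = HelmCluster
    cluster = cluster_factory(release_name=release_name)
    cluster.scale(n_workers)
    return cluster


# From the Jupyter session created by the dask chart (hypothetical release name):
# cluster = scale_dask_release("myrelease", 10)
```

This relies on the service role described above, so it is expected to work only from inside the cluster or wherever equivalent Kubernetes credentials are available.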
Setting up daskhub
Both the problem you mention with the sidebar not always using Dask Gateway and the broken docs are bugs. Would you mind raising issues for those?
Customising the image / Making sure to tag specific versions
This would be an awesome bit of documentation if you are interested in raising a PR.