
Specify default address to look for scheduler #239

Open
mrocklin opened this issue Jul 16, 2022 · 13 comments

@mrocklin
Member

So, I'm in an interesting situation where I'm running a Jupyter server and I know that it will have exactly one Dask cluster attached to it. I would like to populate the Dask labextension with that scheduler address on startup. Is this easy to do?

@ian-r-rose
Collaborator

This is already possible today using the defaultURL setting value. Doing this as part of a deployment would look like:

  1. Identify the relevant server address
  2. Prior to users loading the page (not necessarily before the jupyter server startup, but might as well be), put the setting value in an overrides.json file for JupyterLab to pick up. This could be baked into the environment if it's a stable URL, or done as part of a setup script.
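For step 2, a minimal overrides.json would look something like this (the plugin id dask-labextension:plugin is my best guess at where the extension registers its settings, and the URL is a placeholder):

```json
{
  "dask-labextension:plugin": {
    "defaultURL": "tcp://scheduler.example.com:8786"
  }
}
```

JupyterLab reads overrides.json from its application settings directory at startup, so this file needs to be in place before the page loads.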

Clearly, I should put a bit of effort into docs here...

@mrocklin
Member Author

mrocklin commented Jul 18, 2022 via email

@ian-r-rose
Collaborator

Not with the current design -- the default URL to populate the search bar with is decided on the frontend, and feeding information to that goes through the config system (i.e., env variables aren't directly visible to the frontend).

Is there an issue with writing a small config file in that case, or is it just more convenient to set an env variable?

@mrocklin
Member Author

So I would do something like the following before starting up the Jupyter server?

import json

with open("overrides.json", mode="w") as f:
    f.write(json.dumps(...))

@ian-r-rose
Collaborator

Yes, something like that, at least for a proof-of-concept. A more complete solution might be to use json5 and merge with other possible config options.
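A sketch of that merge-with-existing-config approach might look like the following (plain json rather than json5, assuming the settings live under the dask-labextension:plugin id; set_default_url is a hypothetical helper name, not part of any library):

```python
import json
from pathlib import Path


def set_default_url(overrides_path, url):
    """Merge the dask-labextension defaultURL into an overrides.json,
    preserving any other settings already present in the file."""
    path = Path(overrides_path)
    # Start from the existing overrides if the file is already there
    overrides = json.loads(path.read_text()) if path.exists() else {}
    # Set defaultURL without clobbering other keys under the plugin id
    plugin = overrides.setdefault("dask-labextension:plugin", {})
    plugin["defaultURL"] = url
    path.write_text(json.dumps(overrides, indent=2))
    return overrides
```

This would run before jupyter server startup, pointing at the overrides.json in JupyterLab's application settings directory.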

To be clear, we could have some kind of translation layer between the dask config system and the JupyterLab one, but we'd have to build it. I'm a little hesitant to build out a new set of special-case environment variables rather than go through the existing path. I know that some JupyterHub/QHub/2i2c deployments also need to distribute custom settings.

@ian-r-rose
Collaborator

The frontend chooses in order:

  1. Any user-populated URL (which is persisted between page refreshes)
  2. The default URL from the settings

I also noticed when kicking the tires on this that the user-populated URL can be a bit too sticky at the moment (you can reset it with a ?reset query parameter). A fix for that is pretty straightforward here.

@mrocklin
Member Author

@ian-r-rose and I spoke. There is some possibility of using the system that currently sends the default at-start-time clusters up to the frontend. This is low enough priority though that we're going to wait until jupyter-on-dask becomes more of a major thing (maybe never).

@jacobtomlinson
Member

If we switched out the internals for dask-ctl this would be handled automatically by the cluster discovery. Discovered clusters would be listed automatically in the sidebar. xref #189

@mrocklin
Member Author

mrocklin commented Jul 19, 2022 via email

@ian-r-rose
Collaborator

we don't have any Cluster objects, just a scheduler address

I am not sure that this would be insurmountable in a refactor to use dask-ctl. Today, the sidebar in some sense owns the clusters listed there, and they are backed by real Cluster instances. But if we can, I'd love to get out of the business of having a Cluster-backed object altogether, and just have something like "here is a list of clusters we know how to connect to". In that case maybe an address (+ some related metadata?) would be enough.

@jacobtomlinson
Member

jacobtomlinson commented Jul 25, 2022

@mrocklin that should be fine. dask_ctl.ProxyCluster fulfils the Cluster API and is useful for representing clusters that can't be rehydrated into other cluster manager objects. Currently, the discovery method for ProxyCluster looks through open ports on localhost and returns any schedulers it finds, so clusters like LocalCluster and SSHCluster can be included in the list. It would be very quick to expand this to include other addresses configured in the environment, like DASK_SCHEDULER_ADDRESS.
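To illustrate the port-scanning idea (this is not dask-ctl's actual implementation, just a stdlib sketch; real discovery would also verify that each endpoint speaks the Dask scheduler protocol rather than merely accepting a TCP connection):

```python
import socket


def find_local_schedulers(ports=range(8786, 8796), host="127.0.0.1", timeout=0.1):
    """Scan a range of localhost ports and return addresses that accept
    a TCP connection, as a rough stand-in for ProxyCluster discovery."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 when the connection succeeds
            if s.connect_ex((host, port)) == 0:
                found.append(f"tcp://{host}:{port}")
    return found
```

Extending this to environment-configured addresses would just mean also yielding whatever DASK_SCHEDULER_ADDRESS (or similar config) points at.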

But if we can, I'd love to get out of the business of having a Cluster-backed object altogether

I've been down the same thought process too. The trouble is that cluster objects are generally the only place we can actually represent the abstract concept of a cluster. Dask Gateway and the Dask Kubernetes Operator both have other ways to store and represent this internally, but most other deployment mechanisms don't. My goal with ProxyCluster is to hold this representation in a catch-all way for clusters that aren't easily put back into their original classes.

@ian-r-rose
Collaborator

My goal with ProxyCluster is to hold this representation in a catch-all way for clusters that aren't easily put back into their original classes.

This seems like it could be a good solution -- thanks for the explanation @jacobtomlinson. I'll see if I can put together an example using dask/distributed#6737 and ProxyCluster.

I'm getting more excited about the possibility of integrating dask-ctl here.

@Lesterpig

I'm also interested in providing a default address.

I tried the following in overrides.json but it doesn't seem to work. Maybe I'm using the wrong plugin name?

{
        "dask-labextension:plugin": {
                "hideClusterManager": true,
                "defaultURL": "<hidden>"
        }
}

Thanks for your help.
