changing the code for launching workers (and shutting them down?) #29

Open
1fish2 opened this issue Dec 10, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@1fish2
Collaborator

1fish2 commented Dec 10, 2019

Here's the incremental plan I have in mind (open to change). Meanwhile, we can use the current code (albeit inconsistent between wcEcoli and the Gaia Python client) unless/until there are other clients besides wcEcoli.

  1. I'll add a requested-worker-count property to Workflow properties (wcEcoli#755). (The workflow builder's client and user are in a good position to decide how many workers to allocate.)
  2. We add Gaia code to be able to launch workers via the GCE API. This will be better than pushing so hard on shell scripts. It'll need some parameters sent from the client that are currently in wcEcoli's runscripts/cloud/launch-workers.sh, some parameters that Gaia can get from gcloud (I further configured it on gaia-prime), and some added to its config file. This is easy.
  3. As an interim step, maybe add a Gaia endpoint to launch workers, change the Gaia Python client to use it, call that from the workflow builder, and dump both shell scripts. Or skip this step.
  4. Make the Gaia server in charge of when to launch workers, which is whenever it starts or resumes running a workflow. With the requested-worker-count it doesn't have to decide how many.
    • The main advantage of this step is resuming a workflow without the user having to know to launch workers.
    • This step might require changing the way workers shut down or making Gaia monitor them because the timeouts won't fit every situation, e.g. if most workers time out while waiting for one worker to finish a long task, the workflow might need more workers afterwards.
  5. Another day we make the Gaia server able to decide how many workers to launch, perhaps with more hints from the workflow.
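
To make step 4 and its timeout caveat concrete, here is a minimal sketch of the "top up to the requested count" logic that could run whenever a workflow starts or resumes. Everything in it is an assumption for illustration: the `Workflow` record, the `requested_worker_count` field name, the worker naming scheme, and the injected `launch` callable (which stands in for whatever actually creates the VMs). It is not Gaia's code.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Workflow:
    """Hypothetical workflow record; the field names are assumptions."""
    name: str
    requested_worker_count: int  # the property proposed in step 1


def workers_to_launch(workflow: Workflow, running_count: int) -> int:
    """How many workers to add so the pool reaches the requested count."""
    return max(0, workflow.requested_worker_count - running_count)


def on_start_or_resume(
    workflow: Workflow,
    running: List[str],
    launch: Callable[[List[str]], None],
) -> List[str]:
    """Launch only the missing workers and return their names.

    `running` lists the names of workers that are still alive; `launch` is a
    placeholder for the real launcher (shell script, GCE API call, ...).
    """
    count = workers_to_launch(workflow, len(running))
    in_use = set(running)
    # Hypothetical naming scheme: reuse the lowest free indexes.
    candidates = (f"{workflow.name}-worker-{i}" for i in itertools.count())
    free = (name for name in candidates if name not in in_use)
    new_names = list(itertools.islice(free, count))
    if new_names:
        launch(new_names)
    return new_names
```

Calling `on_start_or_resume` again after some workers have timed out would replace only the missing ones, which is one way to handle the case where a long task outlasts the other workers' timeouts.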
@1fish2 1fish2 added the enhancement New feature or request label Dec 10, 2019
@prismofeverything
Member

Awesome, thanks for getting this down in writing.

  1. Sounds good.
  2. Planning on this, was going to do it next I think.
  3. Might be nice to have a Gaia endpoint for launching workers in general? However it is implemented, the client wouldn't have to care really. But that sounds better than having multiple repos responsible for launching workers.
  4. Agreed, the more burden we can take off the end user the better. Ultimately having Gaia do this step is the best outcome.
  5. Yes, leading into having workers with different resource requirements eventually.

Thanks for breaking it down! Looks like a plan.
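
A rough sketch of the point in item 3, that the client wouldn't have to care how launching is implemented: the workflow builder could depend on a small interface, with an endpoint-backed implementation behind it. The `WorkerLauncher` protocol, the `/launch-workers` path, and the payload fields are all hypothetical names chosen for illustration, not an existing Gaia API.

```python
import json
from typing import Callable, Protocol


class WorkerLauncher(Protocol):
    """Hypothetical interface: the only thing the client depends on."""

    def launch(self, workflow: str, count: int) -> None: ...


class EndpointLauncher:
    """Sends the request to a (hypothetical) Gaia endpoint.

    `post` is injected so this sketch makes no assumption about the HTTP
    library or the server address; it receives a path and a JSON body.
    """

    def __init__(self, post: Callable[[str, str], None]) -> None:
        self._post = post

    def launch(self, workflow: str, count: int) -> None:
        body = json.dumps({"workflow": workflow, "count": count})
        self._post("/launch-workers", body)  # path is an assumption


def start_workflow(workflow: str, count: int, launcher: WorkerLauncher) -> None:
    """Client code: unchanged if the launcher later moves server-side."""
    launcher.launch(workflow, count)
```

With this shape, swapping the shell scripts for a GCE API call, or moving the decision into the Gaia server entirely, would only change which launcher is passed in.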

2 participants