changing the code for launching workers (and shutting them down?) #29

1fish2 · 2019-12-10T22:34:25Z

Here's the incremental plan I have in mind (open to change). Meanwhile, we can use the current code (albeit inconsistent between wcEcoli and the Gaia Python client) unless/until there are other clients besides wcEcoli.

1. I'll add a requested-worker-count property to Workflow properties (wcEcoli#755). The workflow builder's client and user are in a good position to decide how many workers to allocate.
2. We add Gaia code to launch workers via the GCE API. This will be better than pushing so hard on shell scripts. It'll need some parameters sent from the client that are currently in wcEcoli's runscripts/cloud/launch-workers.sh, some parameters that Gaia can get from gcloud (I further configured it on gaia-prime), and some added to its config file. This is easy.
3. As an interim step, maybe add a Gaia endpoint to launch workers, change the Gaia Python client to use it, call that from the workflow builder, and dump both shell scripts. Or skip this step.
4. Make the Gaia server in charge of when to launch workers, which is whenever it starts or resumes running a workflow. With the requested-worker-count it doesn't have to decide how many.
   - The main advantage of this step is resuming a workflow without the user having to know to launch workers.
   - This step might require changing the way workers shut down, or making Gaia monitor them, because the timeouts won't fit every situation. E.g. if most workers time out while waiting for one worker to finish a long task, the workflow might need more workers afterwards.
5. Another day we make the Gaia server able to decide how many workers to launch, perhaps with more hints from the workflow.
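
To make the start/resume launch decision concrete, here is a minimal Python sketch. It assumes the server can count the workers that are still alive; the names `WorkflowProperties`, `requested_worker_count`, and `workers_to_launch` are hypothetical and not existing Gaia or wcEcoli code. The actual GCE launch call is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProperties:
    """Hypothetical subset of the workflow properties sent by the client."""
    name: str
    requested_worker_count: int  # assumed spelling of requested-worker-count


def workers_to_launch(props: WorkflowProperties, live_workers: int) -> int:
    """Return how many more workers to launch when a workflow starts or resumes.

    The server does not choose the total; it only tops up to the count the
    client requested, so workers that timed out get replaced on resume.
    """
    if props.requested_worker_count < 0 or live_workers < 0:
        raise ValueError("worker counts must be non-negative")
    return max(0, props.requested_worker_count - live_workers)


props = WorkflowProperties(name="demo-workflow", requested_worker_count=4)
print(workers_to_launch(props, live_workers=0))  # fresh start: launch all 4
print(workers_to_launch(props, live_workers=3))  # resume with 3 alive: launch 1
```

Topping up to the requested count rather than always launching the full count would cover the case where most workers timed out while one was finishing a long task, without over-allocating when some are still running.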

prismofeverything · 2019-12-10T23:15:56Z

Awesome, thanks for getting this down in writing.

1. Sounds good.
2. Planning on this, was going to do it next I think.
3. Might be nice to have a Gaia endpoint for launching workers in general? However it is implemented, the client wouldn't have to care really. But that sounds better than having multiple repos responsible for launching workers.
4. Agreed, the more burden we can take off the end user the better. Ultimately having Gaia do this step is the best outcome.
5. Yes, leading into having workers with different resource requirements eventually.

Thanks for breaking it down! Looks like a plan.

1fish2 added the enhancement New feature or request label Dec 10, 2019
