Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete pools and images prior to recreating instead of failing #60

Closed
zsusswein opened this issue Oct 3, 2024 · 5 comments · Fixed by #67
Closed

Delete pools and images prior to recreating instead of failing #60

zsusswein opened this issue Oct 3, 2024 · 5 comments · Fixed by #67
Assignees

Comments

@zsusswein
Copy link
Collaborator

zsusswein commented Oct 3, 2024

#54 sets up a naming scheme that tags images and pools with the PR name (if a PR) and latest (if main). This strategy fails if the pool already exists.

image

There are a couple of options here:

  • We could have the action check if the pool already exists and skip pool creation if it does. This would potentially work if the pool is appropriately linked to the image (Link built pool to image in ACR #59) and also the pool is set to download the image fresh when bringing the node up. This setting is not the default and something we had to dig out of the docs in cfa-nnh-pipelines
  • Alternatively, we could have a step to delete the pool if it exists before setting the pool. This would need to implement polling (probably via az batch pool show). We need to make sure any nodes in the pool drain before recreating and this can take some time. (this is my preferred option but open to the previous)
  • We could switch up the naming scheme and punt the deletion issue to Clean up on PR close #62. This would fix everything but main/latest at the cost of creating lots of pools with no deletion strategy.
  • We could do something else? Maybe a different naming scheme that doesn't require overwriting?

cc: @gvegayon

@zsusswein
Copy link
Collaborator Author

From @jkislin:

Amit said it makes more sense to have a static name (i.e. cfa-epinow2-pool) and just refresh the pool each run. We can talk on this soon. I told him we're working on cleaning up the old pool with a job, which he suggested is ok... but that just having one name for the pool is probably standard/best practice.

I think this is option (1) above? But not totally sure.

@gvegayon
Copy link
Member

gvegayon commented Oct 8, 2024

My suggestion (following @zsusswein) is to check if the pool exists (one pool per branch). If the pool exists, use something like this to update the container:

az batch pool update \
    --pool-id <your_pool_id> \
    --virtual-machine-configuration image-reference.virtual-machine-image-id <new_container_image_url>

Cleaning the pool should happen either manually or once the branch is deleted.

@natemcintosh
Copy link
Collaborator

natemcintosh commented Oct 8, 2024

az batch pool update

I like this idea, but I can't find such a command in the docs.

I think that my preferred method at this point is option 2. I would see if it currently exists with az batch pool list, delete it if it does exist with az batch pool delete, then re-create it with az batch pool create.

I know this could take a while if there are nodes, but if it's running in CI, then I don't think it's that much of an issue. I know from personal experience that this process can be pretty quick (<10s) assuming there are no nodes currently in the pool.

@gvegayon
Copy link
Member

gvegayon commented Oct 9, 2024

You probably didn't find it, @natemcintosh, because AI is hallucinating xD. I'll see if there's anything real about it; if so, we can use it! If not, I agree with you that we need to use those three sets of commands. To avoid breaking things, I would add that we should include a label on the commit message to actively (continuously) delete the pool (of the sort of [skip ci]).

@gvegayon gvegayon linked a pull request Oct 9, 2024 that will close this issue
@gvegayon
Copy link
Member

gvegayon commented Oct 9, 2024

PR is now working here:

  • Pools created using branch name (only if doesn't exists).
  • Deletion of pools via commit msg adding the tag [delete pool]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants