Prune old develop snapshots #853
@zackgalbreath @kwryankrattiger I know we discussed this several times and never really came to an agreement about how and when to do this with minimal disruption. I'm just getting the ball rolling again with a concrete implementation we can poke at.
I added my thought from the other day here.
Thanks for starting on this!
parser.add_argument(
    "-m",
    "--mirror-root",
    default="s3://spack-binaries",
Based on previous comments related to pruning, I think we should avoid passing production paths as defaults and require that they be specified.
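A minimal sketch of what that could look like, assuming the script uses `argparse` as in the excerpt above. The `build_parser` helper, the description string, and the help text are hypothetical and not taken from the PR; the only point is that `required=True` replaces the production default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which the mirror root has no default value."""
    parser = argparse.ArgumentParser(description="Prune old develop snapshots")
    parser.add_argument(
        "-m",
        "--mirror-root",
        # No default: the caller must name the mirror explicitly, so a bare
        # invocation can never touch a production bucket by accident.
        required=True,
        help="root URL of the mirror to prune",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would prune snapshots under {args.mirror_root}")
```

With this, running the script without `--mirror-root` exits with a usage error instead of falling back to a production path.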
# First, try to delete the mirror associated with the snapshot
try:
    subprocess.run(["aws", "s3", "rm", "--recursive", url_to_prune], check=True)
An idea for a process to make sure cache.spack.io and the mirrors stay in sync:
check if the mirror has index.json
- if yes -> remove index.json
- if no -> delete the entire prefix
This gives a buffer between dropping the tag and deleting the mirror contents. If we run this weekly, the cache.spack.io page should be updated in between, since it uses index.json to create a global index file.
The timeline for snapshot pruning could be:
-- Prune Cron Runs @ 2023/01/01 0100 UTC
- Delete tag develop-XXXX
- Delete develop-XXXX/build_cache/index.json
-- Generate cache.spack.io @ 2023/01/02 0100 UTC
- Create global index
- Push new website without mirrors containing no index.json
-- Prune Cron Runs @ 2023/01/08 0100 UTC
- Delete prefix develop-XXXX/
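The per-mirror decision in that process could be sketched roughly as follows. This is only an illustration: `plan_prune`, the action labels, and the idea of passing in a pre-fetched list of object keys are assumptions, not code from the PR. The actual deletion (for example via `aws s3 rm`) is deliberately left out.

```python
from typing import Iterable, List, Tuple

# Path of the index inside a snapshot prefix, as named in the timeline above.
INDEX_SUFFIX = "build_cache/index.json"


def plan_prune(prefix: str, keys: Iterable[str]) -> Tuple[str, List[str]]:
    """Decide what a single cron run should remove for one snapshot prefix.

    `keys` is assumed to be the list of object keys currently in the mirror.
    Returns a (hypothetical) action label and the targets to delete.
    """
    prefix = prefix.rstrip("/") + "/"
    index_key = prefix + INDEX_SUFFIX
    if index_key in set(keys):
        # First pass: drop only the index, so the next cache.spack.io
        # build stops listing this mirror.
        return ("delete-index", [index_key])
    # Later pass: the index is already gone, so remove the whole prefix.
    return ("delete-prefix", [prefix])
```

Running the same function on two consecutive cron runs gives the buffer described above: the first run returns `delete-index`, and the next one returns `delete-prefix`.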
py_gh_repo = py_github.get_repo("spack/spack", lazy=True)

# Get a list of all the tags matching the develop snapshot pattern
snapshot_tags = py_gh_repo.get_git_matching_refs("tags/develop-")
The method described below would break this query for mirror names.
If we still want to use tags, we could move deleted tags to hidden refs, something like refs/archive/develop-*. The other option is, for each prefix listed in the root, to match the name against a regex <prefix>/develop-*/ and check for the index.json to see what to delete.
I am not sure which method to prefer; maybe use both, one for listing tags to remove and the other for deleting mirrors.
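The regex option might look something like the sketch below. The pattern, the `prefixes_to_delete` name, and the assumption that prefixes and keys are already listed relative to the mirror root are all illustrative; the real snapshot naming may need a stricter pattern.

```python
import re
from typing import Iterable, List

# Assumed pattern: a top-level prefix named "develop-<something>/".
SNAPSHOT_PREFIX = re.compile(r"^develop-[^/]+/$")
INDEX_SUFFIX = "build_cache/index.json"


def prefixes_to_delete(root_prefixes: Iterable[str], keys: Iterable[str]) -> List[str]:
    """Return snapshot prefixes whose index.json has already been removed.

    `root_prefixes` are the prefixes listed at the mirror root and `keys`
    are the object keys in the mirror, both relative to that root.
    """
    existing = set(keys)
    return [
        prefix
        for prefix in root_prefixes
        if SNAPSHOT_PREFIX.match(prefix) and prefix + INDEX_SUFFIX not in existing
    ]
```

This needs no tag lookup at all for the deletion pass, which is why it could coexist with a tag-based query for choosing which snapshots to retire.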
Possible change which would change how snapshot refs are queried
Old develop snapshots have been accumulating for about a year now. This PR provides a cronjob that periodically cleans up all but the most recent few.
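As a rough illustration of "all but the most recent few", the selection step could be written like this. The function name, the `keep` parameter, and the assumption that each snapshot comes with a known date are hypothetical; the PR may derive ordering differently (for example from the tag name).

```python
from datetime import date
from typing import Iterable, List, Tuple


def select_snapshots_to_prune(
    snapshots: Iterable[Tuple[str, date]], keep: int = 3
) -> List[str]:
    """Return the names of all snapshots except the `keep` newest ones.

    `snapshots` is assumed to be (name, creation date) pairs. Names are
    returned newest first.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(snapshots, key=lambda snap: snap[1], reverse=True)
    return [name for name, _ in ordered[keep:]]
```

Keeping the selection separate from the deletion makes it easy to print the list as a dry run before anything is removed.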