
When crates.io gives 429, cargo should back off and retry later #13530

Open
ijackson opened this issue Mar 4, 2024 · 4 comments
Labels
A-interacts-with-crates.io Area: interaction with registries A-networking Area: networking issues, curl, etc. A-registries Area: registries C-bug Category: bug Command-publish S-needs-team-input Status: Needs input from team on whether/how to proceed.

Comments

@ijackson
Contributor

ijackson commented Mar 4, 2024

Problem

Our workspace contains 46 cargo packages. (Because cargo insists that each crate must be a separate package, and we want to split up crates for code sanity and compilation time reasons.)

This means that in our recent release, our on-duty release technician hit the rate limit. This aborted publication of the workspace, requiring manual retries and wrangling.

Steps

Have a workspace with more than 30 (the current burst rate limit) crates. Try to publish it by publishing each crate, in topo order, with cargo publish (using some automated tool).

Possible Solution(s)

cargo should handle a 429 response by backing off and retrying, using an exponential backoff algorithm.
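As a minimal sketch of what such a schedule could look like (the function name, base delay, cap, and attempt limit below are illustrative assumptions, not anything cargo currently implements):

```rust
use std::time::Duration;

/// Hypothetical exponential backoff schedule for retrying after a 429:
/// wait base * 2^attempt before the `attempt`-th retry, capped at `max`.
/// The base delay and cap are assumptions for illustration only.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Clamp the shift so the multiplier cannot overflow a u32.
    let factor = 1u32 << attempt.min(16);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

fn main() {
    let base = Duration::from_secs(1);
    let max = Duration::from_secs(60);
    // Print the first few delays: 1s, 2s, 4s, 8s, 16s, 32s, 60s (capped).
    for attempt in 0..7 {
        println!("retry {attempt}: wait {:?}", backoff_delay(attempt, base, max));
    }
}
```

In practice a real implementation would also add jitter and honour a `Retry-After` header if the registry sends one.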

In rust-lang/crates.io#1643 the crates.io team report having already raised the rate limit. In the error message from crates.io they suggest emailing help@ to ask for a rate limit increase. Such a workflow is IMO undesirable, especially as Rust gains more adoption.

Notes

I don't think increasing the rate limit (globally, or on request) is the right fix. If 429 is a hard error there is a tension between preventing misuse, and not breaking large projects' releases. But this tension can be abolished by handling 429 gracefully.

#13397 would probably have assisted the recovery from this situation (and also the local disk space problem our release technician also ran into).

See also: rust-lang/crates.io#3229 (requesting docs) and #6714 (requesting better error message display).

Version

> cargo version --verbose
cargo 1.76.0 (c84b36747 2024-01-18)
release: 1.76.0
commit-hash: c84b367471a2db61d2c2c6aab605b14130b8a31b
commit-date: 2024-01-18
host: x86_64-unknown-linux-gnu
libgit2: 1.7.1 (sys:0.18.1 vendored)
libcurl: 8.5.0-DEV (sys:0.4.70+curl-8.5.0 vendored ssl:OpenSSL/1.1.1w)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Arch Linux Rolling Release [64-bit]

(edited to fix ticket links)

@ijackson ijackson added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Mar 4, 2024
@epage epage added A-registries Area: registries A-networking Area: networking issues, curl, etc. Command-publish labels Mar 4, 2024
@epage
Contributor

epage commented Mar 4, 2024

People will be more likely to hit this with #1169 (since we'd likely move forward on that without the batch publish on crates.io's side)

cargo release tried to detect rate limitation situations and warn users about them so they can break down the publish into smaller steps.

As for strategies to deal with this, I'd want input from crates.io to know what fits with their intent of the rate limit.

Ideas brought up

* Back off and retry
* Batch uploading (rust-lang/crates.io#1643)

(Because cargo insists that each crate must be a separate package, and we want to split up crates for code sanity and compilation time reasons.)

Technically, packages can contain multiple crates, but only one lib crate. See rust-lang/rfcs#3452 for a proposal to explicitly vendor dependencies on publish.

@epage epage added the A-interacts-with-crates.io Area: interaction with registries label Mar 4, 2024
@ijackson
Contributor Author

ijackson commented Mar 4, 2024

Ideas brought up

* Back off and retry

* [Batch uploading](https://github.com/rust-lang/crates.io/issues/1643#issuecomment-1120665466)

👍

ISTM that batch uploading is nontrivial. Not only is it a substantial protocol change, but it may also add coherency demands on the crates.io system, which could be difficult to fulfil in an ACID way.

I'm guessing that a backoff and retry strategy is likely to be relatively simple. The only question is whether to apply it only to publish (where we know that we want rate limits low enough that reasonable non-abusive use cases can reach them), or all operations.

I think applying it to all operations risks exacerbating operational problems from wayward automation. I don't know if we have non-abusive operations which risk hitting rate limits. (Last week I ran cargo owner add for the same 46 crates and that went smoothly.)

Retrying on 429 only on publish is a conservative choice which would solve the real-world operational problem.

@Eh2406
Contributor

Eh2406 commented Mar 5, 2024

Retrying on 429 only on publish is a conservative choice which would solve the real-world operational problem.

That critically depends on what the rate limit is intended to accomplish. If the point of the rate limit is to make sure there is a personal connection between crates.io and its power users, then any automated fix is just circumventing it. Similarly, if the expensive part of the operation is receiving and processing the publish request, then an acceptable retry strategy is just automating the DDoS they were trying to avoid. We should talk to the crates.io team before making technical changes.

It could be that the best compromise here is that cargo has a retry strategy that is ridiculously slow. For example, it gets a 429 and prints a message saying "you're being rate limited; please talk to the registry about acceptable use in the future, but for now we are going to retry your request after a one-minute delay." This reduces the chance of a user intentionally relying on this behaviour, because it's so painfully slow, but it also does not break automation that assumes that when cargo publish finishes, the crate is published.
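The deliberately-slow strategy described above could be sketched as follows. Note that `publish_with_slow_retry`, the `Response` enum, and the fixed delay are hypothetical stand-ins for illustration, not cargo or crates.io APIs:

```rust
use std::{thread, time::Duration};

// Hypothetical stand-in for the registry's reply; not a real cargo type.
#[derive(Debug, PartialEq)]
enum Response {
    Ok,
    TooManyRequests, // HTTP 429
}

/// Retry a publish attempt on 429, warning the user and sleeping a fixed
/// (deliberately long) delay between attempts. Any non-429 result is
/// returned to the caller unchanged.
fn publish_with_slow_retry(
    delay: Duration,
    mut publish_once: impl FnMut() -> Response,
) -> Response {
    loop {
        match publish_once() {
            Response::TooManyRequests => {
                eprintln!(
                    "warning: you're being rate limited; please talk to the \
                     registry about acceptable use, retrying after {delay:?}"
                );
                thread::sleep(delay);
            }
            done => return done,
        }
    }
}

fn main() {
    // With a server that accepts immediately, no delay is incurred.
    let result = publish_with_slow_retry(Duration::from_secs(60), || Response::Ok);
    println!("{result:?}");
}
```

The point of the fixed, painfully long delay (rather than a tuned backoff) is exactly the trade-off described above: too slow to be worth relying on deliberately, but enough to keep unattended automation from failing outright.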

@epage epage added S-needs-team-input Status: Needs input from team on whether/how to proceed. and removed S-triage Status: This issue is waiting on initial triage. labels Mar 7, 2024
@ijackson
Contributor Author

(This just happened to me again. We have 55 packages now. It was less troublesome this time round because, after the discouraging response to #13397, we wrote a Python script to publish idempotently.)

It could be that the best compromise here is that cargo has a retry strategy that is ridiculously slow. For example, it gets a 429 and prints a message saying "you're being rate limited; please talk to the registry about acceptable use in the future, but for now we are going to retry your request after a one-minute delay."

This would meet our needs very nicely. Publication of our 55-package workspace takes a fair while in any case.
