ostree-fetcher-curl: handle non 404 errors as G_IO_ERROR_TIMED_OUT #2843
This seems wrong. There are lots of possible curl errors and they have nothing to do with HTTP errors such as 404. See https://curl.se/libcurl/c/libcurl-errors.html. Many of those errors should not be retried. I suppose in most cases there's no harm in retrying, but I think it would make more sense to match specific errors that should be retried.
See the context in this function; this code is only run when libcurl fails for a non-HTTP reason (e.g. TCP-level errors). We already have specific error handling for HTTP codes just below this.
I likely misled Dan with my commit message talking about 404s and HTTP. Thanks for clarifying, will fix the commit msg.
Though indeed looking at the list, it does look like there are quite a few things where it doesn't make sense to retry. I think ideally I agree we'd special-case those that do, but I also agree it probably doesn't hurt to be more brute-force.
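To make the "match specific errors" idea concrete, here is a minimal sketch of an allow-list classifier. It is not the code in this PR: the `DemoCurlError` enum and `demo_error_is_retryable` are hypothetical stand-ins so the snippet compiles without libcurl, and real code would switch on `CURLcode` values from `<curl/curl.h>` instead. Which errors count as transient is also an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a handful of libcurl's CURLE_* codes.
 * Real code would use CURLcode from <curl/curl.h> instead. */
typedef enum
{
  DEMO_UNSUPPORTED_PROTOCOL,
  DEMO_URL_MALFORMAT,
  DEMO_COULDNT_RESOLVE_HOST,
  DEMO_COULDNT_CONNECT,
  DEMO_OPERATION_TIMEDOUT,
  DEMO_SEND_ERROR,
  DEMO_RECV_ERROR,
  DEMO_HTTP2_STREAM
} DemoCurlError;

/* Return true only for errors that are plausibly transient.
 * Anything not listed as transient fails immediately. */
static bool
demo_error_is_retryable (DemoCurlError err)
{
  switch (err)
    {
    case DEMO_COULDNT_RESOLVE_HOST:
    case DEMO_COULDNT_CONNECT:
    case DEMO_OPERATION_TIMEDOUT:
    case DEMO_SEND_ERROR:
    case DEMO_RECV_ERROR:
    case DEMO_HTTP2_STREAM:
      return true;
    case DEMO_UNSUPPORTED_PROTOCOL:
    case DEMO_URL_MALFORMAT:
      return false;
    }
  /* Unknown codes: be conservative and do not retry. */
  return false;
}
```

The brute-force alternative discussed here would simply return true for every code, trading precision for fewer surprise failures.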
src/libostree/ostree-fetcher-curl.c
@@ -329,7 +329,7 @@ check_multi_info (OstreeFetcher *fetcher)
     }
   else
     {
-      g_task_return_new_error (task, G_IO_ERROR, G_IO_ERROR_FAILED,
+      g_task_return_new_error (task, G_IO_ERROR, G_IO_ERROR_TIMED_OUT,
Probably shouldn't retry if is_file is set?
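A rough sketch of that suggestion, under stated assumptions: the `DemoIoError` enum stands in for the two GIO error codes in the hunk above, and detecting a local source by a `file://` prefix is a guess at what `is_file` means rather than the fetcher's actual logic. The idea is that a transport failure on a local file is reported as a plain failure, while a network failure gets the code the discussion treats as retryable.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for G_IO_ERROR_FAILED and G_IO_ERROR_TIMED_OUT. */
typedef enum
{
  DEMO_IO_FAILED,   /* reported as-is, no retry expected */
  DEMO_IO_TIMED_OUT /* assumed to be retried by the caller */
} DemoIoError;

/* Assumption: a local source is recognised by its URI scheme. */
static bool
demo_uri_is_file (const char *uri)
{
  return strncmp (uri, "file://", 7) == 0;
}

/* Pick the error kind for a non-HTTP transport failure. */
static DemoIoError
demo_transport_error_kind (const char *uri)
{
  return demo_uri_is_file (uri) ? DEMO_IO_FAILED : DEMO_IO_TIMED_OUT;
}
```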
0a2d4ec to 435ae57
LGTM
How did testing go for this?
I have not had time to test this in the last two weeks, sorry. Will give it some time this week if the other bugs allow.
I've rebased 🏄 this on main. This issue just came up again in https://gitlab.apertis.org/infrastructure/apertis-issues/-/issues/349
I was playing with
I was also trying To really test this and reproduce the originating issue I think we need an HTTP/2 webserver which is intentionally returning errors. I'm sure it can be done, but I'm having trouble in a quick search finding anything...I would think one of the clients would have a setup for this. Dunno...I was originally going to say we should merge this, but the "pulls just hang on too many errors" effect I discovered is concerning. With this we may potentially switch from erroring out too easily to just hanging indefinitely, which is definitely worse.
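One generic way to keep "retry more" from turning into "hang forever" is to bound the loop by both an attempt count and a wall-clock budget. The sketch below is only an illustration of that idea, not ostree's pull logic: every name in it is hypothetical, and the clock and the fetch attempt are injected as callbacks so the behaviour can be exercised without a network.

```c
#include <stdbool.h>

typedef bool (*DemoAttemptFn) (void *user_data); /* true on success */
typedef long (*DemoClockFn) (void *user_data);   /* monotonic seconds */

typedef enum
{
  DEMO_PULL_OK,
  DEMO_PULL_RETRIES_EXHAUSTED,
  DEMO_PULL_DEADLINE
} DemoPullResult;

/* Try once, then retry up to max_retries times, but never start a new
 * attempt once deadline_secs have elapsed since the first one. */
static DemoPullResult
demo_pull_with_limits (DemoAttemptFn attempt, DemoClockFn now, void *user_data,
                       unsigned max_retries, long deadline_secs)
{
  const long start = now (user_data);
  for (unsigned tries = 0; tries <= max_retries; tries++)
    {
      if (now (user_data) - start >= deadline_secs)
        return DEMO_PULL_DEADLINE;
      if (attempt (user_data))
        return DEMO_PULL_OK;
    }
  return DEMO_PULL_RETRIES_EXHAUSTED;
}

/* Fake transport for experiments: each attempt advances the clock and
 * the first fail_first attempts fail. */
typedef struct
{
  long clock;
  long secs_per_attempt;
  unsigned fail_first;
  unsigned calls;
} DemoFake;

static long
demo_fake_now (void *data)
{
  return ((DemoFake *) data)->clock;
}

static bool
demo_fake_attempt (void *data)
{
  DemoFake *fake = data;
  fake->clock += fake->secs_per_attempt;
  fake->calls++;
  return fake->calls > fake->fail_first;
}
```

Note that this only stops new attempts from starting; a single attempt that itself never returns would still need a per-transfer timeout.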
@cgwalters if it hangs forever it's just a bug to be fixed in curl. For the HTTP/2 webserver, it was an nginx running on Fedora (latest version available when opening #2570), with no HTTP/2-specific tuning; I don't know if it's still happening with newer versions.
If more concrete results are needed before approving this pull request: Among other code changes, I've incorporated this commit to do OSTree updates with aktualizr over an unreliable, low-bandwidth connection, i.e. download speeds around 10 kB/s with disconnection periods lasting a few minutes. I've incorporated this commit in OSTree tag 2020.3. Before applying it the update would quickly fail after the first few libcurl errors e.g. With it I was able to successfully download a 190 MB OSTree update with the low bandwidth connection described above, taking around 9h to complete. One other thing I added was a timeout ( Hope this information is useful.
@lucas-akira this is very helpful, thank you. I will resume working on this PR, hopefully next week.
Hmm, well...one thing we could do perhaps is chicken out and make this a runtime option, via e.g. an environment variable. That would allow people to choose which bugs they want...if other environments aren't hitting the curl bug I was seeing on (then current) Fedora they could opt-in e.g. What may also help is for us to add some sort of giant timeout on top of the entire pull process.
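For illustration, an environment-variable opt-in could look roughly like the sketch below. The variable name `DEMO_OSTREE_RETRY_ALL_NETWORK_ERRORS` and both helper functions are invented for this example; the comment above does not name a variable, and the accepted values ("1" or "true") are an assumption.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, used only for this example. */
#define DEMO_RETRY_ENV "DEMO_OSTREE_RETRY_ALL_NETWORK_ERRORS"

/* Interpret the raw value: unset, empty, or anything other than
 * "1"/"true" leaves the old fail-fast behaviour in place. */
static bool
demo_parse_retry_flag (const char *value)
{
  if (value == NULL)
    return false;
  return strcmp (value, "1") == 0 || strcmp (value, "true") == 0;
}

/* Opt-in check as the fetcher might consult it at error time. */
static bool
demo_retry_all_enabled (void)
{
  return demo_parse_retry_flag (getenv (DEMO_RETRY_ENV));
}
```

Keeping the parsing separate from the `getenv` call makes the policy easy to test without touching the process environment.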
I re-tested with both:
and
on Fedora and don't see the infinite hangs described in #2843 (comment). The process completes successfully and we see the retries:
Curl version is:
Going to try CentOS Stream now and see if I can replicate the issue.
I guess that is not seeing the retries... It's just the debug output. Hmm, I am not seeing error: set to anything other than unset... @cgwalters Is there something specific I should be seeing here?
Fixed Upstream in PR2843 and available v2024.5 ostreedev/ostree#2843 Signed-off-by: Jose Quaresma <[email protected]>
This introduces the retry-all-network-errors option, which
is enabled by default. This is a behavior change, as
ostree will now retry requests that fail, except when
they fail with NOT_FOUND. It also introduces the options
low-speed-limit-bytes and low-speed-time-seconds; at the
moment these map only to CURL options. Their defaults
follow librepo:
https://github.com/rpm-software-management/librepo/blob/7c9af219abd49f8961542b7622fc82cfdaa572e3/librepo/handle.h#L90
https://github.com/rpm-software-management/librepo/blob/7c9af219abd49f8961542b7622fc82cfdaa572e3/librepo/handle.h#L96
Currently these changes only apply when using libcurl.
Closes: #2570
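To show what a low-speed limit paired with a low-speed time means in practice, here is a simplified model of the rule: abort once throughput has stayed below the limit for the configured number of seconds. It is a sketch only. The struct and function are hypothetical, the per-second sampling is a simplification of whatever averaging libcurl performs internally, and the libcurl options these settings presumably map to (`CURLOPT_LOW_SPEED_LIMIT` and `CURLOPT_LOW_SPEED_TIME`) are named here as an assumption, since the commit message does not spell them out.

```c
#include <stdbool.h>

/* Simplified model of a low-speed abort rule. */
typedef struct
{
  long limit_bytes;  /* minimum acceptable bytes per second */
  long time_seconds; /* how long the transfer may stay below the limit */
  long slow_seconds; /* consecutive slow seconds observed so far */
} DemoLowSpeed;

/* Feed one second of transfer; returns true when the transfer
 * should be aborted as too slow. A non-positive limit or time
 * disables the check. */
static bool
demo_low_speed_tick (DemoLowSpeed *state, long bytes_this_second)
{
  if (state->limit_bytes <= 0 || state->time_seconds <= 0)
    return false;

  if (bytes_this_second < state->limit_bytes)
    state->slow_seconds++;
  else
    state->slow_seconds = 0; /* any fast second resets the window */

  return state->slow_seconds >= state->time_seconds;
}
```

Combined with retry-all-network-errors, an abort of this kind would surface as a failed request that the pull can retry rather than a transfer that stalls silently.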