Debug flakes #1324
I've seen TestApplication.testHealthcheckUser fail twice like this by now (example 2, example 3). I can reproduce it locally after a few iterations. This bit is interesting:
That does seem related! It seems that previously it only worked by chance if there was another event after triggering a health check run. However, the flake still happens when adding
Enabled debug logging here.

Variant 1 misses the transition from "checking health" to "healthy". The log shows that there is just a single monitor event for the starting health check:

and the corresponding inspect call also says

I reported this as containers/podman#19237, and we have a previously implicit, and now explicit, workaround; see commit 92398c3.

Variant 2 misses the new healthcheck result after running the "health check" container action. This is missing a

    "State": {
        "OciVersion": "1.1.0-rc.1",
        "Status": "running",
        "Running": true,
        "Paused": false,
        "Restarting": false,
        "OOMKilled": false,
        "Dead": false,
        "Pid": 15520,
        "ConmonPid": 15518,
        "ExitCode": 0,
        "Error": "",
        "StartedAt": "2023-07-14T10:07:14.064090116Z",
        "FinishedAt": "0001-01-01T00:00:00Z",
        "Health": {
            "Status": "healthy",
            "FailingStreak": 0,
            "Log": [
                {
                    "Start": "2023-07-14T10:07:14.151866216Z",
                    "End": "2023-07-14T10:07:14.261819433Z",
                    "ExitCode": 0,
                    "Output": ""
                }
            ]
        },

so that'll be another naughty. Sent as cockpit-project/bots#5004
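To compare what inspect reports against what the UI shows, it can help to boil the "State" data down to the few health fields that matter. The following is only a sketch: the helper name `summarize_health` is made up for illustration, and it assumes the inspect data has already been parsed into a dict shaped like the fragment above.

```python
def summarize_health(state):
    """Reduce a container "State" mapping to its health-related fields.

    Hypothetical helper; assumes the key names shown in the inspect
    fragment above ("Health", "Status", "FailingStreak", "Log").
    """
    health = state.get("Health") or {}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak"),
        "runs": len(log),
        "last_exit_code": last.get("ExitCode"),
    }


# Sample data copied from the inspect fragment above (trimmed)
state = {
    "Status": "running",
    "Health": {
        "Status": "healthy",
        "FailingStreak": 0,
        "Log": [
            {
                "Start": "2023-07-14T10:07:14.151866216Z",
                "End": "2023-07-14T10:07:14.261819433Z",
                "ExitCode": 0,
                "Output": "",
            }
        ],
    },
}

print(summarize_health(state))
```

With only one entry in "Log", `runs` is 1, which makes it easy to spot whether a manually triggered health check produced a second result.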
These events change the state in a predictable way. Avoid the expensive updateContainer() call for these, to avoid multiple calls overlapping each other. See containers/podman#19124

The exact `StartedAt` value for the "start" event is unfortunately not part of the event data, but we can approximate it very well with the event time stamp. This doesn't have to be 100% correct: the only place where we use that is the restart detection in testLifecycleOperations. Adjust that to not require the data-started-at to be identical to the CLI output, just that it changes.
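As a rough illustration of the idea in this commit message, here is a minimal sketch in Python. It is not the project's actual code: the event shape (`action`, `time` as Unix seconds), the set of "predictable" actions, and the names `apply_event` and `was_restarted` are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Assumed mapping from event action to the resulting container state;
# the real set of predictable events may differ.
PREDICTABLE_STATES = {"start": "running", "pause": "paused", "unpause": "running"}


def apply_event(container, event):
    """Return an updated copy of the cached container for predictable events.

    Returns None for any other event, meaning the caller has to fall back
    to a full (expensive) re-inspect.
    """
    action = event["action"]
    if action not in PREDICTABLE_STATES:
        return None
    updated = dict(container)
    updated["State"] = PREDICTABLE_STATES[action]
    if action == "start":
        # The event carries no StartedAt, so approximate it with the event time
        started = datetime.fromtimestamp(event["time"], tz=timezone.utc)
        updated["StartedAt"] = started.isoformat()
    return updated


def was_restarted(started_before, started_after):
    """Restart detection that only requires the start time to change."""
    return started_before != started_after
```

Because the approximated value will not match the CLI output exactly, a test following this approach would compare the value before and after the restart, as `was_restarted` does, rather than against `podman inspect`.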
This reverts commit d3cb07d. Let's debug healthcheck first.
For some reason, .focus() sometimes does not take effect, and the element immediately loses the focus again. This causes set_input_text() to select the whole page text instead of the input element, so it fails to set the intended value.
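One generic way to cope with focus that does not stick is to re-check it after a short delay and retry. The sketch below shows that pattern only; it is not necessarily what this commit does, and `focus_with_retry` plus its two callables are hypothetical names rather than part of the test library.

```python
import time


def focus_with_retry(focus, has_focus, attempts=5, delay=0.1):
    """Call focus() until has_focus() confirms it, up to `attempts` times.

    `focus` and `has_focus` are caller-supplied callables (hypothetical
    here), e.g. thin wrappers around the browser driver used by the tests.
    """
    for _ in range(attempts):
        focus()
        # Wait briefly so that a focus loss right after focus() is noticed
        time.sleep(delay)
        if has_focus():
            return True
    return False
```

A caller would only proceed to select and replace the input text once this returns True, and fail loudly otherwise instead of typing into the wrong element.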
Shelving this for now. I'll make another attempt at reducing API queries at some point, but these days tests are remarkably stable.
I got this testCreatePodUser error locally once when I was investigating something else. But of course since then I could never reproduce it again. The pod container is in state "Created", while it's supposed to be "Running". That isn't just a timeout question, it stays in "Created" forever.
another example