-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for ppc64le boot mirroring tests #3611
Conversation
These should now be fixed by coreos/coreos-assembler#3611.
These should now be fixed by coreos/coreos-assembler#3611.
@@ -100,6 +100,11 @@ func runBootMirrorTest(c cluster.TestCluster) { | |||
MinMemory: 4096, | |||
}, | |||
} | |||
// ppc64le and aarch64 use 64K pages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. does aarch64 really use 64K pages? We recently had a 64K page bug and it really only affected ppc64le.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good point. For reference, I cargo-culted this block from other places in the code for the ppc64le bit but reading the comment it's true that its mention of aarch64 using 64K pages is incorrect (hence why we're doing things like adding support for a separate aarch64 kernel with the feature).
It looks like the first comment mentioning this was introduced in #1955. The comment may not be accurate but it seems like Prashanth had to bump the memory on aarch64 too to get tests to pass.
For now, I've cross-referenced all these occurrences, but let's look into ripping them out again to see which ones are load-bearing and only keep those (with an up-to-date rationale).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aarch64 used 64k pages by default with RHEL 8. RHEL 9 defaulted to 4K pages, that's why we're introducing #3903
Maybe that's why it was there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not super comfortable with the comment as is because it can be misleading. i.e. someone reads the comment and accepts it as fact that aarch64 uses 64K pages.. how about we add a (sometimes)
after aarch64
:
// ppc64le and aarch64 (sometimes) use 64K pages
Maybe we even add the (sometimes)
in a second commit so the commit message can have more context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aarch64 used 64k pages by default with RHEL 8. RHEL 9 defaulted to 4K pages, that's why we're introducing #3903 Maybe that's why it was there ?
Nice find! Yes, it all makes sense now.
Maybe we even add the (sometimes) in a second commit so the commit message can have more context.
Hmm, I think we should actually drop the aarch64 special-casing entirely. I'm working on a patch right now and will test it. Assuming nothing breaks, I'll push it as another commit here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I ended up splitting this out into a separate PR to not waste CI here: #3617.
On ppc64le and s390x, the Ignition config is passed as a block device. In the `RemovePrimaryBlockDevice()` code, we need to make sure we don't inadvertently select the `ignition` block device as the new boot device. More generally, the reliance on the `virtio-backend` string is weak. Instead, use a more specific device name for the real block devices when preparing the QEMU arguments, and filter on that. Partially fixes: coreos#2725 Partially fixes: coreos#3360
All our other root reprovisioning tests double the memory request on ppc64le and aarch64 due to the larger page size. Do this for the boot mirroring tests too and increase the memory request to 8G. With the current 4G, the tests would sometimes panic during the reboot right after the primary block device detach. Even with 8G, the panic still happens, albeit much more rarely. Rather than bumping the memory even more, I've found that sleeping a bit before rebooting does the trick. Partially fixes: coreos#2725 Partially fixes: coreos#3360
c31d0d4
to
b32fbf2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I reverted the tag in quay for COSA that included this PR. We're seeing failures in multi-arch builds like:
We'll need to investigate in the morning. |
Fix in #3620. |
These should now be fixed by coreos/coreos-assembler#3611.
Follow-up to coreos#3611. We're still noticing occasional kernel panics with 30s on ppc64le. Bump it to 60s to see if that helps as a last ditch effort before re- disabling these tests until someone has time to dig into it properly. While we're here, make it specific to ppc64le.
Follow-up to #3611. We're still noticing occasional kernel panics with 30s on ppc64le. Bump it to 60s to see if that helps as a last ditch effort before re- disabling these tests until someone has time to dig into it properly. While we're here, make it specific to ppc64le.
These should now be fixed by coreos/coreos-assembler#3611.
These should now be fixed by coreos/coreos-assembler#3611.
These two commits together should fix both #2725 and #3360.