OrbitService crashing on redhat kernels due to changed layouts #4881

Open
mteske opened this issue Dec 13, 2024 · 2 comments

mteske commented Dec 13, 2024

Hi,

After a colleague showed me Orbit, I wanted to build it for Rocky Linux 9. I got it compiled and running (after a few quirks, but that's a different story), but OrbitService crashed. I did a debug build and looked at the core; it crashes here:
#2 0x00007f46c2a28833 in abort () from /usr/lib64/libc.so.6
#3 0x0000559abbfc2231 in orbit_linux_tracing::TracerImpl::ProcessSampleEventAndReturnTimestamp (this=0x7f46b00896f0, header=..., ring_buffer=0x7f469800b110)
at /home/michael.teske/src/orbit-main/src/LinuxTracing/TracerImpl.cpp:1353
#4 0x0000559abbfbeda2 in orbit_linux_tracing::TracerImpl::ProcessOneRecord (this=0x7f46b00896f0, ring_buffer=0x7f469800b110) at /home/michael.teske/src/orbit-main/src/LinuxTracing/TracerImpl.cpp:967
...
1353 ORBIT_CHECK(header.size == sizeof(RingBufferRawSample));
(gdb) p header.size
$1 = 120
(gdb) p sizeof(RingBufferRawSample)
$2 = 112

So I looked at /sys/kernel/debug/tracing/events/sched/sched_switch/format, compared it with my colleague's version, and it looks like they added a one-byte field:
field:unsigned char common_preempt_lazy_count; offset:8;

resulting in this structure being 8 bytes longer (and probably others as well). Of course, it no longer matches the structs in KernelTracepoints.h. I'll try to fix that, but I'm wondering whether anybody else has had this problem. My kernel reports as 5.14.0-503.15.1.el9_5.x86_64.
I didn't find much information about the change, except
https://lore.kernel.org/bpf/[email protected]/t/
Is this a Red Hat thing maybe?
EDIT: Yes, it's Red Hat. My colleague has Ubuntu with kernel 6.8.
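
To illustrate how a single added byte can grow a record by 8 bytes, here is a minimal sketch. It assumes an x86_64 ABI and C++17, and the struct names (`CommonFieldsUpstream`, `CommonFieldsWithLazyCount`, `SchedSwitchLike`) are hypothetical: they are modeled on the sched_switch fields, not copied from KernelTracepoints.h. The real offsets on a given kernel should be taken from its format file, not from this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layouts for illustration only; these are not Orbit's structs.
struct CommonFieldsUpstream {
  uint16_t common_type;
  uint8_t common_flags;
  uint8_t common_preempt_count;
  int32_t common_pid;
};

struct CommonFieldsWithLazyCount {
  uint16_t common_type;
  uint8_t common_flags;
  uint8_t common_preempt_count;
  int32_t common_pid;
  uint8_t common_preempt_lazy_count;  // the extra byte at offset 8
};

// Payload modeled on sched_switch; only the common header differs.
template <typename Common>
struct SchedSwitchLike {
  Common common;
  char prev_comm[16];
  int32_t prev_pid;
  int32_t prev_prio;
  int64_t prev_state;  // 8-byte aligned, so padding may precede it
  char next_comm[16];
  int32_t next_pid;
  int32_t next_prio;
};

using Upstream = SchedSwitchLike<CommonFieldsUpstream>;
using WithLazyCount = SchedSwitchLike<CommonFieldsWithLazyCount>;

// The header grows from 8 to 12 bytes (1 byte of data plus 3 of padding).
static_assert(sizeof(CommonFieldsUpstream) == 8);
static_assert(sizeof(CommonFieldsWithLazyCount) == 12);
// The 8-byte field then needs 4 more bytes of padding to stay aligned.
static_assert(offsetof(Upstream, prev_state) == 32);
static_assert(offsetof(WithLazyCount, prev_state) == 40);
// Net effect: the whole record is 8 bytes longer.
static_assert(sizeof(WithLazyCount) - sizeof(Upstream) == 8);
```

Under these assumptions, 4 of the 8 extra bytes come from the padded header and 4 from realigning the 8-byte field, which would be consistent with the 120 vs. 112 mismatch above.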

@mteske mteske changed the title OrbitService crashing on newer kernels due to changed layouts OrbitService crashing on redhat kernels due to changed layouts Dec 13, 2024

mteske commented Dec 13, 2024

KernelTracepoints-diff.txt
In the meantime I got it to run with these diffs; I found the correct padding with a bit of trial and error. This will of course break non-Red Hat systems, and I'm not sure how to do a proper patch right now. Once the actual format files are used, this should no longer be a problem. Until then it might be useful for any RHEL 9 user (RHEL 8 does not have this change).

@pierricgimmig
Collaborator

Hi @mteske, thanks for the report. This is indeed an issue with the current implementation and is already tracked in #4857. Ideally, we would parse the tracepoint layout at runtime instead of hard-coding it.
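
As a rough sketch of what parsing the layout at runtime could look like (not Orbit's implementation, and not necessarily what #4857 will end up doing), the following reads `field:...; offset:N; size:N;` lines from the text of a tracepoint format file into a map. The names `FieldLayout`, `NumberAfter`, `ParseTracepointFormat` and `kSampleFormat` are hypothetical, and the sample values are made up for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <unordered_map>

struct FieldLayout {
  std::size_t offset;
  std::size_t size;
};

// Returns the unsigned number that directly follows `key` in `line`, if any.
inline std::optional<std::size_t> NumberAfter(const std::string& line,
                                              const std::string& key) {
  std::size_t pos = line.find(key);
  if (pos == std::string::npos) return std::nullopt;
  pos += key.size();
  std::size_t end = pos;
  while (end < line.size() && std::isdigit(static_cast<unsigned char>(line[end]))) ++end;
  if (end == pos) return std::nullopt;
  return std::stoul(line.substr(pos, end - pos));
}

// Maps field name -> {offset, size} for every complete "field:" line.
inline std::unordered_map<std::string, FieldLayout> ParseTracepointFormat(
    const std::string& text) {
  std::unordered_map<std::string, FieldLayout> fields;
  std::istringstream stream(text);
  std::string line;
  while (std::getline(stream, line)) {
    const std::size_t start = line.find("field:");
    if (start == std::string::npos) continue;
    const std::size_t decl_end = line.find(';', start);
    if (decl_end == std::string::npos) continue;
    // Declaration such as "char prev_comm[16]"; the name is its last token.
    const std::string decl = line.substr(start + 6, decl_end - (start + 6));
    const std::size_t name_start = decl.find_last_of(" \t");
    std::string name =
        name_start == std::string::npos ? decl : decl.substr(name_start + 1);
    name = name.substr(0, name.find('['));  // drop an array suffix like "[16]"
    const std::string rest = line.substr(decl_end);
    const auto offset = NumberAfter(rest, "offset:");
    const auto size = NumberAfter(rest, "size:");
    if (name.empty() || !offset || !size) continue;  // skip incomplete lines
    fields[name] = FieldLayout{*offset, *size};
  }
  return fields;
}

// Sample input in the format-file style; the values are made up.
const char kSampleFormat[] =
    "name: sched_switch\n"
    "format:\n"
    "\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;\n"
    "\tfield:char prev_comm[16];\toffset:12;\tsize:16;\tsigned:0;\n";
```

A caller could then look up `ParseTracepointFormat(contents).at("prev_comm").offset` and read each field from the raw sample at that offset, instead of casting the buffer to a fixed struct and checking its size.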
