Exception: (null): .debug_loc+0x19e829: unknown DWARF expression opcode 0xf3 #233

Closed
brenns10 opened this issue Jan 9, 2023 · 9 comments
Labels
debuginfo Support for debugging information formats

Comments

@brenns10
Contributor

brenns10 commented Jan 9, 2023

When retrieving variable values from some kernel stack frames, drgn fails with "unknown DWARF expression opcode 0xf3". I created a reproducer test (which includes my stack trace helper, heh) to run within the vmtest, and it reliably reproduces on kernels 5.10+, probably due to a change in DWARF generation or compiler version in that release. In my exploration this happens despite drgn using the libdw packaged in the wheel, which is version 188, the latest. So I'm not certain what the issue is.

Edit: looks like DW_OP_GNU_entry_value is the opcode for value 0xf3.
Edit 2: here's the github actions run for this error: https://github.com/brenns10/drgn/actions/runs/3878425687/jobs/6614578560

@ghost

ghost commented Jan 10, 2023

> In my exploration this happens despite drgn using the libdw packaged in the wheel, which is version 188, the latest.

Just chiming in to say that I wouldn't expect the libdw version to have any bearing here, as the error comes from drgn's own DWARF parser, which explicitly does not yet support DW_OP_(GNU_)entry_value.

If this is presently preventing you from using drgn, as a temporary workaround (assuming you're compiling the kernel yourself) you could try adding -gdwarf-4 -gstrict-dwarf to CFLAGS which should prevent DW_OP_GNU_entry_value from being emitted.

@brenns10
Contributor Author

Thanks! I had gotten to that part of the code, and since I know pretty much nothing about DWARF I figured I'd pause and wait for a second opinion.

I originally encountered this with Oracle's UEK7 kernel (5.15 based) -- maybe we can update the compiler flags but I'd imagine that would be a hard sell. On the other hand, this isn't currently blocking anything much. I encountered it during normal continuous integration tests and for now, I've marked the tests as xfail.

If I find free time I might try to learn more about what's going on here and see if I can help with adding support. I'll have to see what that entails though :)

@osandov
Owner

osandov commented Jan 11, 2023

Yeah, when I wrote the DWARF unwinder, I left those unimplemented so I could see if I ran into them in practice. I have, and now you have, too, sorry! This wasn't a big deal for me since I knew how to interpret it, but from what I see in your work-in-progress, it'd definitely be better to handle this gracefully rather than blow up.

DW_OP_{,GNU_}entry_value means "the value of the given location upon entry into this function". The main use case is for parameters that were passed in caller-saved registers when that register gets clobbered in the middle of a function. The parameter was essentially optimized out, but if the debugger saves the values of the registers upon entry into the function, then it can still recover the parameter.

Unfortunately, we don't (and can't) do that. So for us, I think we just have to treat these as optimized out. Something like:

diff --git a/libdrgn/dwarf_info.c b/libdrgn/dwarf_info.c
index 2350c1b1..58beb7eb 100644
--- a/libdrgn/dwarf_info.c
+++ b/libdrgn/dwarf_info.c
@@ -4436,6 +4436,9 @@ branch:
 		/* Special operations. */
 		case DW_OP_nop:
 			break;
+		case DW_OP_entry_value:
+		case DW_OP_GNU_entry_value:
+			return &drgn_not_found;
 		/* Location description operations. */
 		case DW_OP_reg0 ... DW_OP_reg31:
 		case DW_OP_regx:
@@ -4451,7 +4454,6 @@ branch:
 		 *
 		 * - DW_OP_push_object_address
 		 * - DW_OP_form_tls_address
-		 * - DW_OP_entry_value
 		 *   DW_OP_implicit_pointer
 		 * - Procedure calls: DW_OP_call2, DW_OP_call4, DW_OP_call_ref.
 		 * - Typed operations: DW_OP_const_type, DW_OP_regval_type,
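
To illustrate the intent of the patch above, here is a minimal Python sketch of a toy location-expression evaluator. It is not drgn's code (the real change is the C in libdrgn/dwarf_info.c); the function name evaluate_location and the tiny set of supported opcodes are assumptions made for illustration. The opcode values are the standard DWARF ones. An entry_value opcode yields "no location", the same result as an empty expression, instead of raising an error.

```python
from typing import Optional

# Standard DWARF opcode values.
DW_OP_reg0 = 0x50
DW_OP_reg31 = 0x6F
DW_OP_nop = 0x96
DW_OP_entry_value = 0xA3
DW_OP_GNU_entry_value = 0xF3


def evaluate_location(expr: bytes) -> Optional[str]:
    """Toy evaluator (hypothetical helper, not part of drgn).

    Returns a register name such as "reg5", or None when the
    variable has no usable location (treated as optimized out).
    """
    i = 0
    while i < len(expr):
        op = expr[i]
        i += 1
        if op == DW_OP_nop:
            continue
        if op in (DW_OP_entry_value, DW_OP_GNU_entry_value):
            # We cannot know register values at function entry,
            # so report "absent" rather than failing.
            return None
        if DW_OP_reg0 <= op <= DW_OP_reg31:
            return f"reg{op - DW_OP_reg0}"
        raise ValueError(f"unknown DWARF expression opcode {op:#x}")
    # An empty expression also means the object is optimized out.
    return None
```

With this sketch, the byte sequence f3 01 55 9f (GNU_entry_value wrapping reg5, then stack_value) evaluates to None, while the single byte 55 evaluates to "reg5".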

@osandov
Owner

osandov commented Jan 11, 2023

The (null) in the exception seems busted, though. That should be the path of the offending file.

@brenns10
Contributor Author

Oh wow, that explanation was worth a few dozen pages of DWARF5 spec at least! Thanks. I'll try out the patch tomorrow and see how it works.

@osandov
Owner

osandov commented Jan 12, 2023

As a follow-up, I compiled a test program with -g -gdwarf-4 and with -g -gdwarf-4 -gstrict-dwarf to compare the output (thanks @Svetlitski-FB for providing the flags to test this with).

The function in question is https://github.com/osandov/osandov-linux/blob/801ae515d22d689265a6940d8ef4aa9d99b81bf7/scripts/debuginfod_client.c#L14-L28, which compiles to

00000000004014a0 <usage>:
  4014a0:       53                      push   %rbx
  4014a1:       40 84 ff                test   %dil,%dil
  4014a4:       89 fb                   mov    %edi,%ebx
  4014a6:       48 8b 15 93 2c 00 00    mov    0x2c93(%rip),%rdx        # 404140 <progname>
  4014ad:       48 8b 3d ec 2c 00 00    mov    0x2cec(%rip),%rdi        # 4041a0 <stderr@GLIBC_2.2.5>
  4014b4:       48 0f 44 3d a4 2c 00    cmove  0x2ca4(%rip),%rdi        # 404160 <stdout@GLIBC_2.2.5>
  4014bb:       00
  4014bc:       31 c0                   xor    %eax,%eax
  4014be:       be 10 20 40 00          mov    $0x402010,%esi
  4014c3:       e8 e8 fb ff ff          call   4010b0 <fprintf@plt>
  4014c8:       0f b6 fb                movzbl %bl,%edi
  4014cb:       e8 c0 fb ff ff          call   401090 <exit@plt>

The error parameter is passed in %rdi and then clobbered by the instruction at 4014ad. The debug info without -gstrict-dwarf uses DW_OP_GNU_entry_value starting from 4014b4 to represent this:

 [   448] range 4014a0, 4014b4
          0x00000000004014a0 <usage>..
          0x00000000004014b3 <usage+0x13>
           [ 0] reg5
          range 4014b4, 4014d0
          0x00000000004014b4 <usage+0x14>..
          0x00000000004014cf <usage+0x2f>
           [ 0] GNU_entry_value:
                [ 0] reg5
           [ 3] stack_value

But with -gstrict-dwarf, it doesn't have a location starting from 4014b4, meaning the variable is optimized out:

 [   3be] range 4014a0, 4014b4
          0x00000000004014a0 <usage>..
          0x00000000004014b3 <usage+0x13>
           [ 0] reg5

So I'm more confident now that my patch is the correct thing to do (at least its intention, since I haven't tested that it does what it claims to do 😉)
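
The comparison above can be sketched in Python as a lookup in a location list by program counter. This is only an illustration: the function location_at and the string descriptions are hypothetical, and the address ranges are copied from the two dumps above and treated as half-open. When no range covers the PC, the variable has no location there, which is what "optimized out" means here.

```python
from typing import List, Optional, Tuple

# (start, end, description) with half-open ranges [start, end).
LocationList = List[Tuple[int, int, str]]

# Location list for "error" without -gstrict-dwarf (from the dump above).
WITH_GNU_EXTENSIONS: LocationList = [
    (0x4014A0, 0x4014B4, "reg5"),
    (0x4014B4, 0x4014D0, "GNU_entry_value(reg5), stack_value"),
]

# Location list for "error" with -gstrict-dwarf (from the dump above).
STRICT_DWARF: LocationList = [
    (0x4014A0, 0x4014B4, "reg5"),
]


def location_at(loclist: LocationList, pc: int) -> Optional[str]:
    """Return the description covering pc, or None if there is none."""
    for start, end, description in loclist:
        if start <= pc < end:
            return description
    return None


# At the call to exit@plt (4014cb), only the non-strict list has an entry.
print(location_at(WITH_GNU_EXTENSIONS, 0x4014CB))
print(location_at(STRICT_DWARF, 0x4014CB))
```

Treating the entry_value expression as absent makes the first list behave like the second one for any PC at or after 4014b4.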

@brenns10
Contributor Author

After a bit of confusion with the vmtest, I did go ahead and test this patch on my branch, and it works great. For my part, I'll probably catch this exception and regex match it. Since the next version won't be around for a while, it's no big deal.
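
A minimal sketch of that kind of regex match, assuming the error text keeps the form shown in the issue title. The helper name entry_value_opcode is hypothetical, and the exception type to catch is deliberately left out because the thread does not say which one drgn raises. The opcode values are the standard DWARF ones: 0xf3 is DW_OP_GNU_entry_value and 0xa3 is the DWARF 5 DW_OP_entry_value.

```python
import re
from typing import Optional

# Standard DWARF opcode values for the two entry_value variants.
ENTRY_VALUE_OPCODES = {
    0xA3: "DW_OP_entry_value",
    0xF3: "DW_OP_GNU_entry_value",
}

# Matches the message format seen in this issue's title.
_UNKNOWN_OPCODE = re.compile(
    r"unknown DWARF expression opcode (0x[0-9a-fA-F]+)"
)


def entry_value_opcode(message: str) -> Optional[str]:
    """Return the opcode name if the message reports an unsupported
    entry_value opcode, otherwise None (hypothetical helper)."""
    match = _UNKNOWN_OPCODE.search(message)
    if match is None:
        return None
    return ENTRY_VALUE_OPCODES.get(int(match.group(1), 16))
```

A caller could wrap the variable lookup in a try/except, pass str(exception) to this helper, and treat the variable as optimized out when the result is not None, re-raising otherwise.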

Let me know if you need more testing help or anything else on this. I consider it resolved, but I suppose you'll want to keep it open until there's a proper fix for these opcodes.

@brenns10
Contributor Author

brenns10 commented Jan 21, 2023

Strangely enough, I just encountered "unknown DWARF expression opcode 0xa3", which corresponds to the non-GNU entry_value opcode, on a crash dump from my Arch desktop.

@osandov osandov changed the title from "Exception: (null): .debug_loc+0x19e829: unknown DWARF expression opcode 0xf3" to "Support DW_OP_entry_value/DW_OP_GNU_entry_value" Jul 3, 2023
@osandov osandov added the debuginfo (Support for debugging information formats) label Jul 5, 2023
@osandov osandov changed the title from "Support DW_OP_entry_value/DW_OP_GNU_entry_value" to "Exception: (null): .debug_loc+0x19e829: unknown DWARF expression opcode 0xf3" Jul 7, 2023
@osandov
Owner

osandov commented Jul 7, 2023

I forgot to follow up on this, but it turns out that there are cases where we can recover something from DW_OP_(GNU_)entry_value by looking at DW_TAG_(GNU_)call_site and DW_TAG_(GNU_)call_site_parameter information. I previously hijacked this issue to track proper support for this, but I have now opened #337 for that and will keep this issue for the workaround of treating them as optimized out.

@osandov osandov closed this as completed in c76f25b Jul 7, 2023
osandov added a commit that referenced this issue Jul 18, 2023
We're getting (null) file paths in error messages (e.g., #233) because
libdwfl doesn't always return the debug file path. Fall back to the
loaded file path, which is better than nothing until we get rid of
libdwfl.

Signed-off-by: Omar Sandoval <[email protected]>