Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CRASH] Crash on Windows when dr_get_thread_id() or drwrap_get_retval() called in post-function callback #4068

Closed
jtesta opened this issue Jan 31, 2020 · 7 comments

Comments

@jtesta
Copy link

jtesta commented Jan 31, 2020

Describe the bug
When dr_get_thread_id() or drwrap_get_retval() is called in a post-function callback (i.e.: the function set as the third argument in drwrap_wrap_ex()), a crash occurs.

To Reproduce
Detailed instructions to reproduce can be found here: mxmssh/drltrace#5 (comment)

Note that crashing occurs when using the master branch of DynamoRIO (as of a few days ago).

I used the instructions for both 32-bit and 64-bit Release builds as documented here: https://github.com/mxmssh/drltrace/wiki/How-To-Build

Expected behavior
The dr_get_thread_id() and drwrap_get_retval() function should work as intended instead of crashing.

Screenshots or Pasted Text
For a screen shot, see: mxmssh/drltrace#5 (comment)

Versions

  • I used the master branch of drltrace (https://github.com/mxmssh/drltrace) in conjunction with the master branch of DynamoRIO (as of a few days ago).
  • What operating system version are you running on? I have witnessed this on Windows 7, 8.1, and 10 (if you still need specific build numbers, please let me know).
  • Is your application 32-bit or 64-bit? The crash occurs on both 32-bit and 64-bit builds.
@johnfxgalea
Copy link
Contributor

Thanks for the report. Can you try getting a call-stack via WinDbg?

@derekbruening
Copy link
Contributor

When dr_get_thread_id() or drwrap_get_retval() is called in a post-function callback (i.e.: the function set as the third argument in drwrap_wrap_ex()), a crash occurs.

As you can imagine, drwrap_get_retval() being called in the post-function callback is a pretty basic operation, and naturally we have tests for it. E.g.:
https://github.com/DynamoRIO/dynamorio/blob/master/suite/tests/client-interface/drwrap-test.dll.c#L421
These tests never crash. So there must be more to it than the title of this issue. There must be something unusual about your use case, probably specific to the function being wrapped. Your link mentions wrapping ZwCallbackReturn and KiUserCallbackDispatcher. Those are unusual functions and maybe they could be involved here. Are you sure this issue isn't limited just to functions like those?

The Ki one in particular is never called from user mode, so would drwrap ever call a post-function callback there? There is no "return address" like a regular function. Or does drwrap compute the return address from whatever is on top of the stack there and that bogus address messes things up?

As @johnfxgalea said, a callstack of the crash would help. See the debugging page on our wiki for how to load symbols into windbg.

@derekbruening
Copy link
Contributor

This is not reproducible.
Adding this line to wrap_post in the drwrap-test.dll.c test linked above works fine:

        dr_fprintf(STDERR, "retval=%p, tid=" TIDFMT "\n", drwrap_get_retval(wrapcxt), dr_get_thread_id(drwrap_get_drcontext(wrapcxt)));

Marking as NeedInfo. The ball is in your court: how do you know this crash is inside DR code? I see at least one bug in the code you've linked: dr_get_thread_id returns a 64-bit value on Windows64. Please analyze the actual crash context and callstack, and analyze how to reproduce, as the current instructions do not reproduce any problem.

@mxmssh
Copy link
Contributor

mxmssh commented Feb 1, 2020

Spent some time debugging this issue today. Seems like we are passing NULL as an argument for drwrap_get_retval here which causing NULL-pointer dereference and exception.

DR_EXPORT
void *
drwrap_get_retval(void *wrapcxt_opaque)
{
    drwrap_context_t *wrapcxt = (drwrap_context_t *) wrapcxt_opaque;
    if (wrapcxt->where_am_i != DRWRAP_WHERE_POST_FUNC) -< winDBG stops here
        return false; /* can only get retval in post */
    if (wrapcxt == NULL || wrapcxt->mc == NULL)
        return NULL;
    /* ensure we have the info we need */
    drwrap_get_mcontext_internal(wrapcxt_opaque, DR_MC_INTEGER);
    return (void *) wrapcxt->mc->IF_X86_ELSE(xax, r0);
}

0:001> kn
 # Child-SP          RetAddr           Call Site
00 00000000`8037e8a0 00000000`90005c05 drltracelib!drwrap_get_retval+0x6 [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\ext\drwrap\drwrap.c @ 737]
01 00000000`8037e8d0 00000000`9002d367 drltracelib!std::basic_ios<char,std::char_traits<char> >::init+0x605
02 00000000`8037e910 00000000`9002deaf drltracelib!drwrap_after_callee_func+0x237 [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\ext\drwrap\drwrap.c @ 1965]
03 00000000`8037e9e0 00000000`9002a490 drltracelib!drwrap_event_pre_syscall+0xaf [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\ext\drwrap\drwrap.c @ 2479]
04 00000000`8037eba0 00000000`710535f8 drltracelib!drmgr_presyscall_event+0xc0 [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\ext\drmgr\drmgr.c @ 1474]
05 00000000`8037ee60 00000000`7101ea25 dynamorio!instrument_pre_syscall+0xc8 [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\core\lib\instrument.c @ 2052]
06 00000000`8037eee0 00000000`7101d6e4 dynamorio!handle_system_call+0x95 [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\core\dispatch.c @ 1874]
07 00000000`8037ef20 00000000`7101cd9a dynamorio!dispatch_enter_dynamorio+0x434 [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\core\dispatch.c @ 916]
08 00000000`8037ef50 00000000`800726af dynamorio!dispatch+0x1a [e:\vb\drltrace\drltrace\drltrace_src\dynamorio\core\dispatch.c @ 164]
09 00000000`8037efe0 00000000`00000206 0x800726af
0a 00000000`8037efe8 00000000`00000000 0x206
0:001> dv
 wrapcxt_opaque = 0x00000000`00000000

Obvious fix is to check wrapcxt_opaque != NULL. @derekbruening should it be caller or callee who check for NULL as an argument?

@derekbruening
Copy link
Contributor

The docs for drwrap_wrap explain that NULL is deliberately passed for wrapcxt on abnormal unwind. It is thus up to the user of drwrap to check for it.

@mxmssh
Copy link
Contributor

mxmssh commented Feb 2, 2020

Ok, thank you. I believe we can close this issue.

@mxmssh mxmssh closed this as completed Feb 2, 2020
@derekbruening
Copy link
Contributor

Maybe a debug build check with a useful message about a deliberate NULL could be added to avoid future confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants