assert() causes OSv to hang in abort if relevant part of the app ELF not populated #1237
Comments
Can abort itself turn off interrupts only after formatting the string?
Possibly we can do that, though I wonder what the consequences would be. I found the commit that added the line disabling interrupts along with printing the backtrace: d022927. The motivation was to prevent a context switch. Printing the backtrace may also trigger page faults, so ideally we should delay disabling interrupts until after the backtrace has been printed. I think the concern is that abort may, in general, be called when the system is not stable enough to handle page faults. But maybe this is an acceptable risk, because we do want to get as much information as possible about the original condition that triggered the abort.
I found the discussion on the mailing list about this commit: https://groups.google.com/g/osv-dev/c/lntd5UBlMvg/m/AZRs5MHsQCcJ. There was this concern about making the backtrace printing not use malloc(): "When I debugging the patch, I saw the situation that even my print_backtrace() code doesn't call malloc(), another threads calling malloc(). And I started worrying another running thread breaks system before print_backtrace() finishes print out backtrace." But I wonder why this was a concern. Disabling interrupts prevents a context switch only on the current CPU, so threads on other CPUs could still call malloc. And why would other threads calling malloc be bad? So what would be the consequence of moving the
When writing and trying a new test, tst-brk.cc, I discovered that OSv hangs instead of printing the usual "Assertion failed ..." message and the corresponding backtrace. After connecting with gdb, I could see from the stack trace where it was stuck.
After a bit of analysis, I concluded that the part of the text segment of the application ELF where expr, file, or func is located had most likely not been populated yet when __assert_fail was executed.
The abort() called further down the call chain turns off interrupts, but when printf tries to read that text from memory, it triggers a page fault that causes a nested abort (interrupts cannot be disabled when handling faults).
I am not 100% sure what the right solution is, but pre-fetching this text data in __assert_fail before calling abort seems like one way to solve this problem. Maybe there is a better solution, though.
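A minimal sketch of the pre-fetching idea, assuming it is enough to read every byte of the three strings once so that their pages are faulted in while interrupts are still enabled. The helper names touch_string and prefault_assert_strings are hypothetical and are not taken from the OSv source; the real __assert_fail would call something like this before abort.

```cpp
#include <cstddef>

// Hypothetical helper: walk the string through a volatile pointer so the
// compiler cannot drop the reads. Every page the string spans is touched,
// which forces it to be populated before interrupts are disabled.
// Returns the string length (0 for a null pointer).
std::size_t touch_string(const char* s)
{
    if (s == nullptr) {
        return 0;
    }
    const volatile char* p = s;
    std::size_t n = 0;
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}

// Hypothetical pre-fetch step for the strings passed to __assert_fail.
// Returns the total number of bytes touched.
std::size_t prefault_assert_strings(const char* expr, const char* file,
                                    const char* func)
{
    return touch_string(expr) + touch_string(file) + touch_string(func);
}
```

This only covers the strings that the assertion message reads; anything else the abort path touches after interrupts are disabled could still fault.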