-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
thread: Terminating threads go to zombie state instead of stopped #19197
Draft
chrysn
wants to merge
2
commits into
RIOT-OS:master
Choose a base branch
from
chrysn-pull-requests:thread-cleanup
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
chrysn
added
Type: enhancement
The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Area: core
Area: RIOT kernel. Handle PRs marked with this with care!
Process: API change
Integration Process: PR contains or issue proposes an API change. Should be handled with care.
CI: ready for build
If set, CI server will compile all applications for all available boards for the labeled PR
Process: needs >1 ACK
Integration Process: This PR requires more than one ACK
labels
Jan 25, 2023
github-actions
bot
added
the
Process: missing approvals
Integration Process: PR needs more ACKS (handled by action)
label
Jan 25, 2023
chrysn
added
CI: run tests
If set, CI server will run tests on hardware for the labeled PR
CI: ready for build
If set, CI server will compile all applications for all available boards for the labeled PR
and removed
CI: ready for build
If set, CI server will compile all applications for all available boards for the labeled PR
labels
Jan 25, 2023
bors bot
added a commit
that referenced
this pull request
Jan 27, 2023
19199: sys/suit: Ensure previous thread is stopped before reusing its stack r=benpicco a=chrysn ### Contribution description Closes: #19195 If the thread has released the mutex but the thread has not terminated (which happens in the situation that would previously have overwritten a still active thread's state), then a warning is shown and the trigger is ignored. ### Testing procedure This should work before and after: * `make -C examples/suit_update BOARD=native all term` * `aiocoap-client coap://'[fe80::3c63:beff:fe85:ca96%tapbr0]'/suit/trigger -m POST --payload 'coap://[2001:db8::]/foo'` * In parallel, on the RIOT shell, run `suit fetch coap://[2001:db8::]/foo` * After the first download fails, the second one starts right away ("suit_worker: update failed, hdr invalid" / "suit_worker: started"). Run again with the worker thread on low priority: ```patch diff --git a/sys/suit/transport/worker.c b/sys/suit/transport/worker.c index a54022fb28..e26701a64c 100644 --- a/sys/suit/transport/worker.c +++ b/sys/suit/transport/worker.c `@@` -70 +70 `@@` -#define SUIT_COAP_WORKER_PRIO THREAD_PRIORITY_MAIN - 1 +#define SUIT_COAP_WORKER_PRIO THREAD_PRIORITY_MAIN + 1 ``` Before, this runs the download once silently (no clue why it doesn't run it twice, but then again, I claim there's concurrent memory access from two thread, so who knows what happens). After, it runs a single download and shows an error message for the second one once the first is complete ("Ignoring SUIT trigger: worker is still busy."). ### Issues/PRs references This may be made incrementally easier by #19197 -- that PR as it is now would spare us the zombification (because returning would do that), and having a `wait` function would allow us to turn the new error case into a success. 19205: boards/common: add common timer config for GD32VF103 boards r=benpicco a=benpicco 19207: examples/gnrc_border_router: static: use router from advertisements by default r=benpicco a=benpicco Co-authored-by: chrysn <[email protected]> Co-authored-by: Benjamin Valentin <[email protected]> Co-authored-by: Benjamin Valentin <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Area: core
Area: RIOT kernel. Handle PRs marked with this with care!
CI: ready for build
If set, CI server will compile all applications for all available boards for the labeled PR
CI: run tests
If set, CI server will run tests on hardware for the labeled PR
Process: API change
Integration Process: PR contains or issue proposes an API change. Should be handled with care.
Process: missing approvals
Integration Process: PR needs more ACKS (handled by action)
Process: needs >1 ACK
Integration Process: This PR requires more than one ACK
Type: enhancement
The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Contribution description
Let's categorize threads in general:
Category 3 is currently hard to do -- I know of one example (the SUIT worker), and that got it subtly wrong.
I think that it'll be easier to get category 3 right if threads are not completely stopped after termination (something that makes their PID free for reuse, which means that once you hold a PID you'll never know if the thread maybe just terminated and anything you do with that PID now goes to a different process). Instead, this PR makes the thread into a zombie when it dies.
After this PR, it's up to the creator of the process to stop the zombie thread, if its resources are to be reused. Cleanup doesn't need to be done by the same task that created it -- could be anyone. Threads of category 1 and 2 can easily just stay zombies without clean-up: There was enough space in the PID range to accommodate them when they were live, so there's still space after. Category 3 tasks already needed to do some cleanup, now they have better tools for it.
Possible extensions (in this PR or a follow-up)
Documentation (really needs to go here)
Fixing the SUIT case
Storing the thread's return value somewhere in the zombie state (
thread_zombify_with_value
? Maybe unioned with the SP?)Adding a way to wait for a thread when it's not clear yet that the thread has terminated. (It's generally hard to do that from a higher priority thread: Imagine you used a message or mutex to signal that the dying process is done, right before returning. Then the waiting thread would still be awoken right before the dying thread had a chance to go to zombie mode. One implementation solution would be for the waiting call to lift the awaited process's priority to the caller and yield; still, this can be abused and badly block the system).
I think that having this is not critical for a MVP version of this PR -- after all, creating the thread anew will "just" fail, and the caller is invited to try again.
Testing procedure
TBD. Right now, this only contains the change itself to see what the fall-out will be.
Issues/PRs references
This is one way to close #19195.
There was previous discussion of this topic in #17542 -- back when I thought the underlying problem could be resolved more easily.
This is similar to how POSIX systems handle terminated threads. On those systems, there's a clear parent process that's responsible for clean-up -- here we don't have that, but we don't need it: Any process can do the clean-up, as long as it's only one process. (Similar to how we unlock mutexes).