Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/suit: Ensure previous thread is stopped before reusing its stack #19199

Merged
merged 1 commit into from
Jan 27, 2023

Conversation

chrysn
Copy link
Member

@chrysn chrysn commented Jan 25, 2023

Contribution description

Closes: #19195

If the thread has released the mutex but the thread has not terminated (which happens in the situation that would previously have overwritten a still active thread's state), then a warning is shown and the trigger is ignored.

Testing procedure

This should work before and after:

  • make -C examples/suit_update BOARD=native all term
  • aiocoap-client coap://'[fe80::3c63:beff:fe85:ca96%tapbr0]'/suit/trigger -m POST --payload 'coap://[2001:db8::]/foo'
  • In parallel, on the RIOT shell, run suit fetch coap://[2001:db8::]/foo
  • After the first download fails, the second one starts right away ("suit_worker: update failed, hdr invalid" / "suit_worker: started").

Run again with the worker thread on low priority:

diff --git a/sys/suit/transport/worker.c b/sys/suit/transport/worker.c
index a54022fb28..e26701a64c 100644
--- a/sys/suit/transport/worker.c
+++ b/sys/suit/transport/worker.c
@@ -70 +70 @@
-#define SUIT_COAP_WORKER_PRIO THREAD_PRIORITY_MAIN - 1
+#define SUIT_COAP_WORKER_PRIO THREAD_PRIORITY_MAIN + 1

Before, this runs the download once silently (no clue why it doesn't run it twice, but then again, I claim there's concurrent memory access from two thread, so who knows what happens). After, it runs a single download and shows an error message for the second one once the first is complete ("Ignoring SUIT trigger: worker is still busy.").

Issues/PRs references

This may be made incrementally easier by #19197 -- that PR as it is now would spare us the zombification (because returning would do that), and having a wait function would allow us to turn the new error case into a success.

@chrysn chrysn added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Area: sys Area: System labels Jan 25, 2023
@chrysn chrysn requested a review from benpicco January 25, 2023 17:14
@github-actions github-actions bot added the Area: OTA Area: Over-the-air updates label Jan 25, 2023
@benpicco
Copy link
Contributor

I wonder a bit:
We have XFA auto-init now, so always starting the SUIT thread would be trivial.
The original idea of this approach was to simplify the code, if we have to introduce such bespoke zombiefy patterns now, it might be better to just always spawn the thread and have it wait on a mutex that gets unlocked by suit_worker_trigger().

@chrysn
Copy link
Member Author

chrysn commented Jan 25, 2023

That too would be a completely fine solution. There'd probably be a mutex and some other mechanism (a second mutex, an thread flag, whatever), but yes, chances are it's simpler.

It may make me come back to matrix and ask "are there now any valid use cases for a thread terminating and restarting?", but that's not really a concern for this concrete issue.

@riot-ci
Copy link

riot-ci commented Jan 25, 2023

Murdock results

✔️ PASSED

2c716d3 sys/suit: Ensure previous thread is stopped before reusing its stack

Success Failures Total Runtime
6796 0 6796 07m:40s

Artifacts

@benpicco
Copy link
Contributor

Thinking about it, one reason to do it that way was that if you don't call suit_worker_trigger() (because you are directly calling suit_handle_url() in a thread with sufficient stack), all the custom suit thread thing should be thrown away by the compiler.

The suit thread is only needed because the suit_flashwrite backed requires to keep 1 flash page on the stack, which can be up to 8k depending on the platform.

So if you already have a thread with sufficient stack you would call suit_handle_url() and the linker should throw away that extra thread stack only used by suit_worker_trigger().

@benpicco
Copy link
Contributor

bors merge

@bors
Copy link
Contributor

bors bot commented Jan 27, 2023

Build succeeded:

@bors bors bot merged commit c6c84cc into RIOT-OS:master Jan 27, 2023
@chrysn chrysn deleted the fix-19195-suit-threads branch February 1, 2023 13:40
@MrKevinWeiss MrKevinWeiss added this to the Release 2023.04 milestone Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: OTA Area: Over-the-air updates Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition in SUIT around threads terminating
4 participants