Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility with the new SSA convention #33

Merged
merged 33 commits into from
Oct 8, 2021

Conversation

fritzalder
Copy link
Member

These are small changes in the linker and the sm_entry and sm_exit stubs to make the linker compatible with the upcoming SSA changes (see sancus-tee/sancus-core#24 )

The linker contains SSA labels for:

  • ssa_base which is also ssa_pc
  • ssa_sp which is at ssa_base-2
  • ssa_end which is at ssa_base-28

Additionally, the linker also has a ssa_thread_id which lies above the SSA at ssa_base+2 for later compatibility when adding actual threading to the SSAs. This can be removed if you find it unclean to add this now.

Also, since we technically only need ssa_base from now on, we could refactor all stubs to recalculate the SP or PC etc from ssa_base on instead of keeping the ssa_sp and ssa_pc labels. But I am not sure whether sm_pc is needed elsewhere so I refrained from renaming it.

src/stubs/sm_entry.s Outdated Show resolved Hide resolved
src/stubs/sm_entry.s Outdated Show resolved Hide resolved
src/stubs/sm_entry.s Outdated Show resolved Hide resolved
src/stubs/sm_entry.s Outdated Show resolved Hide resolved
src/stubs/sm_entry.s Outdated Show resolved Hide resolved
src/stubs/sm_entry.s Show resolved Hide resolved
src/stubs/sm_exit.s Show resolved Hide resolved
@jovanbulck
Copy link
Member

These are small changes in the linker and the sm_entry and sm_exit stubs to make the linker compatible with the upcoming SSA changes (see sancus-tee/sancus-core#24 )

The linker contains SSA labels for:

* ssa_base which is also ssa_pc

* ssa_sp which is at ssa_base-2

* ssa_end which is at ssa_base-28

Additionally, the linker also has a ssa_thread_id which lies above the SSA at ssa_base+2 for later compatibility when adding actual threading to the SSAs. This can be removed if you find it unclean to add this now.

Okay great, we can keep this for now I think. Even when having one SSA and a scheduler we might need this. Also, later we should have a field here for the caller_id.

Also, since we technically only need ssa_base from now on, we could refactor all stubs to recalculate the SP or PC etc from ssa_base on instead of keeping the ssa_sp and ssa_pc labels. But I am not sure whether sm_pc is needed elsewhere so I refrained from renaming it.

Hm, I see you aliased sm_sp now to the SSA location. I think this is not always okay though(!) The current stubs use sm_sp in the entry/exit logic for inter-enclave calls/rets. Be aware that the SSA registers are supposed to be fully transparent and can be overwritten at any moment (outside dint/eint region) by the HW IRQ logic. Any global variables like _sm_sp would have to be stored above the SSA frame, as a thread-local var.

To make this clearer, I suggest to rename things to __sm_ssa_blah to make it clear they're in the SSA frame?

@fritzalder
Copy link
Member Author

Latest changes:

  1. __sm_sp is moved between __sm_ssa_end and __sm_stack_init. This now gives us one static region to store our SP independent of the SSA. The old location of __sm_sp is renamed to __sm_ssa_sp. Changes in sm_entry for this are restricted to:
  • When restoring SP initially, we first use SSA_SP and if it is empty, we use SP.
  • To check whether an IRQ happened, we check SSA_SP
  • When restoring state in reti, we read from and clear SSA_SP
  • All other locations keep using SP
  1. In addition to thread_id I added a caller_id field above the SSA. caller_id is filled after the check for an IRQ call and is only filled if it is empty (on the first call). Only when returning, caller_id is emptied. This means that during any intermediate call into the SM, the caller_id remains static until it returns.

  2. Linker can now take a YAML configuration file. An example is in the drivers subdirectory where linker.py lies.

  3. Added an alternative default sm_isr_basic stub that does not perform any stack pushes before entering the ISR. Can be enabled through the new YAML configuration file.

  4. and 4) are technically not part of this PR but are reasonably the next step that we wanted to merge into master. We can split this off if you prefer.

Copy link
Member

@jovanbulck jovanbulck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial review of compiler

src/sancus_support/sm_support.h Outdated Show resolved Hide resolved
src/stubs/sm_entry.s Show resolved Hide resolved
jovanbulck added a commit to sancus-tee/sancus-examples that referenced this pull request Sep 27, 2021
This anticipates expected changes to the behavior of the
sancus_get_caller_id() runtime function to be merged in
sancus-tee/sancus-compiler#33
@jovanbulck
Copy link
Member

I added some notes on caller_id behavior changes above.

Thinking further about it, I think you do need to implement this with a compiler intrinsic, at least partially. Otherwise you'd have to pass the name of the calling SM all the time to rewrite the symbol to the function that will be inserted by the linker.

Maybe the cleanest/most maintainable is a hybrid approach:

  1. use a compiler intrinsic as in the code above to intercept all calls to sancus_get_caller_id().
  2. in the compiler pass, rewrite these calls to a unique symbol name like __sm_NAME_get_caller_id() or __unprotected_get_caller_id() (the latter may also be the direct .word machine instruction, depending on the behavior we want for unprotected code)
  3. in the linker, rename the generic implementation of __sm_get_caller_id() to the unique __sm_NAME_get_caller_id()

src/stubs/sm_exit.s Outdated Show resolved Hide resolved
@fritzalder
Copy link
Member Author

I added a commit that implements the compiler intrinsic for get_caller_id. This is my first time working with Twine so I am not sure that this is indeed correct. Especially the $0 in the assembly that I added are added without any basis, they just looked right after looking at the other code. This works on my machine but I will need to revisit this on Monday to see whether the generated assembly code really makes sense.

@fritzalder
Copy link
Member Author

I think this PR looks pretty good now. We can consider to also put #22 in here but there is always more to do and we can just do that in a new PR. So I'd vote to merge this sooner than later and then address the remaining issues in other PRs and also get #36 along.

@jovanbulck
Copy link
Member

I'd like #13 to be still fixed before merging, because it directly affects the availability guarantees of Aion and should be fixable now that we're already rewriting the entry/exit stubs thoroughly.

Best to have this all merged at once, so we can update the CI tests (and nemesis/sancus-step calibration etc) as needed and hopefully have everything working nicely together :)

@fritzalder
Copy link
Member Author

#13 is now addressed in the newest commits.

  1. I had to revert the SancusModuleCreator to still do the mov operation after caller_id since the compiler does use r12, not r15 for returns. This then gave an error in the hello-world example as it expected the caller_id in r12.
  2. The earlier added check for r6=0xffff is not enough as it allows the attacker to use this actively to overwrite the ssa_caller_id field. I now explicitly also check for ocall_id. I argue that checking for 0xffff in r6 is still useful to make the reentry explicit, but arguably we do not need it anymore now that we check for ocall_id. I would leave it in.
  3. We have some code duplication now for the double checking of OCALL reentry. But I did not see a nice way to unify these 10 lines and anything that I came up with was more than 10 lines.

@jovanbulck
Copy link
Member

the current design with a single ocall_id doesn't work when considering nested ecalls (where an SM can call back into the caller SM which may itself call other SMs again, which is possible in C function calls and with the current stubs). If we want to support that we prob need to save them on the stack.

The bigger issue is that we cannot guarantee thread isolation in this case:

eg consider 2 mutually distrusting threads: T1 with enclaves A-B and T2 with enclave C

T1:A -ocall-> B ; IRQ, scheduler-ecall-> T2:C -ecall-> A; IRQ, scheduler-reti-> T1:B-oret->A

At this point A has context for both T1 and T2 on the same call stack, plus an SSA frame for T2 (!)

I don't see an easy solution for this (apart from maintaining a threadID as I did in my earlier availability prototype, to maintain the invariant that an enclave allocates max 1 SSA/stack per thread) or disallowing nested ecalls altogether (ie a type of restricted call graph without backarrows, which puts some burden on the programmer, but can be enforced in the stub I think)

@fritzalder
Copy link
Member Author

I think you're mixing up nested ecalls with multiple SSAs here.

Nested ECALLs: A -> B -> C. This should work with the current solution.
Making multiple calls into the same enclave: This requires more than one SSA frame which we currently do not support. Thus, the call from C->A above would return an error since the SSA frame of A is already in use. (Also why would C call A if they are distrusting? :D ).

We decided earlier that our solution for now only supports a single SSA frame. If we want to change this, that would be a whole bunch of other changes but even then, the ocall_id would still be on the SSA frame and would not need changing. If you really want, we can address this in future PRs but I don't see the benefit that much in adding it right now

@jovanbulck
Copy link
Member

wait now I'm confused myself 🤔

does one SSA mean that nested ecalls are not allowed? where is this enforced in the stub?

@jovanbulck
Copy link
Member

jovanbulck commented Oct 7, 2021

I tried to code up my proposal to properly enforce this (and not allow backcalls) as per 1dd1beb

but I'd be happy to revert this if it turns out not needed?

@jovanbulck
Copy link
Member

Nested ECALLs: A -> B -> C. This should work with the current solution. Making multiple calls into the same enclave: This requires more than one SSA frame which we currently do not support. Thus, the call from C->A above would return an error since the SSA frame of A is already in use. (Also why would C call A if they are distrusting? :D ).

I agree to properly support this we'd have to use multiple SSAs. My point is that afaik the call from C->A should not be allowed, but will not return an error without my modifications in the latest commit above. In this case, C is the attacker and will call into A to try to break availability of thread T1.

To properly support backcalls, one would need a notion of trusted threadIDs, even with a single SSA per SM. Note that this can be totally legitimate use case as well, eg A calls B and B calls back into A and A returns to B and B finally retuns to A, right?

This simplifies the entry stub considerably and enables
thread isolation guarantees for Aion.
@jovanbulck
Copy link
Member

good point, nice catch! 👍

That can be fixed though I think by sanitizing r6 on entry, eg a simple fix would be to mask the highest bit of r6 on entry (and #0x7FFF, r6) I think 🤔 That way, even if the attacker passes 0xffff, the only way r6 is 0xffff after accept_ecall is if there's a proper oret afais?

Do you agree that would be a way forward to fix this issue and finalize the merge?

@jovanbulck
Copy link
Member

hm good idea, I think this could work on aion-cores, but then we again have issue on non-aion sancus-cores right?

So, unless there's a security counterargument, I would simply sanitize r6 on entry

@jovanbulck
Copy link
Member

no .Lerror is a local label

and error is a stdlib function outside the enclave (hence my comments regarding leaking state and rewriting .Lerror to an infinite loop, not the cleanest but secure as opposed to what we had previously)

I'm now preparing a commit proposal fix

Since now we use a dedicated SSA region, instead of the stack, to store
interrupted execution context, we don't need to setup the stack within
the atomic region anymore.

Also note that the atomic entry section will already take care of
restoring the SSA (including sm_ssa_sp) if the enclave was interrupted.
@jovanbulck
Copy link
Member

@fritzalder the 2 above commits could use a careful review to make sure I didn't screw anything up? xD

  • fa95a6c should block the attack you described above
  • efb3b98 should further simplify the code by moving stack initialization outside the atomic section (as we now use the SSA instead of the stack for storing registers on IRQ)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SM entry/exit atomicity Sanity-check untrusted arguments in trusted sm_entry/exit stubs
2 participants