-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add deterministic tagging for stack allocations #13
base: main
Are you sure you want to change the base?
Conversation
cranelift/wasm/src/state.rs
Outdated
} | ||
Some(tagged_index) => { | ||
// Extract random tag | ||
let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line currently panics on band_imm
, if you follow the execution path further you will see that value_type
gets called and the array access there panics because the Value
doesn't exist. So I guess saving to an Option
in the state might not be the correct way of saving values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the code without the variable declaration and it worked fine for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. What input C code did you use for testing?
cranelift/wasm/src/state.rs
Outdated
} | ||
Some(tagged_index) => { | ||
// Extract random tag | ||
let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the code without the variable declaration and it worked fine for me.
* Simplify global state management in adapter I was looking recently to implement args-related syscalls but that would require yet-more globals and yet-more state to be managed. Instead of adding a slew of new globals for this which are all manually kept in sync I opted to instead redesign how global state is managed in the adapter. The previous multiple `global`s are all removed in favor of just one, as sort of a "tls slot" of sorts. This one remaining slot points to a one-time-allocated `State` object which internally stores information like buffer metadata, fd information, etc. Along the way I've also simplified syscalls with new methods and `?`-using closures. * Turn off incremental for dev builds Helps with CGU splitting and ensuring that appropriate code is produced even without `--release`. * Review comments * Add accessors with specific errors * Update handling of `*_global_ptr` * Update internal mutability around path buffer Use an `UnsafeCell` layering to indicate that mutation may happen through `&T`.
* Simplify global state management in adapter I was looking recently to implement args-related syscalls but that would require yet-more globals and yet-more state to be managed. Instead of adding a slew of new globals for this which are all manually kept in sync I opted to instead redesign how global state is managed in the adapter. The previous multiple `global`s are all removed in favor of just one, as sort of a "tls slot" of sorts. This one remaining slot points to a one-time-allocated `State` object which internally stores information like buffer metadata, fd information, etc. Along the way I've also simplified syscalls with new methods and `?`-using closures. * Turn off incremental for dev builds Helps with CGU splitting and ensuring that appropriate code is produced even without `--release`. * Review comments * Add accessors with specific errors * Update handling of `*_global_ptr` * Update internal mutability around path buffer Use an `UnsafeCell` layering to indicate that mutation may happen through `&T`.
cranelift/wasm/src/state.rs
Outdated
// Extract random tag | ||
let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK); | ||
|
||
// Increment saved, and wrap around if 16-byte overflow would occur. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be a bug. If we increment 15 (i.e. 0b1111), then we would get 0b, which is the free tag. So we have to somehow ensure we don't get any of the special tags (only Free tag equal to 0b0000 for now, but we might add more in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Also once #10 is ready, we need to exclude tags that are able to access the runtime memory or memory of other instances.
We might use odd tags for segment
instructions and even tags for everything else (runtime memory, free tags, etc.).
For this, we need to make sure irg
only generates one of the odd tags and then simply change the above to increment by two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you think only using odd tags is somewhat of a waste of tags, since we only need two special tags for #10. Maybe it's fine because we're guaranteeing no consecutive stack allocations have the same tag, but we'd also be giving up half the possible tag values, possibly decreasing probabilistic safety in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think of a simpler (probably less efficient) way of excluding our special tags. Any ideas for an algorithm for that, or is splitting in even and odd really the only performant way to go? I was thinking maybe just use an if statement on the 0b0000 tag for now, but that will get harder once we add more tags (from #10) and is also an extra cond instruction with overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree it's a waste, but we'll need more than two special tags at some point. Wasmtime can run more than one wasm instance. In this case, each heap in the memory pool needs one distinct tag.
I am not 100% sure if that will work though.
Adding a branch is relatively slow, IMHO.
What you could do is determine the tag at compile time. I haven't thought about the security implications of this yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I determine the tag at compile-time? Currently, we still rely on irg
for initial tag generation, which is a runtime instruction. All I know at compile-time is the offset on the random tag generated by irg
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If by "determining the tag at compile-time" you mean generating a random 4-bit value at compile-time, and not relying on/using irg
at all. The first downside that comes to mind is that the random value would somehow be store inside the .cwasm
binary. Maybe that is a problem if the guest code can somehow read that .cwasm
binary and extract the tag, though I'm not even sure if that's of any use for such malicious guest code.
Should I try implementing it with a compile-time generated tag for now, see if it works, and if it does, we can still discuss whether the other option of using irg
with half the available tags, split into even and odd tags?
Just some comments from our Slack chat that I wanted to capture/summarize here:
Running the second command again and again will always produce different random tags, since these are randomly generated inside cranelift, so randomly generated when (JIT-)compiling input WASM code. This is expected. Of course, if the attacker was able to somehow read the generated .cwasm and able to overwrite points (to set pointer tags themselves, with this knowledge of the random tags), which contains the tags in plain text, that would be a problem. This is, however, much more performant than the previous The bigger problem: as one can see, all heap addresses are always tagged with the same tag. The reason is that there is technically only one WASM function that gets reused for every heap allocation: malloc. So, our wasmtime code generator will tag the malloc address once, and that tag then gets reused. There were a few approaches we discussed on how to fix this: Adding separate WASM instructions for stack and heap tagging (which complicates our WASM instruction "API"), simply swallowing the performance overhead (which is only theoretical for now, we have not measured anything) of the multiple if checks, or maybe even not employing deterministic tagging at all. |
We have decided to use ARM64's
which is looking very promising! As we can see:
|
cranelift/wasm/src/state.rs
Outdated
/// Take the 4-bit MTE tag from `from_index` and insert it into `to_index`. | ||
fn insert_tag_from_index_into_index( | ||
from_index: Value, | ||
to_index: Value, | ||
builder: &mut FunctionBuilder, | ||
) -> Value { | ||
// tag = from & 0x0F00... (keep only tag) | ||
let tag = builder.ins().band_imm(from_index, MTE_TAG_BITS_MASK); | ||
|
||
// to = to & 0xF0FF... (remove tag, keep rest) | ||
let to_index = builder.ins().band_imm(to_index, MTE_NON_TAG_BITS_MASK); | ||
|
||
// to | tag | ||
builder.ins().bor(to_index, tag) | ||
} | ||
|
||
impl FuncTranslationState { | ||
/// Tag the `index` with an MTE tag, and return the tagged index. | ||
/// Contiguous stack allocations are guaranteed to have different random | ||
/// tags. | ||
pub fn tag_index(&mut self, index: Value, builder: &mut FunctionBuilder) -> Value { | ||
let new_tagged_index = match self.latest_tagged_index.take() { | ||
None => builder.ins().arm64_irg(index), | ||
Some(previous_tagged_index) => { | ||
let index = insert_tag_from_index_into_index(previous_tagged_index, index, builder); | ||
builder.ins().arm64_add_to_tag(index, 1) | ||
} | ||
}; | ||
self.latest_tagged_index = Some(new_tagged_index); | ||
new_tagged_index | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-fink here's the core of the implementation. Do you see any more optimized ways e.g. for implementing insert_tag_from_index_to_index
, maybe with less instructions?
I have now tested this wasmtime binary and everything seems to work as expected! |
In each (C) function, stack arrays are allocated contiguously. We want to deterministically guarantee that no two consecutive stack allocations share the same MTE tag. Therefore, we generate an initial MTE tag randomly, and, consequently, increment that value (and apply modulo of course) and use that as the tag. We store the whole tagged index, which includes the MTE tag in its upper bits, instead of just the MTE tag.