Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deterministic tagging for stack allocations #13

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

fritzrehde
Copy link
Collaborator

In each (C) function, stack arrays are allocated contiguously. We want to deterministically guarantee that no two consecutive stack allocations share the same MTE tag. Therefore, we generate an initial MTE tag randomly, and, consequently, increment that value (and apply modulo of course) and use that as the tag. We store the whole tagged index, which includes the MTE tag in its upper bits, instead of just the MTE tag.

}
Some(tagged_index) => {
// Extract random tag
let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK);
Copy link
Collaborator Author

@fritzrehde fritzrehde Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line currently panics on band_imm, if you follow the execution path further you will see that value_type gets called and the array access there panics because the Value doesn't exist. So I guess saving to an Option in the state might not be the correct way of saving values.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the code without the variable declaration and it worked fine for me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. What input C code did you use for testing?

cranelift/wasm/src/state.rs Outdated Show resolved Hide resolved
}
Some(tagged_index) => {
// Extract random tag
let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the code without the variable declaration and it worked fine for me.

martin-fink pushed a commit that referenced this pull request Nov 21, 2023
* Simplify global state management in adapter

I was looking recently to implement args-related syscalls but that would
require yet-more globals and yet-more state to be managed. Instead of
adding a slew of new globals for this which are all manually kept in
sync I opted to instead redesign how global state is managed in the
adapter.

The previous multiple `global`s are all removed in favor of just one, as
sort of a "tls slot" of sorts. This one remaining slot points to a
one-time-allocated `State` object which internally stores information
like buffer metadata, fd information, etc. Along the way I've also
simplified syscalls with new methods and `?`-using closures.

* Turn off incremental for dev builds

Helps with CGU splitting and ensuring that appropriate code is produced
even without `--release`.

* Review comments

* Add accessors with specific errors
* Update handling of `*_global_ptr`

* Update internal mutability around path buffer

Use an `UnsafeCell` layering to indicate that mutation may happen
through `&T`.
martin-fink pushed a commit that referenced this pull request Nov 21, 2023
* Simplify global state management in adapter

I was looking recently to implement args-related syscalls but that would
require yet-more globals and yet-more state to be managed. Instead of
adding a slew of new globals for this which are all manually kept in
sync I opted to instead redesign how global state is managed in the
adapter.

The previous multiple `global`s are all removed in favor of just one, as
sort of a "tls slot" of sorts. This one remaining slot points to a
one-time-allocated `State` object which internally stores information
like buffer metadata, fd information, etc. Along the way I've also
simplified syscalls with new methods and `?`-using closures.

* Turn off incremental for dev builds

Helps with CGU splitting and ensuring that appropriate code is produced
even without `--release`.

* Review comments

* Add accessors with specific errors
* Update handling of `*_global_ptr`

* Update internal mutability around path buffer

Use an `UnsafeCell` layering to indicate that mutation may happen
through `&T`.
// Extract random tag
let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK);

// Increment saved, and wrap around if 16-byte overflow would occur.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be a bug. If we increment 15 (i.e. 0b1111), then we would get 0b, which is the free tag. So we have to somehow ensure we don't get any of the special tags (only Free tag equal to 0b0000 for now, but we might add more in the future).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Also once #10 is ready, we need to exclude tags that are able to access the runtime memory or memory of other instances.
We might use odd tags for segment instructions and even tags for everything else (runtime memory, free tags, etc.).

For this, we need to make sure irg only generates one of the odd tags and then simply change the above to increment by two.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you think only using odd tags is somewhat of a waste of tags, since we only need two special tags for #10. Maybe it's fine because we're guaranteeing no consecutive stack allocations have the same tag, but we'd also be giving up half the possible tag values, possibly decreasing probabilistic safety in some way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to think of a simpler (probably less efficient) way of excluding our special tags. Any ideas for an algorithm for that, or is splitting in even and odd really the only performant way to go? I was thinking maybe just use an if statement on the 0b0000 tag for now, but that will get harder once we add more tags (from #10) and is also an extra cond instruction with overhead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree it's a waste, but we'll need more than two special tags at some point. Wasmtime can run more than one wasm instance. In this case, each heap in the memory pool needs one distinct tag.
I am not 100% sure if that will work though.

Adding a branch is relatively slow, IMHO.

What you could do is determine the tag at compile time. I haven't thought about the security implications of this yet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I determine the tag at compile-time? Currently, we still rely on irg for initial tag generation, which is a runtime instruction. All I know at compile-time is the offset on the random tag generated by irg.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by "determining the tag at compile-time" you mean generating a random 4-bit value at compile-time, and not relying on/using irg at all. The first downside that comes to mind is that the random value would somehow be store inside the .cwasm binary. Maybe that is a problem if the guest code can somehow read that .cwasm binary and extract the tag, though I'm not even sure if that's of any use for such malicious guest code.
Should I try implementing it with a compile-time generated tag for now, see if it works, and if it does, we can still discuss whether the other option of using irg with half the available tags, split into even and odd tags?

@fritzrehde fritzrehde self-assigned this Nov 21, 2023
@fritzrehde
Copy link
Collaborator Author

Just some comments from our Slack chat that I wanted to capture/summarize here:
I have implemented the compile-time tagging of both stack and heap addresses. This is the output of some debugging I did:

root@debian-qemu-mte:/home/fritz# ./wasmtime-determ compile --cranelift-enable=use_mte --wasm-features=memory64,mem-safety deterministic-test.wasm
Calling state.tag_index(v23)
initial random tag: RandomMteTag { tag: 15, index: 14 }
Calling state.tag_index(v12)
initial random tag: RandomMteTag { tag: 11, index: 10 }
Calling state.tag_index(v27)
next random tag: RandomMteTag { tag: 12, index: 11 }
Calling state.tag_index(v43)
next random tag: RandomMteTag { tag: 13, index: 12 }
Calling state.tag_index(v59)
next random tag: RandomMteTag { tag: 14, index: 13 }

root@debian-qemu-mte:/home/fritz# ./wasmtime-determ run --cranelift-enable=use_mte --wasm-features=memory64,mem-safety --allow-precompiled deterministic-test.cwasm 10
enabling mte for memory (enable_mte): ptr = 0xffff2ffd0000, len = 0x20000
input integer (just printed so code below doesn't get optimized away) = 10
stack_alloc_1; tag: 11; address 0xb00000000011980; expected random tag X (first stack array in function)
stack_alloc_2; tag: 12; address 0xc00000000011930; expected incremented tag X+1
Tagging memory 0xf00000000011a20, size 16
heap_alloc_1; tag: 15; address 0xf00000000011a20; expected random tag Y (first heap array in function)
Tagging memory 0xf00000000011ad0, size 32
heap_alloc_2; tag: 15; address 0xf00000000011ad0; expected random tag Y+1
stack_alloc_3; tag: 13; address 0xd000000000118e0; expected incremented tag X+2
Tagging memory 0xf00000000011b00, size 64
heap_alloc_3; tag: 15; address 0xf00000000011b00; expected random tag Y+2
stack_alloc_4; tag: 14; address 0xe000000000118a0; expected incremented tag X+3
Untagging memory 0xf00000000011a20, size 16
Untagging memory 0xf00000000011ad0, size 32
Untagging memory 0xf00000000011b00, size 64

Running the second command again and again will always produce different random tags, since these are randomly generated inside cranelift, so randomly generated when (JIT-)compiling input WASM code. This is expected. Of course, if the attacker was able to somehow read the generated .cwasm and able to overwrite points (to set pointer tags themselves, with this knowledge of the random tags), which contains the tags in plain text, that would be a problem. This is, however, much more performant than the previous irg method, since we no longer need an if branch to ensure we don't randomly generate one of the special tags (only 0b0000 currently on this git branch, but more special free tags could be added in the future, which would just make the same problem even worse).

The bigger problem: as one can see, all heap addresses are always tagged with the same tag. The reason is that there is technically only one WASM function that gets reused for every heap allocation: malloc. So, our wasmtime code generator will tag the malloc address once, and that tag then gets reused. There were a few approaches we discussed on how to fix this: Adding separate WASM instructions for stack and heap tagging (which complicates our WASM instruction "API"), simply swallowing the performance overhead (which is only theoretical for now, we have not measured anything) of the multiple if checks, or maybe even not employing deterministic tagging at all.

@fritzrehde
Copy link
Collaborator Author

We have decided to use ARM64's ADDG instruction for the deterministic tagging, and running the same demo C file as in my previous message, we now get:

root@debian-qemu-mte:/home/fritz# ./wasmtime-determ-new run --cranelift-enable=use_mte --wasm-features=memory64,mem-safety --allow-precompiled deterministic-test.cwasm 10
enabling mte for memory (enable_mte): ptr = 0xffff0ffd0000, len = 0x20000
input integer (just printed so code below doesn't get optimized away) = 10
stack_alloc_1; tag: 10; address 0xa00000000011980; expected random tag X (first stack array in function)
stack_alloc_2; tag: 11; address 0xb00000000011930; expected incremented tag X+1
Tagging memory 0x800000000011a20, size 16
heap_alloc_1; tag: 8; address 0x800000000011a20; expected random tag Y (first heap array in function)
Tagging memory 0xb00000000011ad0, size 32
heap_alloc_2; tag: 11; address 0xb00000000011ad0; expected random tag Y+1
stack_alloc_3; tag: 12; address 0xc000000000118e0; expected incremented tag X+2
Tagging memory 0xc00000000011b00, size 64
heap_alloc_3; tag: 12; address 0xc00000000011b00; expected random tag Y+2
stack_alloc_4; tag: 13; address 0xd000000000118a0; expected incremented tag X+3
Untagging memory 0x800000000011a20, size 16
Untagging memory 0xb00000000011ad0, size 32
Untagging memory 0xc00000000011b00, size 64

which is looking very promising! As we can see:

  1. Each heap address is tagged differently (randomly, independently), because a simple irg is used.
  2. The initial stack address is tagged with irg, and consequent stack allocations (in the same C function) use incremented values.
    Performance should also not be an issue (in theory), since there are no branches. All we instrument is irg for initial tagging, and band_imm, band_imm, bor and addg for consequent tagging. No branches at all, since addg already excludes our special free tags (if those excluded special tags are configured, which we do with prctl).

Comment on lines 260 to 290
/// Take the 4-bit MTE tag from `from_index` and insert it into `to_index`.
fn insert_tag_from_index_into_index(
from_index: Value,
to_index: Value,
builder: &mut FunctionBuilder,
) -> Value {
// tag = from & 0x0F00... (keep only tag)
let tag = builder.ins().band_imm(from_index, MTE_TAG_BITS_MASK);

// to = to & 0xF0FF... (remove tag, keep rest)
let to_index = builder.ins().band_imm(to_index, MTE_NON_TAG_BITS_MASK);

// to | tag
builder.ins().bor(to_index, tag)
}

impl FuncTranslationState {
/// Tag the `index` with an MTE tag, and return the tagged index.
/// Contiguous stack allocations are guaranteed to have different random
/// tags.
pub fn tag_index(&mut self, index: Value, builder: &mut FunctionBuilder) -> Value {
let new_tagged_index = match self.latest_tagged_index.take() {
None => builder.ins().arm64_irg(index),
Some(previous_tagged_index) => {
let index = insert_tag_from_index_into_index(previous_tagged_index, index, builder);
builder.ins().arm64_add_to_tag(index, 1)
}
};
self.latest_tagged_index = Some(new_tagged_index);
new_tagged_index
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-fink here's the core of the implementation. Do you see any more optimized ways e.g. for implementing insert_tag_from_index_to_index, maybe with less instructions?

@fritzrehde
Copy link
Collaborator Author

I have now tested this wasmtime binary and everything seems to work as expected!
Next steps would be squashing everything into one commit, and including the changes from upstream (our current main branch). @martin-fink Do you want to test the performance on this before merging into main branch? I could also make some sort of feature flag, so the developer can choose if deterministic tagging gets enabled or not. That would, at least, make (performance) testing a lot nicer. Do you want me to do that (if so, what kind of cli flag/option would make sense here)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants