Add deterministic tagging for stack allocations #13

fritzrehde · 2023-09-12T10:01:10Z

In each (C) function, stack arrays are allocated contiguously. We want to deterministically guarantee that no two consecutive stack allocations share the same MTE tag. Therefore, we generate the initial MTE tag randomly and, for each subsequent allocation, increment that value (modulo the number of tags, of course) and use the result as the tag. We store the whole tagged index, which includes the MTE tag in its upper bits, instead of just the MTE tag.
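For illustration only (this is not code from the PR): a minimal sketch of the idea on plain `u64` values, assuming the 4-bit tag sits in bits 56..=59 of the index. The constant `MTE_TAG_SHIFT` and the helpers `tag_of` and `next_tagged_index` are hypothetical names.

```rust
/// Assumed layout: the 4-bit MTE tag occupies bits 56..=59 of a 64-bit index.
const MTE_TAG_SHIFT: u32 = 56;
const MTE_TAG_BITS_MASK: u64 = 0xFu64 << MTE_TAG_SHIFT;

/// Extract the 4-bit tag from a tagged index.
fn tag_of(tagged_index: u64) -> u8 {
    ((tagged_index & MTE_TAG_BITS_MASK) >> MTE_TAG_SHIFT) as u8
}

/// Give `index` the tag that follows the one stored in `previous_tagged_index`,
/// using plain modulo-16 arithmetic.
fn next_tagged_index(previous_tagged_index: u64, index: u64) -> u64 {
    let next_tag = (tag_of(previous_tagged_index) + 1) % 16;
    (index & !MTE_TAG_BITS_MASK) | (u64::from(next_tag) << MTE_TAG_SHIFT)
}
```

With a previous tagged index of `0x0b00000000011980` and a new untagged index of `0x11930`, this sketch yields `0x0c00000000011930`.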

fritzrehde · 2023-09-12T10:03:32Z

cranelift/wasm/src/state.rs

+            }
+            Some(tagged_index) => {
+                // Extract random tag
+                let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK);


This line currently panics on band_imm. If you follow the execution path further, you will see that value_type gets called and the array access there panics because the Value doesn't exist. So I guess saving to an Option in the state might not be the correct way of saving values.

I tested the code without the variable declaration and it worked fine for me.

Interesting. What input C code did you use for testing?

martin-fink · 2023-09-13T18:33:56Z

cranelift/wasm/src/state.rs

+            }
+            Some(tagged_index) => {
+                // Extract random tag
+                let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK);


I tested the code without the variable declaration and it worked fine for me.

* Simplify global state management in adapter

  I was looking recently to implement args-related syscalls but that would require yet-more globals and yet-more state to be managed. Instead of adding a slew of new globals for this which are all manually kept in sync I opted to instead redesign how global state is managed in the adapter.

  The previous multiple `global`s are all removed in favor of just one, as sort of a "tls slot" of sorts. This one remaining slot points to a one-time-allocated `State` object which internally stores information like buffer metadata, fd information, etc. Along the way I've also simplified syscalls with new methods and `?`-using closures.

* Turn off incremental for dev builds

  Helps with CGU splitting and ensuring that appropriate code is produced even without `--release`.

* Review comments
* Add accessors with specific errors
* Update handling of `*_global_ptr`
* Update internal mutability around path buffer

  Use an `UnsafeCell` layering to indicate that mutation may happen through `&T`.

fritzrehde · 2023-11-21T17:47:29Z

cranelift/wasm/src/state.rs

+                // Extract random tag
+                let latest_tag = builder.ins().band_imm(tagged_index, MTE_TAG_BITS_MASK);
+
+                // Increment saved, and wrap around if 16-byte overflow would occur.


I think this might be a bug. If we increment 15 (i.e. 0b1111), then we would get 0b0000, which is the free tag. So we have to somehow ensure we never produce any of the special tags (only the Free tag, equal to 0b0000, for now, but we might add more in the future).

Right. Also once #10 is ready, we need to exclude tags that are able to access the runtime memory or memory of other instances.
We might use odd tags for segment instructions and even tags for everything else (runtime memory, free tags, etc.).

For this, we need to make sure irg only generates one of the odd tags and then simply change the above to increment by two.
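As a sketch of the odd/even split suggested above (illustrative only; `next_odd_tag` is a hypothetical helper, and it assumes the initial tag produced by irg is already odd): incrementing by two and masking to 4 bits stays within the odd tags and never reaches 0b0000.

```rust
/// Advance an odd 4-bit tag to the next odd tag, wrapping 15 -> 1.
/// Assumption: the caller guarantees that `tag` is odd.
fn next_odd_tag(tag: u8) -> u8 {
    debug_assert!(tag & 1 == 1, "expected an odd tag");
    (tag + 2) & 0xF
}
```

Starting from any odd tag, eight applications visit all eight odd tags and return to the start, so the even tags stay reserved.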

Don't you think only using odd tags is somewhat of a waste of tags, since we only need two special tags for #10? Maybe it's fine because we're guaranteeing no consecutive stack allocations have the same tag, but we'd also be giving up half the possible tag values, possibly decreasing probabilistic safety in some way.

I'm trying to think of a simpler (probably less efficient) way of excluding our special tags. Any ideas for an algorithm for that, or is splitting in even and odd really the only performant way to go? I was thinking maybe just use an if statement on the 0b0000 tag for now, but that will get harder once we add more tags (from #10) and is also an extra cond instruction with overhead.
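To make the trade-off concrete, here is a hedged sketch (not from the PR; both function names are hypothetical) of two ways to skip only the 0b0000 tag on plain integers. The second avoids the conditional, but it relies on 0 being the single excluded tag and on the input already being in 1..=15. Whether it is actually cheaper than a branch has not been measured here.

```rust
/// Branching variant: increment modulo 16, then step over the free tag 0.
fn next_tag_branching(tag: u8) -> u8 {
    let next = (tag + 1) & 0xF;
    if next == 0 { 1 } else { next }
}

/// Branch-free variant for tags in 1..=15: cycles 1 -> 2 -> ... -> 15 -> 1.
fn next_tag_mod15(tag: u8) -> u8 {
    (tag % 15) + 1
}
```

Both variants agree for every tag in 1..=15. Neither generalizes directly once more special tags are reserved.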

Yeah, I agree it's a waste, but we'll need more than two special tags at some point. Wasmtime can run more than one wasm instance. In this case, each heap in the memory pool needs one distinct tag.
I am not 100% sure if that will work though.

Adding a branch is relatively slow, IMHO.

What you could do is determine the tag at compile time. I haven't thought about the security implications of this yet.

How do I determine the tag at compile-time? Currently, we still rely on irg for initial tag generation, which is a runtime instruction. All I know at compile-time is the offset on the random tag generated by irg.

If by "determining the tag at compile-time" you mean generating a random 4-bit value at compile-time and not relying on/using irg at all, the first downside that comes to mind is that the random value would somehow be stored inside the .cwasm binary. Maybe that is a problem if the guest code can somehow read that .cwasm binary and extract the tag, though I'm not even sure whether that would be of any use to such malicious guest code.
Should I try implementing it with a compile-time generated tag for now and see if it works? If it does, we can still discuss the other option of using irg with half the available tags, split into even and odd tags.

fritzrehde · 2024-02-06T06:12:12Z

Just some comments from our Slack chat that I wanted to capture/summarize here:
I have implemented the compile-time tagging of both stack and heap addresses. This is the output of some debugging I did:

root@debian-qemu-mte:/home/fritz# ./wasmtime-determ compile --cranelift-enable=use_mte --wasm-features=memory64,mem-safety deterministic-test.wasm
Calling state.tag_index(v23)
initial random tag: RandomMteTag { tag: 15, index: 14 }
Calling state.tag_index(v12)
initial random tag: RandomMteTag { tag: 11, index: 10 }
Calling state.tag_index(v27)
next random tag: RandomMteTag { tag: 12, index: 11 }
Calling state.tag_index(v43)
next random tag: RandomMteTag { tag: 13, index: 12 }
Calling state.tag_index(v59)
next random tag: RandomMteTag { tag: 14, index: 13 }

root@debian-qemu-mte:/home/fritz# ./wasmtime-determ run --cranelift-enable=use_mte --wasm-features=memory64,mem-safety --allow-precompiled deterministic-test.cwasm 10
enabling mte for memory (enable_mte): ptr = 0xffff2ffd0000, len = 0x20000
input integer (just printed so code below doesn't get optimized away) = 10
stack_alloc_1; tag: 11; address 0xb00000000011980; expected random tag X (first stack array in function)
stack_alloc_2; tag: 12; address 0xc00000000011930; expected incremented tag X+1
Tagging memory 0xf00000000011a20, size 16
heap_alloc_1; tag: 15; address 0xf00000000011a20; expected random tag Y (first heap array in function)
Tagging memory 0xf00000000011ad0, size 32
heap_alloc_2; tag: 15; address 0xf00000000011ad0; expected random tag Y+1
stack_alloc_3; tag: 13; address 0xd000000000118e0; expected incremented tag X+2
Tagging memory 0xf00000000011b00, size 64
heap_alloc_3; tag: 15; address 0xf00000000011b00; expected random tag Y+2
stack_alloc_4; tag: 14; address 0xe000000000118a0; expected incremented tag X+3
Untagging memory 0xf00000000011a20, size 16
Untagging memory 0xf00000000011ad0, size 32
Untagging memory 0xf00000000011b00, size 64

Running the second command again and again will always produce different random tags, since they are randomly generated inside cranelift, i.e. when (JIT-)compiling the input WASM code. This is expected. Of course, if an attacker were somehow able to read the generated .cwasm, which contains the tags in plain text, and to overwrite pointers (setting the pointer tags themselves using this knowledge of the random tags), that would be a problem. This approach should, however, be much more performant than the previous irg method, since we no longer need an if branch to ensure we don't randomly generate one of the special tags (only 0b0000 currently on this git branch, but more special tags could be added in the future, which would just make the same problem even worse).

The bigger problem: as one can see, all heap addresses are always tagged with the same tag. The reason is that there is technically only one WASM function that gets reused for every heap allocation: malloc. So, our wasmtime code generator will tag the malloc address once, and that tag then gets reused. There were a few approaches we discussed on how to fix this: Adding separate WASM instructions for stack and heap tagging (which complicates our WASM instruction "API"), simply swallowing the performance overhead (which is only theoretical for now, we have not measured anything) of the multiple if checks, or maybe even not employing deterministic tagging at all.
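The effect can be reproduced with a small model (illustrative only; `CompiledCallSite` and `tags_seen` are hypothetical names, and the tag position in bits 56..=59 is an assumption): a tag that is fixed when a call site is compiled is applied to every address that later flows through that site.

```rust
/// Hypothetical stand-in for a call site whose tag was fixed at compile time.
struct CompiledCallSite {
    baked_tag: u8,
}

impl CompiledCallSite {
    /// Apply the baked-in tag to `address` (tag assumed to live in bits 56..=59).
    fn tag_pointer(&self, address: u64) -> u64 {
        (address & !(0xFu64 << 56)) | (u64::from(self.baked_tag & 0xF) << 56)
    }
}

/// Collect the tags that a single call site hands out for several addresses.
fn tags_seen(site: &CompiledCallSite, addresses: &[u64]) -> Vec<u8> {
    addresses
        .iter()
        .map(|&address| ((site.tag_pointer(address) >> 56) & 0xF) as u8)
        .collect()
}
```

With `baked_tag: 15` and the three heap addresses from the log, every result carries tag 15, which matches the behaviour described above for the single malloc site.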

fritzrehde · 2024-02-07T04:32:46Z

We have decided to use ARM64's ADDG instruction for the deterministic tagging, and running the same demo C file as in my previous message, we now get:

root@debian-qemu-mte:/home/fritz# ./wasmtime-determ-new run --cranelift-enable=use_mte --wasm-features=memory64,mem-safety --allow-precompiled deterministic-test.cwasm 10
enabling mte for memory (enable_mte): ptr = 0xffff0ffd0000, len = 0x20000
input integer (just printed so code below doesn't get optimized away) = 10
stack_alloc_1; tag: 10; address 0xa00000000011980; expected random tag X (first stack array in function)
stack_alloc_2; tag: 11; address 0xb00000000011930; expected incremented tag X+1
Tagging memory 0x800000000011a20, size 16
heap_alloc_1; tag: 8; address 0x800000000011a20; expected random tag Y (first heap array in function)
Tagging memory 0xb00000000011ad0, size 32
heap_alloc_2; tag: 11; address 0xb00000000011ad0; expected random tag Y+1
stack_alloc_3; tag: 12; address 0xc000000000118e0; expected incremented tag X+2
Tagging memory 0xc00000000011b00, size 64
heap_alloc_3; tag: 12; address 0xc00000000011b00; expected random tag Y+2
stack_alloc_4; tag: 13; address 0xd000000000118a0; expected incremented tag X+3
Untagging memory 0x800000000011a20, size 16
Untagging memory 0xb00000000011ad0, size 32
Untagging memory 0xc00000000011b00, size 64

which is looking very promising! As we can see:

Each heap address is tagged differently (randomly, independently), because a simple irg is used.
The initial stack address is tagged with irg, and subsequent stack allocations (in the same C function) use incremented values.
Performance should also not be an issue (in theory), since there are no branches. All we instrument is irg for initial tagging, and band_imm, band_imm, bor and addg for subsequent tagging. No branches at all, since addg already skips our special free tags (provided those excluded special tags are configured, which we do with prctl).
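A rough software model of how the tag part of addg steps over excluded tags, written from my understanding of the Arm pseudocode and not taken from the PR. It ignores the address offset that the real instruction also adds. `choose_non_excluded_tag` is a hypothetical name, and `exclude` is assumed to be a 16-bit mask with bit i set when tag i must not be produced.

```rust
/// Model of the tag selection: advance `start` by `offset` steps,
/// skipping every tag whose bit is set in `exclude`.
fn choose_non_excluded_tag(start: u8, offset: u8, exclude: u16) -> u8 {
    // If every tag is excluded, the result is tag 0.
    if exclude == 0xFFFF {
        return 0;
    }
    let mut tag = start & 0xF;
    let mut offset = offset & 0xF;
    // A zero offset still moves off an excluded starting tag.
    if offset == 0 {
        while (exclude & (1u16 << tag)) != 0 {
            tag = (tag + 1) & 0xF;
        }
    }
    while offset != 0 {
        offset -= 1;
        tag = (tag + 1) & 0xF;
        while (exclude & (1u16 << tag)) != 0 {
            tag = (tag + 1) & 0xF;
        }
    }
    tag
}
```

With only tag 0 excluded (`exclude = 0b1`), a tag offset of 1 takes 10 to 11 and 15 to 1, so the wrap-around never lands on the free tag and the generated code needs no branch.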

fritzrehde · 2024-02-07T04:35:02Z

cranelift/wasm/src/state.rs

+/// Take the 4-bit MTE tag from `from_index` and insert it into `to_index`.
+fn insert_tag_from_index_into_index(
+    from_index: Value,
+    to_index: Value,
+    builder: &mut FunctionBuilder,
+) -> Value {
+    // tag = from & 0x0F00... (keep only tag)
+    let tag = builder.ins().band_imm(from_index, MTE_TAG_BITS_MASK);
+
+    // to = to & 0xF0FF... (remove tag, keep rest)
+    let to_index = builder.ins().band_imm(to_index, MTE_NON_TAG_BITS_MASK);
+
+    // to | tag
+    builder.ins().bor(to_index, tag)
+}
+
+impl FuncTranslationState {
+    /// Tag the `index` with an MTE tag, and return the tagged index.
+    /// Contiguous stack allocations are guaranteed to have different random
+    /// tags.
+    pub fn tag_index(&mut self, index: Value, builder: &mut FunctionBuilder) -> Value {
+        let new_tagged_index = match self.latest_tagged_index.take() {
+            None => builder.ins().arm64_irg(index),
+            Some(previous_tagged_index) => {
+                let index = insert_tag_from_index_into_index(previous_tagged_index, index, builder);
+                builder.ins().arm64_add_to_tag(index, 1)
+            }
+        };
+        self.latest_tagged_index = Some(new_tagged_index);
+        new_tagged_index
+    }


@martin-fink here's the core of the implementation. Do you see any more optimized ways, e.g. for implementing insert_tag_from_index_into_index, maybe with fewer instructions?
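For readers without a Cranelift build at hand, the same band/band/bor sequence can be sketched on plain `u64` values. This is an illustration, not the PR's code: the mask values are assumptions derived from the `0x0F00...` and `0xF0FF...` comments in the diff.

```rust
/// Assumed mask values: tag in bits 56..=59, everything else is non-tag.
const MTE_TAG_BITS_MASK: u64 = 0x0F00_0000_0000_0000;
const MTE_NON_TAG_BITS_MASK: u64 = !MTE_TAG_BITS_MASK;

/// Take the 4-bit MTE tag from `from_index` and insert it into `to_index`.
fn insert_tag_from_index_into_index(from_index: u64, to_index: u64) -> u64 {
    // Keep only the tag bits of the source.
    let tag = from_index & MTE_TAG_BITS_MASK;
    // Clear any tag already present in the destination.
    let to_index = to_index & MTE_NON_TAG_BITS_MASK;
    // Combine the cleared destination with the source tag.
    to_index | tag
}
```

For example, copying the tag of `0x0a00000000011980` into `0x11930` gives `0x0a00000000011930`, and any tag already present in the destination is overwritten.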

fritzrehde · 2024-02-08T06:13:58Z

I have now tested this wasmtime binary and everything seems to work as expected!
Next steps would be squashing everything into one commit, and including the changes from upstream (our current main branch). @martin-fink Do you want to test the performance of this before merging into the main branch? I could also add some sort of feature flag, so the developer can choose whether deterministic tagging gets enabled or not. That would, at least, make (performance) testing a lot nicer. Do you want me to do that (if so, what kind of CLI flag/option would make sense here)?

First try at deterministic tagging

269b018

fritzrehde added 4 commits September 13, 2023 21:52

Experimenting

5c9056b

some more debugging

0a3f82e

added none

f866e8a

add comments to document changes

69517ef

fritzrehde self-assigned this Nov 21, 2023

fritzrehde added 4 commits November 23, 2023 01:00

Generate all MTE tags at compile-time

303b31f

Turned local variables into constants in segment free

e4b7b93

Fixed bug where incorrect index was tagged

d3102e3

Added docstrings to RandomMteTag struct

0cfe196

fritzrehde added 3 commits February 6, 2024 17:13

Improved docstring slightly

a2ce016

First attempt at using ADDG inst

817078b

Fixed bit manipulation bugs

816a14e

Removed unused imports

0398493

Removed print debug statements

abf4894
