Any way to "forge" provenance? #466
Comments
If the compiler cannot see what FFI code (or any unanalyzed code in general) can do, it will assume that code does anything that code could legally do. When doing untyped copies (via Inventing provenance from thin air, such as from a memory mapped I/O device, or, in the case of a kernel, the basic kernel allocator, can be done by an |
I'm having a hard time understanding what exactly you are asking. Generally the first strategy should be to try to preserve provenance. The type for "untyped memory that may contain provenance" is If preserving provenance is not possible, the alternative is I think a more concrete example (as small as possible :) would help. I don't know if you are asking for versions of |
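To make the "preserve provenance" strategy concrete, here is a minimal sketch. It assumes `ptr::copy_nonoverlapping` over raw bytes as the untyped copy; the comment above does not say which mechanism was meant, so that choice is an assumption. `Holder`, `copy_untyped`, and `demo_preserved` are hypothetical names used only for illustration.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical struct that carries a raw pointer (and thus provenance).
#[repr(C)]
#[derive(Clone, Copy)]
struct Holder {
    target: *const u32,
    len: usize,
}

/// Copies a `Holder` byte for byte. `ptr::copy_nonoverlapping` is an
/// untyped copy, so the provenance stored in `target` travels with the bytes.
fn copy_untyped(src: &Holder) -> Holder {
    let mut dst = Holder { target: ptr::null(), len: 0 };
    unsafe {
        ptr::copy_nonoverlapping(
            (src as *const Holder).cast::<u8>(),
            (&mut dst as *mut Holder).cast::<u8>(),
            size_of::<Holder>(),
        );
    }
    dst
}

/// Dereferences the pointer that went through the untyped copy.
fn demo_preserved() -> u32 {
    let value = 7u32;
    let original = Holder { target: &value, len: 1 };
    let copy = copy_untyped(&original);
    // Sound because `copy.target` still carries the provenance of `&value`.
    unsafe { *copy.target }
}
```

The point of the sketch is that no integer ever stands in for the pointer: the bytes are moved as opaque memory, so nothing has to be "forged" afterwards.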
Here's a (made-up) example. Imagine that I'm implementing a kernel, and I have a syscall with the following signature:
    #[repr(C)]
    struct Buffer {
        base: *const u8,
        len: usize,
        next: *const Buffer,
    }
    extern "C" fn do_thing_with_buffers(buf: Buffer);
There are two scenarios we can imagine:
The question is: In each of the two cases, how can we produce pointers with valid provenance that can be used as vanilla pointers in the kernel's address space? I intentionally chose |
The syscall interface is an asm-level FFI boundary. There's no provenance there. To access user memory, the kernel needs to either keep track of the provenance that is used for "all userland memory", or just make it exposed. In the former case, it can use For accessing kernel memory, it's the same story. Presumably here it will know the pointer points into some buffer, so it should probably do |
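The "just make it exposed" option can be sketched roughly as follows. This is an illustration under stated assumptions, not how any real kernel does it: `RawBuffer`, `UserRegion`, `do_thing_with_buffer`, and `demo_syscall` are hypothetical, a `Vec<u8>` stands in for user memory, and plain `as` casts are used because a pointer-to-integer cast exposes provenance and an integer-to-pointer cast may pick up previously exposed provenance.

```rust
/// Hypothetical view of the syscall argument as it crosses the asm-level
/// boundary: plain integers, no provenance attached.
#[derive(Clone, Copy)]
struct RawBuffer {
    base: usize,
    len: usize,
    next: usize, // linked-list traversal is omitted in this sketch
}

/// Hypothetical record of the memory region whose provenance was exposed.
struct UserRegion {
    start: usize,
    len: usize,
}

impl UserRegion {
    /// Registers a region. The `as usize` cast exposes its provenance.
    fn expose(mem: &[u8]) -> UserRegion {
        UserRegion { start: mem.as_ptr() as usize, len: mem.len() }
    }

    /// Bounds-checks `[addr, addr + len)` against the region, then rebuilds
    /// a pointer from the integer address.
    fn checked_ptr(&self, addr: usize, len: usize) -> Option<*const u8> {
        let end = addr.checked_add(len)?;
        let region_end = self.start.checked_add(self.len)?;
        if addr < self.start || end > region_end {
            return None;
        }
        // Integer-to-pointer cast: relies on the provenance exposed above.
        Some(addr as *const u8)
    }
}

/// Stand-in for the syscall handler: sums the bytes of one buffer.
fn do_thing_with_buffer(region: &UserRegion, buf: RawBuffer) -> Option<u32> {
    let base = region.checked_ptr(buf.base, buf.len)?;
    let bytes = unsafe { std::slice::from_raw_parts(base, buf.len) };
    Some(bytes.iter().map(|&b| u32::from(b)).sum())
}

fn demo_syscall() -> (Option<u32>, Option<u32>) {
    let user_memory = vec![1u8, 2, 3, 4, 5, 6, 7, 8];
    let region = UserRegion::expose(&user_memory);
    let addr = region.start + 2;
    let ok = do_thing_with_buffer(&region, RawBuffer { base: addr, len: 3, next: 0 });
    let too_long = do_thing_with_buffer(&region, RawBuffer { base: addr, len: 100, next: 0 });
    (ok, too_long)
}
```

The alternative mentioned above, keeping a pointer with the "all userland memory" provenance and deriving every access from it, avoids the integer-to-pointer cast entirely but needs strict provenance style APIs to change the address of that pointer.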
That leaves the question of what it means to "make [some memory] exposed" when you are a kernel and are directly manipulating the page tables to map data at address X. I think the correct answer is to just treat the data you mapped as already exposed, right? But is that documented somewhere? This is similar to other cases like
Exposing is a "ghost operation", so any inline asm block (without
No... we don't have enough established consensus about what provenance is and how it works to even really start officially documenting this. :/ |
One thing we'd like to do in zerocopy eventually is be able to make a complete guarantee (where "complete" means "no holes in our logic") that all code is sound based only on the language semantics. Historically, we've taken an approach of trying to get "lower bounds" documented in the Reference or stdlib docs (ie, "if your code does this, it's definitely sound; if it doesn't do this, it might still be sound, but we can't guarantee it"). It feels like that approach would be appropriate here: we could agree on a strictest-possible definition of strict provenance, and add it to the Reference and say, "as long as your program abides by these rules, it's definitely sound." That doesn't preclude the ability to relax the rules in the future - and we can disclaim as much in the text. It also doesn't require us to stabilize an API for provenance (ie, |
That's hard to do without some basic terminology, and the strict provenance APIs are not doing much more than establish that...
Maybe it's worth another attempt to stabilize strict provenance.
I don't remember off the top of my head what the blocker was last time we tried to stabilize it. I think we got fairly close? My own main concern is the name of "ptr::invalid". That's not very descriptive and the pointer is actually valid for some things.
I for one would be quite happy to see the strict provenance APIs stabilized, with that one omitted for now. It's not a big deal to just do |
@joshlf reading the discussion here again, it seems for your question we would have to stabilize the idea of exposed provenance. That's a bigger ask than the core of strict provenance. from_exposed_addr with its angelic choice is the most sketchy part of our entire op.sem... I don't see us guaranteeing much about int2ptr casts any time soon, unfortunately. That operation is deeply cursed and hideously hard to specify well.
@saethlin so far at least we have not defined that a null pointer that was offset to somewhere else is valid for zero-sized accesses (and one of the options in #472 makes that not valid).
For this specific issue, I agree. However, we're also generally interested in stabilizing strict provenance - even the subset that doesn't include the discussion in this issue. It'd be a huge step forward because it's the last significant part of Rust's memory model that zerocopy needs to rely on which is still unspecified. |
As part of google/zerocopy#170, zerocopy is trying to figure out how to support users who need to dereference pointers received over FFI (including in a kernel (or kernel emulator), where the other side is a user-space process providing pointers into memory which the kernel (emulator) has access to). These pointers are passed as untyped bytes, as C void pointers, etc. The user needs to perform some validation (bounds checking etc) and then dereference these pointers. As discussed in the linked issue, this presents a serious footgun since the pointers may not have valid provenance after being round-tripped through a byte representation or other untyped representation.
We're hoping to support this use case by providing an API which can convert &[u8] or some other untyped representation to &T where T contains raw pointers that can be soundly dereferenced. In order to do this, we need to ensure that the following holds: If a user has obtained an object via FFI or some other not-visible-to-Rust mechanism, if our mechanism converts that object to one or more raw pointers, and if certain facts hold about the pointers (they've been bounds checked, they point into "external" memory, etc), then dereferencing those pointers is sound.
My question is: Is it possible, inside of a function with the signature &T -> &U where T may be [u8] or some other "untyped" representation and U contains raw pointers, to ensure that such future operations will be sound? I assume that this question is equivalent to asking: Is it possible to forge provenance such that the compiler will understand future pointer operations to have valid provenance, and thus be sound. But maybe that's not the whole story? I assume this is at a minimum possible to do inside the compiler or inside the standard library, as these need to support extern "C" fn, syscalls, etc.
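The footgun described here, a pointer round-tripped through bytes, can be illustrated with a small sketch. It is not zerocopy's API: `ptr_from_bytes` and `demo_roundtrip` are hypothetical, and the sketch assumes the target memory had its provenance exposed beforehand (here via a pointer-to-integer cast), which is the kind of guarantee the question asks about.

```rust
use std::mem::size_of;

/// Parses a native-endian address from `bytes`, checks that it falls inside
/// the region `[region_start, region_start + region_len)`, and rebuilds a
/// pointer with an integer-to-pointer cast. Whether the result may be
/// dereferenced depends on that region's provenance having been exposed.
fn ptr_from_bytes(bytes: &[u8], region_start: usize, region_len: usize) -> Option<*const u8> {
    let mut raw = [0u8; size_of::<usize>()];
    raw.copy_from_slice(bytes.get(..size_of::<usize>())?);
    let addr = usize::from_ne_bytes(raw);
    let offset = addr.checked_sub(region_start)?;
    if offset >= region_len {
        return None;
    }
    Some(addr as *const u8)
}

fn demo_roundtrip() -> Option<u8> {
    let external = [10u8, 20, 30, 40];
    // The cast to `usize` exposes the provenance of `external`.
    let start = external.as_ptr() as usize;
    // The address travels as untyped bytes, as it would over FFI.
    let wire = (start + 2).to_ne_bytes();
    let p = ptr_from_bytes(&wire, start, external.len())?;
    Some(unsafe { *p })
}
```

Without the exposing cast in `demo_roundtrip`, the same bytes would yield a pointer with the same address but no provenance the program is allowed to rely on, which is the gap the question is about.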