Prototypes for hint IO (WIP) #631
base: master
Conversation
```rust
let mut rng = rand::thread_rng();
{
```
@naure See these two experiments about padding left and padding right.
ceno_host/src/lib.rs
```rust
pub fn finalise(&self) -> Result<AlignedVec, Error> {
    to_bytes::<Error>(&self.items)
```
Serialization happens just once at the end, which implies iterating over the data from beginning to end.
Could we do streaming serialization to be more CPU-cache friendly?
What goal are you trying to accomplish?
Serialisation happens in the host program running on native hardware. That's plenty fast enough, even if we slowed it down 1000x. (And the total amount of data we want to serialise is limited by the fact that we need to deserialise that data in the VM, if it is to be useful. So it will always be a relatively small amount of data.)
So worrying about something as low-level as CPU caches seems fairly misplaced here, when we don't really have to care about the performance of this part at all? Or what am I missing?
Btw, the 'inner serialisation' happens one by one in something like a streaming fashion (both in the original and in the new design). At the end we just collate all the individual items, which is mostly a bunch of mem-copies and a tiny bit of logic for the offsets.
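Roughly, that collation step amounts to something like this (simplified sketch, not the exact code in the PR; assume each hint has already been rkyv-serialised to bytes):

```rust
// Simplified sketch: collate per-item serialised buffers into one blob,
// padding each item to rkyv's 16-byte alignment and recording where it ends.
fn collate(items: &[Vec<u8>]) -> (Vec<u32>, Vec<u8>) {
    let mut offsets = Vec::with_capacity(items.len());
    let mut buf: Vec<u8> = Vec::new();
    for item in items {
        buf.extend_from_slice(item);                   // one mem-copy per item
        buf.resize(buf.len().next_multiple_of(16), 0); // pad to 16-byte alignment
        offsets.push(buf.len() as u32);                // end offset of this item
    }
    (offsets, buf)
}
```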
```rust
#[test]
fn test_prepare_and_consume_items() {
    let stdin = make_stdin();
```
IIUC lines 139 & 140 are the host/guest boundary.
In a real case, we would write stdin to a file and then load it into guest memory.
A further question: why does left/right padding matter? In the guest we load stdin into a memory region with a fixed start and a dynamic end (the end can differ per execution trace). Then in the guest we just need to get the start/end and restore the AlignedVec from that specific address range, mem[start:end]. In that case, isn't the padding transparent and agnostic to rkyv?
> IIUC lines 139 & 140 are the host/guest boundary.
Yes, exactly!
> In a real case, we would write stdin to a file and then load it into guest memory.
That's a possibility, or you can keep it in RAM as part of a data structure. No need to hit the file system. I think that's what our sproll already does. But it's a minor detail.
> A further question: why does left/right padding matter?
You described a whole dance you have to do in the guest, if you use right-padding.
With left-padding, this just goes away and all boundaries are static, and you don't have to worry about manually recovering a length:
You map stdin so that the end is at a fixed location, then in the guest you do something like this:
```rust
let bytes = slice::from_raw_parts(HINT_IO_REGION_START, HINT_IO_REGION_LENGTH);
rkyv::access::<T, Error>(&bytes)
```
Where HINT_IO_REGION_START and HINT_IO_REGION_LENGTH are compile-time constants and don't depend on the specific input or execution at all.
No need to get the dynamic start and end, no need to restore an AlignedVec.
Now technically that slice of bytes contains lots of stuff at the start that's not initialised and that you couldn't even access without your proof failing. But rkyv will never access that, so it's fine. I admit that rkyv is a bit weird in writing its entry point at the end of the buffer. It would be easier to explain to you folks if they had it at the start of the buffer.
Introducing left-padding is not free in terms of complexity for our current proving system, especially the verifier logic. It makes the verifier's memory-range overlap check harder and more error-prone. I would suggest we encapsulate this in the SDK instead, as it's only a minor effort, and I can already foresee its simplicity.
We can have the end address stored in the first word and retrieved as let end_hint_addr = mem[HINT_IO_REGION_START], then use mem[HINT_IO_REGION_START + 4 : end_hint_addr] to restore the AlignedVec. With that, there is almost no sacrifice in efficiency, but we keep the backend simple and consistent.
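Roughly, something like this on the guest side (just a sketch; the constant is a placeholder):

```rust
// Sketch only: the first word of the hints region stores the end address,
// and the serialised payload follows it.
const HINT_IO_REGION_START: usize = 0x4000_0000; // placeholder address

unsafe fn hint_bytes() -> &'static [u8] {
    // Read the end address from the first word of the region.
    let end_hint_addr = *(HINT_IO_REGION_START as *const u32) as usize;
    let start = HINT_IO_REGION_START + 4;
    core::slice::from_raw_parts(start as *const u8, end_hint_addr - start)
}
```

The guest can then hand that slice to rkyv (copying it into an AlignedVec, or padding the header so the payload is 16-byte aligned).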
Ok, we can code it up that way, if you want to.
Let me change the prototype tomorrow.
I have a prototype of the new version on GitHub now. It's a bit more of a mess, because instead of letting rkyv handle two layers of serialisation, we handle the outer layer ourselves, so that we can put its 'entry point' at the start of the hints memory region.
You can already have a look at the new version, if you are feeling brave, but at the moment it's held together with hope and duct-tape. Let me apply some polish to make it more palatable.
It's saner now, and you can have a look.
ceno_host/src/lib.rs
```rust
let (prefix, lengths, suffix) = unsafe { buf.align_to::<u32>() };
assert!(prefix.is_empty());
assert!(suffix.is_empty());
let mut iter = lengths.iter().map(|len| {
```
This would be part of our library, and not visible in normal guest code.
```rust
    Ok(())
}

pub fn finalise(&self) -> SerialisedCenoStdin {
```
Sorry, this is a bit of a mess. I think it might be comparatively easier to understand the memory format by looking at the readers first, and then you can try to decipher this function. (Or you can wait for the cleaned-up version. :)
The main complexity here comes from me having mixed all the padding logic throughout. What we are actually doing ain't very complicated conceptually.
```rust
type Item = &'a [u8];

fn next(&mut self) -> Option<Self::Item> {
    let len = u32::from_le_bytes(
        self.buf.0[self.next..][..size_of::<u32>()]
```
@kunxian-xia So we actually only store 'pointers' to the end of the slices at the start of our hints memory region.
That's because rkyv doesn't mind if we give it a slice that has extra garbage at the beginning, so without loss of generality, all the slices here start at the beginning of the hints memory region. They are overlapping.
We can get away with this because it's all read-only. If you want to go read-write, you'll want non-overlapping slices, which is also doable at only a small extra overhead.
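To make it concrete, the reader boils down to roughly this (simplified sketch, not the exact prototype code):

```rust
use std::mem::size_of;

// The hints region starts with a table of u32 end offsets. Every yielded slice
// starts at the very beginning of the region and ends at its recorded offset,
// so the slices overlap; rkyv ignores the extra prefix bytes.
struct HintIter<'a> {
    buf: &'a [u8], // the whole hints memory region
    next: usize,   // byte position of the next end offset to read
}

impl<'a> Iterator for HintIter<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<Self::Item> {
        let buf: &'a [u8] = self.buf;
        // Note: nothing delimits the end of the offsets table; the guest just
        // reads as many entries as it expects (see the next comment).
        let end = u32::from_le_bytes(
            buf[self.next..][..size_of::<u32>()].try_into().ok()?,
        ) as usize;
        self.next += size_of::<u32>();
        Some(&buf[..end])
    }
}
```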
Crucially, I have also not put in any information to delineate the end of the first section that stores the slice lengths / end points: a well-intentioned prover will provide all the entries that the guest program will want to read anyway, and a malicious prover could just give you misleading information.
If you accidentally read beyond the intended end, your deserialisation will (likely) fail.
However, I'm open to adding an explicit end, if you think that's useful for e.g. debugging, even if it makes no difference for security. Just let me know what you think, please.
```rust
pub fn read_slice<'a>(&'a mut self) -> &'b [u8] {
    self.next().unwrap()
```
We intentionally convert the error into a panic here, because guest programs shouldn't do any elaborate error recovery anyway: an honest prover will prepare all the data/witnesses so that everything goes off without a hitch, and a malicious prover only needs to be caught reliably and trigger a program abort.
Recovering from errors only makes the latter task harder.
Just a style suggestion: self.next().expect("reasoning here") would probably be more self-explanatory.
```rust
impl<'b> SerialisedCenoStdinIter<'b> {
    pub fn read<'a, T>(&'a mut self) -> &'b T
    where
        T: Portable + for<'c> CheckBytes<HighValidator<'c, Error>>,
```
In the guest we would change Error to Failure here. rancor::Error keeps track of error messages, which is useful for the prototype and in the host. rancor::Failure only keeps track of whether we have an error or not, so it's simpler and faster for the guest.
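Concretely, the guest-side variant would look roughly like this (same shape as the prototype's read, only with the error type swapped; exact import paths depend on the rkyv/rancor versions we end up on):

```rust
use rancor::Failure;

impl<'b> SerialisedCenoStdinIter<'b> {
    pub fn read<'a, T>(&'a mut self) -> &'b T
    where
        T: Portable + for<'c> CheckBytes<HighValidator<'c, Failure>>,
    {
        // Failure only records *that* validation failed, not why,
        // which is cheaper than Error and good enough inside the guest.
        rkyv::access::<T, Failure>(self.read_slice()).unwrap()
    }
}
```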
```rust
println!("buf.len() after offset padding: {}", buf.len());
for (offset, item) in izip!(offsets, &self.items) {
    buf.extend_from_slice(item);
    buf.extend_from_slice(&vec![0; buf.len().next_multiple_of(16) - buf.len()]);
```
I guess it would be neater to just extend buf once, directly to the next multiple of 16, and then use copy_from_slice to fill in the prefix data.
Thanks for the suggestion, let me look into that.
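If I read it right, the suggestion amounts to something like this inside the loop (sketch only):

```rust
// Sketch of the suggestion: grow the buffer once to the padded length,
// then copy the item into the newly reserved space.
let start = buf.len();
let padded_end = (start + item.len()).next_multiple_of(16);
buf.resize(padded_end, 0);                             // one resize, padding included
buf[start..start + item.len()].copy_from_slice(item);  // fill in the item bytes
```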
```rust
pub fn finalise(&self) -> SerialisedCenoStdin {
    // TODO: perhaps don't hardcode 16 here.
```
IIUC 16 means aligning to 16 bytes, which is 4 u32 words; I guess that's required by the rkyv format. But I'm wondering: does rkyv support a configurable alignment of 4 bytes, i.e. 1 word? In that case all memory accesses would still be u32-aligned, so we could use the LW instruction for everything, and the overall format would be more compact.
Rkyv specifically wants 16 bytes (unless you use the unaligned feature), which is 4 u32 words.
No worries, the alignment to 16 bytes is only a requirement for the backing storage. Internally rkyv only uses e.g. 4 bytes to store a u32 in a struct or array somewhere; it doesn't blow that up to 16 bytes for every single element.
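On the host, AlignedVec already takes care of that. In the guest, if the buffer ever needed to live in ordinary memory rather than a memory-mapped region, one way to guarantee the alignment would be something like this (purely illustrative, made-up size):

```rust
// Purely illustrative: force 16-byte alignment on a statically sized buffer.
const HINT_BUFFER_LEN: usize = 1 << 20; // made-up size

#[repr(C, align(16))]
struct HintBuffer([u8; HINT_BUFFER_LEN]);

static HINT_BUFFER: HintBuffer = HintBuffer([0; HINT_BUFFER_LEN]);
```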
```rust
where
    T: Portable + for<'c> CheckBytes<HighValidator<'c, Error>>,
{
    rkyv::access::<T, Error>(self.read_slice()).unwrap()
```
If I remember correctly, from your previous example we can throw the whole slice into rkyv and iterate through it via rkyv's own iterator implementation, which means we don't need to implement a wrapping iterator ourselves?
Yes, in principle. But that only works if we make sure that the end of the mapped data always shows up at a consistent, fixed memory address; rkyv finds its data from the end.
Since we decided that our VM can only put the start of our mapped data at a fixed, consistent address, we would need to e.g. store the length of the mapped data at the fixed start of the region.
But once we are mucking around with bytes ourselves, we might as well do the whole outer layer by ourselves, since it's very simple.
There's some additional complexity required for directly storing rkyv-serialised data inside of rkyv in such a way that you can zero-copy deserialise both layers. The extra complexity is mostly around the 16-byte alignment requirement. That's why my original prototype used the unaligned feature.
It's totally doable, but then probably not really simpler than coding up this outer layer ourselves.
However, I'm open to using rkyv for the outer layer as well, if you think that's cleaner. The code in here is just a prototype.
See #607