
Prototypes for hint IO (WIP) #631

Open · wants to merge 28 commits into base: master
Conversation

matthiasgoergens (Collaborator) commented Nov 26, 2024:

See #607

@matthiasgoergens matthiasgoergens changed the title Prototypes for hint IO Prototypes for hint IO (WIP) Nov 26, 2024

let mut rng = rand::thread_rng();

{
matthiasgoergens (Collaborator Author):

@naure See these two experiments about padding left and padding right.

}

pub fn finalise(&self) -> Result<AlignedVec, Error> {
to_bytes::<Error>(&self.items)
hero78119 (Collaborator) commented Nov 26, 2024:

Serialisation happens just once at the end, which implies iterating over the data from beginning to end.
Can we do streaming serialisation to be more CPU-cache friendly?

matthiasgoergens (Collaborator Author) commented Nov 26, 2024:

What goal are you trying to accomplish?

Serialisation happens in the host program running on native hardware. That's plenty fast, even if we slowed it down 1000x. (And the total amount of data we want to serialise is limited by the fact that we need to deserialise that data in the VM for it to be useful. So it will always be a relatively small amount of data.)

So worrying about something as low-level as CPU caches seems misplaced here, when we don't really have to care about the performance of this part at all. Or what am I missing?

matthiasgoergens (Collaborator Author):

Btw, the 'inner serialisation' happens one by one in something like a streaming fashion (both in the original and in the new design). At the end we just collate all the individual items, which is mostly a bunch of mem-copies and a tiny bit of logic for the offsets.
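The collation step described above could be sketched roughly like this (a hypothetical, self-contained sketch with illustrative names; the real code serialises each item with rkyv first):

```rust
// Hypothetical sketch of the final collation step: every item has already
// been serialised to its own byte buffer; "finalise" just concatenates the
// buffers (mem-copies) and records the end offset of each item.
fn collate(items: &[Vec<u8>]) -> (Vec<u32>, Vec<u8>) {
    let mut offsets = Vec::with_capacity(items.len());
    let mut buf = Vec::new();
    for item in items {
        buf.extend_from_slice(item); // mostly a mem-copy
        offsets.push(buf.len() as u32); // a tiny bit of logic for the offsets
    }
    (offsets, buf)
}

fn main() {
    let items = vec![vec![1u8, 2], vec![3u8, 4, 5]];
    let (offsets, buf) = collate(&items);
    assert_eq!(offsets, vec![2, 5]);
    assert_eq!(buf, vec![1, 2, 3, 4, 5]);
}
```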


#[test]
fn test_prepare_and_consume_items() {
let stdin = make_stdin();
Collaborator:

IIUC, lines 139 & 140 are the host/guest boundary.
In a real use case, we would write stdin to a file and then load it into guest memory.

A further question: why does left/right padding matter?
In the guest, we load stdin into a memory region with a fixed start and a dynamic end (the end can differ per execution trace). Then in the guest we just need to get the start and end, and restore the AlignedVec from that specific address range, mem[start:end]. In that case, isn't the padding transparent and agnostic to rkyv?

matthiasgoergens (Collaborator Author) commented Nov 26, 2024:

> IIUC, lines 139 & 140 are the host/guest boundary.

Yes, exactly!

> In a real use case, we would write stdin to a file and then load it into guest memory.

That's a possibility, or you can keep it in RAM as part of a data structure; no need to hit the file system. I think that's what our sproll already does. But it's a minor detail.

> A further question: why does left/right padding matter?

You described a whole dance you have to do in the guest if you use right-padding.

With left-padding, that dance just goes away: all boundaries are static, and you don't have to worry about manually recovering a length.

You map stdin so that its end is at a fixed location; then in the guest you do something like this:

let bytes = unsafe { slice::from_raw_parts(HINT_IO_REGION_START as *const u8, HINT_IO_REGION_LENGTH) };
rkyv::access::<T, Error>(bytes)

Where HINT_IO_REGION_START and HINT_IO_REGION_LENGTH are compile-time constants and don't depend on the specific input or execution at all.

No need to get the dynamic start and end, no need to restore AlignedVec.

Now, technically that slice of bytes contains lots of data at the start that is not initialised and that you couldn't even access without your proof failing. But rkyv will never access it, so it's fine.

I admit that rkyv is a bit weird in writing its entry point at the end of the buffer. It would be easier to explain if they had put it at the start of the buffer.
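The left-padding layout can be illustrated without rkyv at all. A hypothetical sketch, where HINT_IO_REGION_LENGTH is an illustrative compile-time constant and the payload is right-aligned so that its end always lands at the same fixed address:

```rust
// Hypothetical sketch: with left-padding, the payload is right-aligned in a
// fixed-size hint region, so its end address is a compile-time constant and
// the guest never needs to recover a dynamic length.
const HINT_IO_REGION_LENGTH: usize = 64; // illustrative size

fn place_left_padded(payload: &[u8]) -> [u8; HINT_IO_REGION_LENGTH] {
    let mut region = [0u8; HINT_IO_REGION_LENGTH];
    // Right-align: zero padding on the left, payload ends at the fixed end.
    region[HINT_IO_REGION_LENGTH - payload.len()..].copy_from_slice(payload);
    region
}

fn main() {
    let region = place_left_padded(b"hello");
    // The guest reads the whole region using only compile-time constants;
    // the payload always ends at the same fixed offset.
    assert_eq!(&region[HINT_IO_REGION_LENGTH - 5..], &b"hello"[..]);
}
```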

Collaborator:

Introducing left padding is not free in terms of complexity for our current proving system, especially the verifier logic. It makes the verifier's memory-range overlap check harder and more error-prone. I would suggest we encapsulate this in the SDK instead: it's only a minor effort, and I can already foresee its simplicity.

We can define the end address in the first word, retrieve it as let end_hint_addr = mem[HINT_IO_REGION_START], then use mem[HINT_IO_REGION_START + 4 : end_hint_addr] to restore the AlignedVec. With that, we sacrifice almost no efficiency, but we keep the backend simple and consistent.
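The proposed scheme could be sketched roughly as follows (a hypothetical, self-contained sketch with illustrative function names; the first u32 word stores the end address of the payload):

```rust
// Hypothetical sketch of the "end address in the first word" proposal:
// the host writes the payload's end offset into the first u32 word,
// the guest reads that word and slices mem[start + 4 .. end_hint_addr].
fn write_with_end_word(payload: &[u8]) -> Vec<u8> {
    let end = (4 + payload.len()) as u32; // payload ends here, after the length word
    let mut region = end.to_le_bytes().to_vec();
    region.extend_from_slice(payload);
    region
}

fn read_with_end_word(region: &[u8]) -> &[u8] {
    let end = u32::from_le_bytes(region[..4].try_into().unwrap()) as usize;
    &region[4..end] // i.e. mem[HINT_IO_REGION_START + 4 : end_hint_addr]
}

fn main() {
    let region = write_with_end_word(b"hint data");
    assert_eq!(read_with_end_word(&region), &b"hint data"[..]);
}
```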

matthiasgoergens (Collaborator Author) commented Nov 26, 2024:

Ok, we can code it up that way, if you want to.

Let me change the prototype tomorrow.

matthiasgoergens (Collaborator Author):

I have a prototype of the new version on GitHub now. It's a bit more of a mess, because instead of letting rkyv handle two layers of serialisation, we handle the outer layer ourselves, so that we can put its 'entry point' at the start of the hints memory region.

matthiasgoergens (Collaborator Author):

You can already have a look at the new version, if you are feeling brave, but at the moment it's held together with hope and duct-tape. Let me apply some polish to make it more palatable.

matthiasgoergens (Collaborator Author):

It's saner now, and you can have a look.

let (prefix, lengths, suffix) = unsafe { buf.align_to::<u32>() };
assert!(prefix.is_empty());
assert!(suffix.is_empty());
let mut iter = lengths.iter().map(|len| {
matthiasgoergens (Collaborator Author):

This would be part of our library, and not visible in normal guest code.

Ok(())
}

pub fn finalise(&self) -> SerialisedCenoStdin {
matthiasgoergens (Collaborator Author):

Sorry, this is a bit of a mess. It might be easier to understand the memory format by looking at the readers first, and then trying to decipher this function. (Or you can wait for the cleaned-up version. :)

The main complexity here comes from me having mixed the padding logic in throughout. What we are actually doing ain't very complicated conceptually.

type Item = &'a [u8];
fn next(&mut self) -> Option<Self::Item> {
let len = u32::from_le_bytes(
self.buf.0[self.next..][..size_of::<u32>()]
matthiasgoergens (Collaborator Author) commented Nov 27, 2024:

@kunxian-xia So we actually only store 'pointers' to the ends of the slices at the start of our hints memory region.

That's because rkyv doesn't mind if we give it a slice that has extra garbage at the beginning. So, without loss of generality, all the slices here start at the beginning of the hints memory region; they overlap.

We can get away with this because it's all read-only. If you want to go read-write, you'll want non-overlapping slices, which is also doable at only a small extra overhead.
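The overlapping-slices idea could be sketched like this (a hypothetical, self-contained sketch with illustrative names; in the real code the slices are then handed to rkyv, which ignores the leading garbage):

```rust
// Hypothetical sketch: only end offsets are stored. Every slice handed to
// the deserialiser starts at the beginning of the region, so the slices
// overlap. That's fine because the region is read-only.
fn slices_from_ends<'a>(region: &'a [u8], ends: &[u32]) -> Vec<&'a [u8]> {
    ends.iter()
        .map(|&end| &region[..end as usize]) // leading "garbage" is never read
        .collect()
}

fn main() {
    let region = [10u8, 20, 30, 40, 50];
    let slices = slices_from_ends(&region, &[2, 5]);
    assert_eq!(slices[0], &[10u8, 20][..]);
    assert_eq!(slices[1], &[10u8, 20, 30, 40, 50][..]); // overlaps the first slice
}
```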

matthiasgoergens (Collaborator Author):

Crucially, I have also not put in any information to delineate the end of the first section, which stores the slice lengths / end points: a well-intentioned prover will provide all the entries that the guest program wants to read anyway, and a malicious prover could just give you misleading information.

If you accidentally read beyond the intended end, your deserialisation will (most likely) fail.

However, I'm open to adding an explicit end marker if you think that's useful for e.g. debugging, even though it makes no difference for security. Please let me know what you think.

}

pub fn read_slice<'a>(&'a mut self) -> &'b [u8] {
self.next().unwrap()
matthiasgoergens (Collaborator Author):

We intentionally convert the error into a panic here, because guest programs shouldn't do any elaborate error recovery anyway: an honest prover will prepare all the data/witnesses so that everything goes off without a hitch, and a malicious prover only needs to be caught reliably and made to abort the program.

Recovering from errors only makes the latter task harder.

Collaborator:

Just a style suggestion: self.next().expect("reasoning here") would probably be more self-explanatory.

impl<'b> SerialisedCenoStdinIter<'b> {
pub fn read<'a, T>(&'a mut self) -> &'b T
where
T: Portable + for<'c> CheckBytes<HighValidator<'c, Error>>,
matthiasgoergens (Collaborator Author):

In the guest we would change Error to Failure here. rancor::Error keeps track of error messages, which is useful for the prototype and on the host. rancor::Failure only keeps track of whether an error occurred at all, so it's simpler and faster for the guest.

println!("buf.len() after offset padding: {}", buf.len());
for (offset, item) in izip!(offsets, &self.items) {
buf.extend_from_slice(item);
buf.extend_from_slice(&vec![0; buf.len().next_multiple_of(16) - buf.len()]);
Collaborator:

I guess it would be neater to extend buf just once, directly to buf.len().next_multiple_of(16), and then use copy_from_slice to fill in the prefix data.

matthiasgoergens (Collaborator Author):

Thanks for the suggestion, let me look into that.
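For comparison, one way to do the per-item padding in a single call, as a hypothetical sketch (using Vec::resize instead of building a temporary zero vector, which avoids the intermediate allocation the snippet above pays for):

```rust
// Hypothetical sketch: pad the buffer to the next 16-byte boundary in one
// call, instead of extend_from_slice(&vec![0; ...]) which allocates a
// temporary zero vector first.
fn pad_to_16(buf: &mut Vec<u8>) {
    buf.resize(buf.len().next_multiple_of(16), 0);
}

fn main() {
    let mut buf = vec![1u8; 5];
    pad_to_16(&mut buf);
    assert_eq!(buf.len(), 16);
    assert_eq!(&buf[5..], &[0u8; 11][..]); // padding bytes are zero
}
```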

}

pub fn finalise(&self) -> SerialisedCenoStdin {
// TODO: perhaps don't hardcode 16 here.
hero78119 (Collaborator) commented Nov 28, 2024:

IIUC, 16 means aligned to 16 bytes, i.e. 4 u32 words; I guess that's for the rkyv format. But I am wondering: does rkyv support a configurable alignment of 4 bytes, i.e. 1 word? In that case all memory accesses would still be u32-aligned, so we could use the "LW" instruction for everything, and the overall format would still be more compact.

matthiasgoergens (Collaborator Author):

Rkyv specifically wants 16 bytes (unless you use the unaligned feature), which is 4 words of u32.

No worries, though: the 16-byte alignment is only a requirement for the backing storage. Internally rkyv uses only e.g. 4 bytes to store a u32 in a struct or array; it doesn't blow every single element up to 16 bytes.


where
T: Portable + for<'c> CheckBytes<HighValidator<'c, Error>>,
{
rkyv::access::<T, Error>(self.read_slice()).unwrap()
hero78119 (Collaborator) commented Nov 28, 2024:

If I remember correctly from your previous example, we can throw the whole slice into rkyv and iterate through it using rkyv's own iterator implementation, which would mean we don't need to implement a wrapping iterator ourselves?

matthiasgoergens (Collaborator Author):

Yes, in principle. But that only works if we make sure that the end of the mapped data always shows up at a consistent, fixed memory address, because rkyv finds its data from the end.

Since we decided that our VM can only put the start of the mapped data at a fixed, consistent address, we would need to e.g. store the length of the mapped data at the fixed start of the region.

But once we are mucking around with bytes ourselves, we might as well do the whole outer layer ourselves, since it's very simple.

There's some additional complexity required for directly storing rkyv-serialised data inside of rkyv in such a way that you can zero-copy deserialise both layers. The extra complexity is mostly around the 16-byte alignment requirement. That's why my original prototype used the unaligned feature.

It's totally doable, but then probably not really simpler than coding up this outer layer ourselves.

However, I'm open to using rkyv for the outer layer as well, if you think that's cleaner. The code here is just a prototype.
