Prototypes for hint IO (WIP) #631
base: master
Conversation
```rust
let mut rng = rand::thread_rng();
{
```
@naure See these two experiments about padding left and padding right.
ceno_host/src/lib.rs
```rust
pub fn finalise(&self) -> Result<AlignedVec, Error> {
    to_bytes::<Error>(&self.items)
```
Serialization happens just once at the end, which implies iterating over the data from beginning to end.
Could we do streaming serialization to be more CPU-cache friendly?
What goal are you trying to accomplish?
Serialisation happens in the host program running on native hardware. That's plenty fast enough, even if we slowed it down 1000x. (And the total amount of data we want to serialise is limited by the fact that we need to deserialise that data in the VM, if it is to be useful. So it will always be a relatively small amount of data.)
So worrying about something as low-level as CPU caches seems fairly misplaced here, when we don't really have to care about the performance of this part at all? Or what am I missing?
Btw, the 'inner serialisation' happens one by one in something like a streaming fashion (both in the original and in the new design). At the end we just collate all the individual items, which is mostly a bunch of mem-copies and a tiny bit of logic for the offsets.
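Roughly, that collation step amounts to something like this (simplified sketch, not the exact code in the PR; assume each hint has already been rkyv-serialised to bytes):

```rust
// Simplified sketch: collate per-item serialised buffers into one blob,
// padding each item to rkyv's 16-byte alignment and recording where it ends.
fn collate(items: &[Vec<u8>]) -> (Vec<u32>, Vec<u8>) {
    let mut offsets = Vec::with_capacity(items.len());
    let mut buf: Vec<u8> = Vec::new();
    for item in items {
        buf.extend_from_slice(item);                   // one mem-copy per item
        buf.resize(buf.len().next_multiple_of(16), 0); // pad to 16-byte alignment
        offsets.push(buf.len() as u32);                // end offset of this item
    }
    (offsets, buf)
}
```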
```rust
#[test]
fn test_prepare_and_consume_items() {
    let stdin = make_stdin();
```
IIUC lines 139 & 140 are the host/guest boundary.
In a real case, we would write stdin to a file and then load it into guest memory.
A further question: why does left/right padding matter? In the guest we load stdin into a memory region with a fixed start and a dynamic end (the end can differ per execution trace). Then in the guest we just need to get the start/end and restore the AlignedVec from that specific address range, mem[start:end]. In that case, isn't the padding transparent and agnostic to rkyv?
> IIUC lines 139 & 140 are the host/guest boundary.
Yes, exactly!
> In a real case, we would write stdin to a file and then load it into guest memory.
That's a possibility, or you can keep it in RAM as part of a data structure. No need to hit the file system. I think that's what our sproll already does. But it's a minor detail.
> A further question: why does left/right padding matter?
You described a whole dance you have to do in the guest, if you use right-padding.
With left-padding, this just goes away and all boundaries are static, and you don't have to worry about manually recovering a length:
You map stdin so that the end is at a fixed location, then in the guest you do something like this:
```rust
let bytes = slice::from_raw_parts(HINT_IO_REGION_START, HINT_IO_REGION_LENGTH);
rkyv::access::<T, Error>(&bytes)
```
Where HINT_IO_REGION_START and HINT_IO_REGION_LENGTH are compile-time constants and don't depend on the specific input or execution at all.
No need to get the dynamic start and end, no need to restore an AlignedVec.
Now technically that slice of bytes contains lots of stuff at the start that's not initialised and that you couldn't even access without your proof failing. But rkyv will never access that, so it's fine. I admit that rkyv is a bit weird in writing its entry point at the end of the buffer. It would be easier to explain to you folks if they had it at the start of the buffer.
Introducing left-padding is not free in terms of complexity for our current proving system, especially the verifier logic. It makes the verifier's memory-range overlap check harder and more error-prone. I would suggest we encapsulate this in the SDK instead, as it's only a minor effort, and I can already foresee its simplicity.
We can have the end address stored in the first word and retrieved as let end_hint_addr = mem[HINT_IO_REGION_START], then use mem[HINT_IO_REGION_START + 4 : end_hint_addr] to restore the AlignedVec. With that, there is almost no sacrifice in efficiency, but we keep the backend simple and consistent.
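Roughly, something like this on the guest side (just a sketch; the constant is a placeholder):

```rust
// Sketch only: the first word of the hints region stores the end address,
// and the serialised payload follows it.
const HINT_IO_REGION_START: usize = 0x4000_0000; // placeholder address

unsafe fn hint_bytes() -> &'static [u8] {
    // Read the end address from the first word of the region.
    let end_hint_addr = *(HINT_IO_REGION_START as *const u32) as usize;
    let start = HINT_IO_REGION_START + 4;
    core::slice::from_raw_parts(start as *const u8, end_hint_addr - start)
}
```

The guest can then hand that slice to rkyv (copying it into an AlignedVec, or padding the header so the payload is 16-byte aligned).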
Ok, we can code it up that way, if you want to.
Let me change the prototype tomorrow.
I have a prototype of the new version on GitHub now. It's a bit more of a mess, because instead of letting rkyv handle two layers of serialisation, we handle the outer layer ourselves, so that we can put its 'entry point' at the start of the hints memory region.
You can already have a look at the new version, if you are feeling brave, but at the moment it's held together with hope and duct-tape. Let me apply some polish to make it more palatable.
It's saner now, and you can have a look.
ceno_host/src/lib.rs
```rust
let (prefix, lengths, suffix) = unsafe { buf.align_to::<u32>() };
assert!(prefix.is_empty());
assert!(suffix.is_empty());
let mut iter = lengths.iter().map(|len| {
```
This would be part of our library, and not visible in normal guest code.
```rust
    Ok(())
}

pub fn finalise(&self) -> SerialisedCenoStdin {
```
Sorry, this is a bit of a mess. I think it might be comparatively easier to understand the memory format by looking at the readers first, and then you can try to decipher this function. (Or you can wait for the cleaned-up version. :)
The main complexity here comes from me having mixed all the padding logic throughout. What we are actually doing ain't very complicated conceptually.
```rust
type Item = &'a [u8];

fn next(&mut self) -> Option<Self::Item> {
    let len = u32::from_le_bytes(
        self.buf.0[self.next..][..size_of::<u32>()]
```
@kunxian-xia So we actually only store 'pointers' to the end of the slices at the start of our hints memory region.
That's because rkyv doesn't mind if we give it a slice that has extra garbage at the beginning, so without loss of generality, all the slices here start at the beginning of the hints memory region. They are overlapping.
We can get away with this because it's all read-only. If you want to go read-write, you'll want non-overlapping slices, which is also doable at only a small extra overhead.
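To make it concrete, the reader boils down to roughly this (simplified sketch, not the exact prototype code):

```rust
use std::mem::size_of;

// The hints region starts with a table of u32 end offsets. Every yielded slice
// starts at the very beginning of the region and ends at its recorded offset,
// so the slices overlap; rkyv ignores the extra prefix bytes.
struct HintIter<'a> {
    buf: &'a [u8], // the whole hints memory region
    next: usize,   // byte position of the next end offset to read
}

impl<'a> Iterator for HintIter<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<Self::Item> {
        let buf: &'a [u8] = self.buf;
        // Note: nothing delimits the end of the offsets table; the guest just
        // reads as many entries as it expects (see the next comment).
        let end = u32::from_le_bytes(
            buf[self.next..][..size_of::<u32>()].try_into().ok()?,
        ) as usize;
        self.next += size_of::<u32>();
        Some(&buf[..end])
    }
}
```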
Crucially, I have also not put in any information to delineate the end of the first section that stores the slice lengths / end points: a well-intentioned prover will provide all the entries that the guest program will want to read anyway, and a malicious prover could just give you misleading information.
If you accidentally read beyond the intended end, your deserialisation will (likely) fail.
However, I'm open to adding an explicit end, if you think that's useful for e.g. debugging, even if it makes no difference for security. Just let me know what you think, please.
```rust
pub fn read_slice<'a>(&'a mut self) -> &'b [u8] {
    self.next().unwrap()
```
We intentionally convert the error into a panic here, because guest programs shouldn't do any elaborate error recovery anyway: an honest prover will prepare all the data/witnesses so that everything goes off without a hitch, and a malicious prover only needs to be caught reliably and trigger a program abort.
Recovering from errors only makes the latter task harder.
Just a style suggestion: self.next().expect("reasoning here") would probably be more self-explanatory.
```rust
impl<'b> SerialisedCenoStdinIter<'b> {
    pub fn read<'a, T>(&'a mut self) -> &'b T
    where
        T: Portable + for<'c> CheckBytes<HighValidator<'c, Error>>,
```
In the guest we would change Error to Failure here. rancor::Error keeps track of error messages, which is useful for the prototype and in the host. rancor::Failure only keeps track of whether we have an error or not, so it's simpler and faster for the guest.
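Concretely, the guest-side variant would look roughly like this (same shape as the prototype's read, only with the error type swapped; exact import paths depend on the rkyv/rancor versions we end up on):

```rust
use rancor::Failure;

impl<'b> SerialisedCenoStdinIter<'b> {
    pub fn read<'a, T>(&'a mut self) -> &'b T
    where
        T: Portable + for<'c> CheckBytes<HighValidator<'c, Failure>>,
    {
        // Failure only records *that* validation failed, not why,
        // which is cheaper than Error and good enough inside the guest.
        rkyv::access::<T, Failure>(self.read_slice()).unwrap()
    }
}
```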
```rust
println!("buf.len() after offset padding: {}", buf.len());
for (offset, item) in izip!(offsets, &self.items) {
    buf.extend_from_slice(item);
    buf.extend_from_slice(&vec![0; buf.len().next_multiple_of(16) - buf.len()]);
```
I guess it would be neater to just extend buf once, directly to the next multiple of 16, and then use copy_from_slice to fill in the prefix data.
Thanks for the suggestion, let me look into that.
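If I read it right, the suggestion amounts to something like this inside the loop (sketch only):

```rust
// Sketch of the suggestion: grow the buffer once to the padded length,
// then copy the item into the newly reserved space.
let start = buf.len();
let padded_end = (start + item.len()).next_multiple_of(16);
buf.resize(padded_end, 0);                             // one resize, padding included
buf[start..start + item.len()].copy_from_slice(item);  // fill in the item bytes
```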
```rust
pub fn finalise(&self) -> SerialisedCenoStdin {
    // TODO: perhaps don't hardcode 16 here.
```
IIUC 16 means aligning to 16 bytes, which is 4 u32 words; I guess that's required by the rkyv format. But I'm wondering: does rkyv support a configurable alignment of 4 bytes, i.e. 1 word? In that case all memory accesses would still be u32-aligned, so we could use the LW instruction for everything, and the overall format would be more compact.
Rkyv specifically wants 16 bytes (unless you use the unaligned feature), which is 4 u32 words.
No worries, the alignment to 16 bytes is only a requirement for the backing storage. Internally rkyv only uses e.g. 4 bytes to store a u32 in a struct or array somewhere; it doesn't blow that up to 16 bytes for every single element.
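On the host, AlignedVec already takes care of that. In the guest, if the buffer ever needed to live in ordinary memory rather than a memory-mapped region, one way to guarantee the alignment would be something like this (purely illustrative, made-up size):

```rust
// Purely illustrative: force 16-byte alignment on a statically sized buffer.
const HINT_BUFFER_LEN: usize = 1 << 20; // made-up size

#[repr(C, align(16))]
struct HintBuffer([u8; HINT_BUFFER_LEN]);

static HINT_BUFFER: HintBuffer = HintBuffer([0; HINT_BUFFER_LEN]);
```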
```rust
where
    T: Portable + for<'c> CheckBytes<HighValidator<'c, Error>>,
{
    rkyv::access::<T, Error>(self.read_slice()).unwrap()
```
If I remember correctly, from your previous example we can throw the whole slice into rkyv and iterate through it via rkyv's own iterator implementation, which means we don't need to implement a wrapping iterator ourselves?
Yes, in principle. But that only works if we make sure that the end of the mapped data always shows up at a consistent, fixed memory address; rkyv finds its data from the end.
Since we decided that our VM can only put the start of our mapped data at a fixed, consistent address, we would need to e.g. store the length of the mapped data at the fixed start of the region.
But once we are mucking around with bytes ourselves, we might as well do the whole outer layer by ourselves, since it's very simple.
There's some additional complexity required for directly storing rkyv-serialised data inside of rkyv in such a way that you can zero-copy deserialise both layers. The extra complexity is mostly around the 16-byte alignment requirement. That's why my original prototype used the unaligned feature.
It's totally doable, but then probably not really simpler than coding up this outer layer ourselves.
However, I'm open to using rkyv for the outer layer as well, if you think that's cleaner. The code in here is just a prototype.
See #607