Merge pull request #305 from a16z/book/moodlezoup-core-components

Flesh out wiki

moodlezoup authored Apr 17, 2024
2 parents 6f857f5 + f205ccb commit c859eb0
Showing 23 changed files with 244 additions and 131 deletions.
9 changes: 6 additions & 3 deletions book/src/SUMMARY.md
@@ -11,15 +11,18 @@
- [Troubleshooting](./usage/troubleshooting.md)
- [Contributors](./contributors.md)
- [How it works](./how_it_works.md)
- [Lookups](./how/lookups.md)
- [Memory Checking](./how/memory-checking.md)
- [GKR](./how/gkr.md)
- [Jolt](./how/jolt.md)
- [Instruction lookups](./how/instruction_lookups.md)
- [Read-write memory](./how/read_write_memory.md)
- [Bytecode](./how/bytecode.md)
- [R1CS constraints](./how/r1cs_constraints.md)
- [Background](./background.md)
- [Sumcheck](./background/sumcheck.md)
- [Multilinear Extensions](./background/multilinear-extensions.md)
- [Eq Extension](./background/eq-polynomial.md)
- [Batched Polynomial Openings](./background/batched-openings.md)
- [Memory Checking](./background/memory-checking.md)
- [GKR](./background/gkr.md)
- [Dev](./dev/README.md)
- [Install](./dev/install.md)
- [Tools](./dev/tools.md)
2 changes: 2 additions & 0 deletions book/src/background.md
@@ -1,5 +1,7 @@
# Background
If you're interested in the nitty-gritty details of SNARKs, including questions like "why does this work?" and "is it sound?", you should read [Proofs and Args ZK](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf) by Thaler. If you're simply interested in "how", this section is for you.
- [Sumcheck](./background/sumcheck.md)
- [GKR](./background/gkr.md)
- [Multilinear Extensions](./background/multilinear-extensions.md)
- [EQ Polynomial](./background/eq-polynomial.md)
- [Memory Checking](./background/memory-checking.md)
6 changes: 3 additions & 3 deletions book/src/background/eq-polynomial.md
@@ -1,13 +1,13 @@
# Eq Extension
The $\widetilde{eq}$ MLE is useful throughout the protocol
The $\widetilde{\text{eq}}$ MLE is useful throughout the protocol
$$
\widetilde{eq}(r,x) = \begin{cases}
\widetilde{\text{eq}}(r,x) = \begin{cases}
1 & \text{if } r = x \\
\mathbb{F} & \text{otherwise}
\end{cases}
$$


$$
\widetilde{eq}(r,x) = \prod_{i=1}^{\log(m)} (r_i \cdot x_i + (1 - r_i) \cdot (1 - x_i)) \quad \text{where } r, x \in \{0,1\}^{\log(m)}
\widetilde{\text{eq}}(r,x) = \prod_{i=1}^{\log(m)} (r_i \cdot x_i + (1 - r_i) \cdot (1 - x_i)) \quad \text{where } r, x \in \{0,1\}^{\log(m)}
$$
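As a concrete toy sketch of the product formula above, the following evaluates $\widetilde{\text{eq}}$ directly over a small prime field. The modulus and helper functions are illustrative stand-ins, not Jolt's actual field arithmetic:

```rust
// Toy prime field: 2^61 - 1, a Mersenne prime (illustrative, not Jolt's field).
const P: u128 = 2305843009213693951;

fn mul(a: u128, b: u128) -> u128 { (a * b) % P }
fn add(a: u128, b: u128) -> u128 { (a + b) % P }
fn sub(a: u128, b: u128) -> u128 { (a + P - b) % P }

/// eq~(r, x) = prod_i (r_i * x_i + (1 - r_i) * (1 - x_i)),
/// with all inputs assumed already reduced mod P.
fn eq_eval(r: &[u128], x: &[u128]) -> u128 {
    assert_eq!(r.len(), x.len());
    r.iter().zip(x).fold(1, |acc, (&ri, &xi)| {
        let term = add(mul(ri, xi), mul(sub(1, ri), sub(1, xi)));
        mul(acc, term)
    })
}
```

On the Boolean hypercube this reduces to the 0/1 indicator of $r = x$; at a random $r \in \mathbb{F}^{\log(m)}$ it yields the field element used to weight sumcheck terms.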
35 changes: 35 additions & 0 deletions book/src/background/gkr.md
@@ -0,0 +1,35 @@
# GKR
GKR is a SNARK protocol for binary trees of multiplication / addition gates. The standard form supports layers that mix both gate types, using the layer-value MLE $\widetilde{V}_i$ together with two wiring-predicate MLEs $\widetilde{\text{add}}_i$ and $\widetilde{\text{mult}}_i$.

$\widetilde{V}_i(j)$ evaluates to the value of the $j$-th gate in the $i$-th layer of the circuit. For example, $\widetilde{V}_1(0)$ corresponds to the output gate.

$\widetilde{\text{add}}_i(j)$ evaluates to 1 if the $j$-th gate of the $i$-th layer is an addition gate.

$\widetilde{\text{mult}}_i(j)$ evaluates to 1 if the $j$-th gate of the $i$-th layer is a multiplication gate.

The sumcheck protocol is applied to the following:
$$
\widetilde{V}_i(z) = \sum_{(p,\omega_1,\omega_2) \in \{0,1\}^{s_i+2s_{i+1}}} f_{i,z}(p,\omega_1,\omega_2),
$$

where

$$
f_{i,z}(p, \omega_1, \omega_2) = \beta_{s_i}(z, p) \cdot \left[ \widetilde{\text{add}}_i(p, \omega_1, \omega_2) \left( \widetilde{V}_{i+1}(\omega_1) + \widetilde{V}_{i+1}(\omega_2) \right) + \widetilde{\text{mult}}_i(p, \omega_1, \omega_2) \cdot \widetilde{V}_{i+1}(\omega_1) \cdot \widetilde{V}_{i+1}(\omega_2) \right]
$$
$$
\beta_{s_i}(z, p) = \prod_{j=1}^{s_i} ((1-z_j)(1-p_j) + z_j p_j).
$$


Lasso and Jolt implement the [Thaler13](https://eprint.iacr.org/2013/351.pdf) version of GKR which is optimized for the far simpler case of a binary tree of multiplication gates. This simplifies each sumcheck to:
$$
\widetilde{V}_i(z) = \sum_{p \in \{0,1\}^{s_i}} g^{(i)}_z(p),
$$

where

$$
g^{(i)}_z(p) = \beta_{s_i}(z, p) \cdot \widetilde{V}_{i+1}(p,0) \cdot \widetilde{V}_{i+1}(p,1)
$$
GKR is utilized in [memory-checking](./memory-checking.html) for the multi-set permutation check.
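To make the "binary tree of multiplication gates" concrete, the following hedged sketch (not Jolt's code) evaluates such a circuit layer by layer; gate $p$ in layer $i$ multiplies its two children, $\widetilde{V}_{i+1}(p,0) \cdot \widetilde{V}_{i+1}(p,1)$:

```rust
// Evaluate a binary multiplication tree bottom-up, returning every layer.
// In the real protocol these values are field elements; plain u64s keep
// this toy example simple.
fn mult_tree_layers(leaves: Vec<u64>) -> Vec<Vec<u64>> {
    assert!(leaves.len().is_power_of_two());
    let mut layers = vec![leaves];
    while layers.last().unwrap().len() > 1 {
        let next: Vec<u64> = layers
            .last()
            .unwrap()
            .chunks(2)
            .map(|c| c[0] * c[1]) // gate p computes V_{i+1}(p,0) * V_{i+1}(p,1)
            .collect();
        layers.push(next);
    }
    layers.reverse(); // index 0 = output layer, matching the top-down indexing above
    layers
}
```

The prover runs one sumcheck per layer, starting from the output layer and working toward the leaves.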
File renamed without changes.
2 changes: 1 addition & 1 deletion book/src/background/multilinear-extensions.md
@@ -29,4 +29,4 @@ for i in 0..half {
```

### Multi Variable Binding
Another common algorithm is to take the MLE $\tilde{f}(x_1, ... x_v)$ and compute its evaluation at a single $v$-variate point outside the boolean hypercube $x \in \mathbb{F}^v$. This algorithm can be performed in $O(n log(n))$ time by preforming the single variable binding algorithm $log(n)$ times.
Another common algorithm is to take the MLE $\tilde{f}(x_1, \ldots, x_v)$ and compute its evaluation at a single $v$-variate point outside the boolean hypercube, $x \in \mathbb{F}^v$. This algorithm can be performed in $O(n \log(n))$ time by performing the single variable binding algorithm $\log(n)$ times.
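A minimal sketch of this multi-variable binding (toy 61-bit modulus, not Jolt's field): each round binds one variable via the single-variable rule, halving the evaluation table, until one value remains:

```rust
const P: u128 = 2305843009213693951; // toy modulus: 2^61 - 1

/// Bind the most significant variable of the MLE to r: entry i pairs with
/// entry i + half, and f(r, rest) = f(0, rest) + r * (f(1, rest) - f(0, rest)).
fn bind_variable(evals: &mut Vec<u128>, r: u128) {
    let half = evals.len() / 2;
    for i in 0..half {
        let (lo, hi) = (evals[i], evals[i + half]);
        evals[i] = (lo + r * ((hi + P - lo) % P)) % P;
    }
    evals.truncate(half);
}

/// Evaluate the MLE of `evals` (values on {0,1}^v, most significant
/// variable first) at an arbitrary point in F^v.
fn evaluate_mle(mut evals: Vec<u128>, point: &[u128]) -> u128 {
    assert_eq!(evals.len(), 1 << point.len());
    for &r in point {
        bind_variable(&mut evals, r);
    }
    evals[0]
}
```

At Boolean points this recovers the original table entries, consistent with the MLE agreeing with $f$ on the hypercube.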
2 changes: 1 addition & 1 deletion book/src/background/sumcheck.md
@@ -6,7 +6,7 @@ Suppose we are given a $v$-variate polynomial $g$ defined over a finite field $\mathbb{F}$
$$ H := \sum_{b_1 \in \{0,1\}} \sum_{b_2 \in \{0,1\}} \cdots \sum_{b_v \in \{0,1\}} g(b_1, \ldots, b_v). $$

In order to execute the protocol, the verifier needs to be able to evaluate $g(r_1, \ldots, r_v)$ for a randomly chosen vector $(r_1, \ldots, r_v) \in \mathbb{F}^v$. Hence, from the verifier's perspective, the sum-check protocol _reduces_ the task
of summing $g$'s evaluations over $2^v$ inputs (namely, all inputs in $\{0, 1}^{v}$) to the task of evaluating $g$
of summing $g$'s evaluations over $2^v$ inputs (namely, all inputs in $\{0, 1\}^{v}$) to the task of evaluating $g$
at a _single_ input in $(r_1, \ldots, r_v) \in \mathbb{F}^v$.
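This reduction can be exercised end to end in a toy honest-prover sumcheck for a *multilinear* $g$, given by its evaluations on $\{0,1\}^v$. The modulus is an illustrative stand-in and the verifier's random challenges are passed in explicitly:

```rust
const P: u128 = 2305843009213693951; // toy modulus: 2^61 - 1

/// Bind the most significant variable of the evaluation table to r.
fn bind(evals: &mut Vec<u128>, r: u128) {
    let half = evals.len() / 2;
    for i in 0..half {
        let (lo, hi) = (evals[i], evals[i + half]);
        evals[i] = (lo + r * ((hi + P - lo) % P)) % P;
    }
    evals.truncate(half);
}

/// Run all v rounds with an honest prover; returns Some(H) iff every round
/// check and the final evaluation check pass.
fn sumcheck(mut g: Vec<u128>, challenges: &[u128]) -> Option<u128> {
    assert_eq!(g.len(), 1 << challenges.len());
    let h = g.iter().fold(0, |a, &b| (a + b) % P);
    let mut claim = h;
    for &r in challenges {
        let half = g.len() / 2;
        // Prover's round message: g_j(0) and g_j(1). Since g is multilinear,
        // g_j has degree 1 and these two values determine it.
        let g0 = g[..half].iter().fold(0, |a, &b| (a + b) % P);
        let g1 = g[half..].iter().fold(0, |a, &b| (a + b) % P);
        // Verifier's round check: g_j(0) + g_j(1) must equal the running claim.
        if (g0 + g1) % P != claim {
            return None;
        }
        // New claim: g_j(r) by linear interpolation.
        claim = (g0 + r * ((g1 + P - g0) % P)) % P;
        bind(&mut g, r);
    }
    // Final check: a single evaluation of g at (r_1, ..., r_v). Here the
    // fully bound table stands in for the verifier's oracle access to g.
    if g[0] == claim { Some(h) } else { None }
}
```
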

The protocol proceeds in $v$ rounds as follows. In the first round, the prover sends a polynomial $g_1(X_1)$, and claims that
45 changes: 45 additions & 0 deletions book/src/how/bytecode.md
@@ -0,0 +1,45 @@
# Bytecode

Jolt proves the validity of the bytecode trace using offline memory checking.

## Decoding an ELF file

The tracer iterates over the `.text` sections of the program ELF file and decodes the RISC-V instructions. Each instruction gets mapped to a `BytecodeRow` struct:

```rust
pub struct BytecodeRow {
/// Memory address as read from the ELF.
address: usize,
/// Packed instruction/circuit flags, used for r1cs
bitflags: u64,
/// Index of the destination register for this instruction (0 if register is unused).
rd: u64,
/// Index of the first source register for this instruction (0 if register is unused).
rs1: u64,
/// Index of the second source register for this instruction (0 if register is unused).
rs2: u64,
/// "Immediate" value for this instruction (0 if unused).
imm: u64,
}
```

The registers (`rd`, `rs1`, `rs2`) and `imm` are described in the [RISC-V spec](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf).
The `bitflags` field is described in greater detail [below](#bitflags).

The preprocessed bytecode serves as the (read-only) "memory" over which we perform offline memory checking.

![bytecode_trace](../imgs/bytecode_trace.png)

We currently assume the program bytecode is sufficiently short and known by the verifier, so the prover doesn't need to provide a commitment to it: the verifier can compute the MLE of the bytecode on its own.
This guarantees that the bytecode trace is consistent with the agreed-upon program.

Instead of $(a, v, t)$ tuples, each step in the bytecode trace is mapped to a tuple $(a, v_\texttt{bitflags}, v_\texttt{rd}, v_\texttt{rs1}, v_\texttt{rs2}, v_\texttt{imm}, t)$.
Otherwise, the offline memory checking proceeds as described in [Offline Memory Checking](../background/memory-checking.md).


## Bitflags

The `bitflags` of a given instruction is the concatenation of its [circuit flags](./r1cs_constraints.md#circuit-and-instruction-flags) and [instruction flags](./instruction_lookups.md).
This concatenation is enforced by R1CS constraints.

![bitflags](../imgs/bitflags.png)
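A hypothetical packing sketch of that concatenation; the flag counts and bit ordering here are assumptions for illustration, not Jolt's actual layout:

```rust
/// Concatenate circuit flags and instruction flags into a single u64,
/// circuit flags in the higher-order positions, one bit per flag.
fn pack_bitflags(circuit_flags: &[bool], instruction_flags: &[bool]) -> u64 {
    assert!(circuit_flags.len() + instruction_flags.len() <= 64);
    circuit_flags
        .iter()
        .chain(instruction_flags)
        .fold(0u64, |acc, &flag| (acc << 1) | flag as u64)
}
```
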
35 changes: 0 additions & 35 deletions book/src/how/gkr.md

This file was deleted.

53 changes: 53 additions & 0 deletions book/src/how/instruction_lookups.md
@@ -0,0 +1,53 @@
# Instruction lookups

In Jolt, the "execute" part of the "fetch-decode-execute" loop is handled using a lookup argument (Lasso).
At a high level, the lookup queries are the instruction operands, and the lookup outputs are the instruction outputs.

## Lasso

Lasso is a lookup argument (equivalent to a SNARK for reads into a read-only memory). Lookup arguments allow the prover to convince the verifier that a (committed) set of values $Q$ is a subset of a lookup table $T$. Lasso is a special lookup argument whose asymptotic costs scale primarily with the number of queries $|Q|$ rather than with the length of the table $T$.

A conversational background on lookups can be found [here](https://a16zcrypto.com/posts/article/building-on-lasso-and-jolt/). In short: lookups are great for zkVMs as they allow constant cost / developer complexity for the prover algorithm per VM instruction.

A detailed engineering overview of Lasso (as a standalone lookup argument) can be found [here](https://www.youtube.com/watch?v=iDcXj9Vx3zY).

## Using Lasso to handle instruction execution

Lasso requires that each primitive instruction satisfies a decomposability property.
The property needed is that the input(s) to the instruction can be broken into "chunks" (say, with each chunk
consisting of 16 bits), such that one can obtain the answer to the original instruction by
evaluating a simple function on each chunk and then "collating" the results together.
For example, the bitwise-OR of two 32-bit inputs x and y can be computed by breaking each input up into 8-bit chunks, ORing
each 8-bit chunk of x with the associated chunk of y, and concatenating the results together.
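The chunk-then-collate pattern for that OR example might look like this sketch; the chunk width and count are chosen to match the example, and the per-chunk OR stands in for the subtable lookup:

```rust
/// Bitwise-OR of two 32-bit inputs computed chunk-by-chunk.
fn chunked_or(x: u32, y: u32) -> u32 {
    (0..4).rev().fold(0u32, |acc, i| {
        let xc = (x >> (8 * i)) & 0xff; // i-th 8-bit chunk of x (0 = low chunk)
        let yc = (y >> (8 * i)) & 0xff;
        let chunk_out = xc | yc; // "subtable lookup": OR of two 8-bit chunks
        (acc << 8) | chunk_out   // "collation": concatenate chunk outputs
    })
}
```

The relevant subtable here would store all $2^{16}$ values of the 8-bit-by-8-bit OR function, indexed by the pair of chunks.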

In Lasso, we call the task of evaluating a simple function on each chunk a "subtable lookup" (the relevant lookup table
being the table storing all $2^{16}$ evaluations of the simple function). And the "collating" of
the results of the subtable lookups into the result of the original lookup (instruction execution on the un-chunked inputs)
is handled via an invocation of the sum-check protocol. We call this the "primary sumcheck" instance in Lasso.

The "primary sumcheck" collation looks as follows for a trace of length $m$ and a VM with $F$ unique instructions.

$\sum_{x \in \{0,1\}^{\log_2(m)}} \left[\widetilde{\text{eq}}(r,x) \cdot \sum_{f \in \{0,1\}^{\log_2(F)}} \widetilde{\text{flags}_f}(x) \cdot g_f(\text{terms}_f(x))\right]$

$\widetilde{\text{flags}_f}(x)$ is an indicator function for the $f$-th instruction: $\widetilde{\text{flags}_f}(x) = 1$ if instruction $f$ is used during the $x$-th step of the trace, when $x \in \{0,1\}^{\log_2(m)}$.

$g_f(...)$ is the collation function used by the $f$-th instruction.

$\text{terms}_f(x) = [E_1(x), \ldots, E_\alpha(x)]$ where $\alpha$ is the number of independent memories used by an instruction. For simple instructions like the EQ instruction, $\alpha = C$ and $\text{terms}_f(x) = [E_1(x), \ldots, E_C(x)]$. More complicated instructions such as LT might have $\text{terms}_f(x) = [E_{\texttt{EQ}}(x), E_{\texttt{LT1}}(x), E_{\texttt{LT2}}(x)]$. The exact layout depends on the number of subtables required by the instruction's decomposition. The mappings can be found in the `JoltInstruction::subtable` method implementations.

### Mental Model
For a given $r = x \in \{0,1\}^{\log(m)}$ (think of $x$ as the integer index of a step within the trace), $\widetilde{\text{eq}}(r,x) = 0$ for all but one term of the outer sum. Similarly, $\widetilde{\text{flags}}_f(x) = 0$ for all but one term of the inner sum. What remains is the collation function of a single instruction, evaluating to that instruction's collated lookup output. In reality, $r$ is a random point $r \in \mathbb{F}^{\log(m)}$ selected by the verifier over the course of the protocol; the evaluation at this point provides a distance-amplified encoding of the entire trace of instructions.


To illustrate more concretely imagine a two-instruction VM for LT and EQ instructions with $C=1$.

$$
\sum_{x \in \{0, 1\}^{\log_2(m)}}{\widetilde{\text{eq}}(r,x) \cdot \left[ \widetilde{\text{flags}}_{\texttt{LT}}(x) \cdot g_{\texttt{LT}}(E_{\texttt{LT}}(x)) + \widetilde{\text{flags}}_{\texttt{EQ}}(x) \cdot g_{\texttt{EQ}}(E_{\texttt{EQ}}(x)) \right]}
$$
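Restricted to $r$ on the Boolean hypercube, this two-instruction sum can be traced in a toy sketch. All values below are hypothetical; the real protocol evaluates at a random $r \in \mathbb{F}^{\log(m)}$ rather than a step index:

```rust
/// Evaluate the two-instruction collation sum at a hypercube point r,
/// i.e. at an integer step index. Flags are one-hot per step; e_lt / e_eq
/// hold each step's subtable read values.
fn primary_sumcheck_at_step(
    r: usize,
    flags_lt: &[u64],
    flags_eq: &[u64],
    e_lt: &[u64],
    e_eq: &[u64],
) -> u64 {
    (0..flags_lt.len())
        .map(|x| {
            let eq_rx = (x == r) as u64; // eq~(r, x) restricted to the hypercube
            // With C = 1, the collation functions g_LT and g_EQ are the
            // identity on their single subtable term.
            eq_rx * (flags_lt[x] * e_lt[x] + flags_eq[x] * e_eq[x])
        })
        .sum()
}
```

Every term but the one at $x = r$ vanishes, so the sum returns step $r$'s collated lookup output.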


## Subtable Flags
`TODO`
- We then use memory checking to show that each of the memories $E_i$ is well formed.
- At a given step of the CPU, only a single instruction is used, so only that instruction's subtables are accessed. For the rest of the memories we insert a no-op with $(a, v) = 0$. To make the GKR trees cheaper to compute and sumcheck, we add a single additional layer to the GKR tree. In this layer we "toggle" each GKR leaf to $1$ whenever it corresponds to an unused step of the CPU. This makes the binary tree of multiplication gates cheaper. We toggle based on a new flags polynomial, $\widetilde{\text{subtable-flags}}_f$, which is the sum of all of the instruction flags $\widetilde{\text{flags}}_f$ used in the instruction collation above.
- The function to compute each of the leaves becomes $\text{leaf}[i] = \text{subtable-flags}[i] \cdot \text{fingerprint}[i] + (1 - \text{subtable-flags}[i])$.
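The toggling rule in the last bullet can be sketched as follows (toy Goldilocks-style modulus, not necessarily Jolt's field; flags are assumed to be 0 or 1):

```rust
/// leaf[i] = flag * fingerprint[i] + (1 - flag): used steps keep their
/// fingerprint, unused steps become 1, the multiplicative identity in the
/// GKR product tree.
fn toggled_leaves(subtable_flags: &[u64], fingerprints: &[u64]) -> Vec<u64> {
    const P: u64 = 0xffff_ffff_0000_0001; // toy 64-bit prime modulus
    subtable_flags
        .iter()
        .zip(fingerprints)
        .map(|(&flag, &fp)| {
            let prod = (flag as u128 * fp as u128) % P as u128;
            let one_minus_flag = ((1 + P - flag) % P) as u128;
            ((prod + one_minus_flag) % P as u128) as u64
        })
        .collect()
}
```

Because unused leaves are 1, the product over a layer is unchanged by no-op steps, which is exactly what makes the multiplication tree cheap.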