ethereum · eth-bot · Apr 16, 2024 · Apr 4, 2024 · Apr 10, 2024 · Apr 11, 2024
@@ -27,14 +27,14 @@ and preventing the execution and deployment of any invalid code.
 The operand stack validation provides several benefits:
 
 - removes the run-time stack underflow check for all instructions,
-- removes the run-time stack overflow check for all instruction except `CALLF`,
+- removes the run-time stack overflow check for all instructions except `CALLF` and `JUMPF` (`JUMPF` introduced in a separate EIP) ,
 - ensures that execution terminates with one of the terminating instructions,
 - prevents deployment of code with unreachable instructions, thereby discouraging the use of code sections for data storage.
 
 It also has some disadvantages:
 
 - adds constraints to the code structure (similar to JVM, CPython bytecode, WebAssembly and others); however, these constraints can be lifted in a backward-compatible manner if they are shown to be user-unfriendly,
-- adds second validation pass; however, validation's computational and space complexity remains linear.
+- it is natural to implement stack validation as a second validation pass; however, it is not strictly required and validation's computational and space complexity remains linear in any implementation variant.
 
 The guarantees created by these validation rules also improve the feasabiliy of Ahead-Of-Time and Just-In-Time compilation of EVM code. Single pass transpilation passes can be safely executed with the code validation and advanced stack/register handling can be applied with the stack height validaitons. While not as impactful to a mainnet validator node that is bound mostly by storage state sizes, these can significantly speed up witness validation and other non-mainnet use cases.
 
@@ -58,50 +58,76 @@ In the second validation phase control-flow analysis is performed on the code.
 
 *Terminating instructions* refers to the instructions either:
 
-- ending function execution: `RETF`, or
-- ending whole EVM execution: `STOP`, `RETURN`, `REVERT`, `INVALID`.
+- ending function execution: `RETF`, `JUMPF`, or
+- ending whole EVM execution: `STOP`, `RETURN`, `RETURNCONTRACT`, `REVERT`, `INVALID`.
 
-For each reachable instruction in the code the operand stack height is recorded.
-The first instruction has the recorded stack height equal to the number of inputs to the function type matching the code (`type[code_section_index].inputs`).
+*note: `JUMPF` and `RETURNCONTRACT` are introduced in separate EIPs.*
 
-The FIFO queue *worklist* is used for tracking "to be inspected" instructions. Initially, it contains the first instruction.
+*Forward jump* refers to any of `RJUMP`/`RJUMPI`/`RJUMPV` instruction with relative offset greater than or equal to 0. *Backwards jump* refers to any of `RJUMP`/`RJUMPI`/`RJUMPV` instruction with relative offset less than 0, including jumps to the same jump instruction.
 
-While *worklist* is not empty, dequeue an instruction and:
 
-1. Determine the effect the instruction has on the operand stack:
-   1. **Check** if the recorded stack height satisfies the instruction requirements. Specifically:
-      - for `CALLF` instruction the recorded stack height must be at least the number of inputs of the called function according to its type defined in the type section.
-      - for `RETF` instruction the recorded stack height must be exactly the number of outputs of the function matching the code.
-   2. Compute new stack height after the instruction execution.
-2. Determine the list of successor instructions that can follow the current instructions:
+Instructions in the code are scanned in a single linear pass over the code. For each instruction the operand stack height bounds are recorded as `stack_height_min` and `stack_height_max`.
+
+The first instruction's recorded stack height bounds are initialized to be equal to the number of inputs to the function type matching the code (`stack_height_min = stack_height_max = type[code_section_index].inputs`).
+
+For each instruction:
+
+1. **Check** if this instruction has recorded stack height bounds. If it does not, it means it was neither referenced by previous forward jump, nor is part of sequential instruction flow, and this code fails validation.
+
+  - It is a prerequisite to validation algorithm, and code generators are required to order code basic blocks in a way that no block is referenced only by backwards jump.
+
+2. Determine the effect the instruction has on the operand stack:
+   1. **Check** if the recorded stack height bounds satisfy the instruction requirements. Specifically:
+      - for `CALLF` instruction the recorded stack height lower bound must be at least the number of inputs of the called function according to its type defined in the type section,
+      - for `RETF` instruction both the recorded lower and upper bound must be equal and must be exactly the number of outputs of the function matching the code,
+      - for `JUMPF` into returning function both the recorded lower and upper bound must equal exactly `type[current_section_index].outputs + type[target_section_index].inputs - type[target_section_index].outputs`,
+      - for `JUMPF` into non-returning function the recorded stack height lower bound must be at least the number of inputs of the target function according to its type defined in the type section,
+      - for any other instruction the recodrded stack height lower bound must be at least the number of inputs required by instruction,
+      - there is no additional check for terminating instructions other than `RETF` and `JUMPF`, this implies that extra items left on stack at instruction ending EVM execution are allowed.
+   2. For `CALLF` and `JUMPF` **check** for possible stack overflow: if recorded stack height upper bound is greater than `1024 - types[target_section_index].max_stack_height + types[target_section_index].inputs`, validation fails.
+   3. Compute new stack height bounds after the instruction execution. Upper and lower bound are updated by the same value:
+     - after `CALLF` stack height bounds are adjusted by adding `types[target_section_index].outputs - types[target_section_index].inputs`,
+     - after any other non-terminating instruction stack height bounds are adjusted by subtracting the number of instruction inputs and adding the number of instruction outputs,
+     - terminating instructions do not need to update stack height bounds.
+3. Determine the list of successor instructions that can follow the current instructions:
    1. The next instruction for all instructions other than terminating instructions and unconditional jump.
    2. All targets of a conditional or unconditional jump.
-3. For each successor instruction:
+4. For each successor instruction:
    1. **Check** if the instruction is present in the code (i.e. execution must not "fall off" the code).
-   2. If the instruction does not have stack height recorded (visited for the first time):
-      1. Record the instruction stack height as the value computed in 1.2.
-      2. Add the instruction to the *worklist*.
-   3. Otherwise, **check** if the recorded stack height equals the value computed in 1.2.
+   2. If the successor is reached via forwards jump or sequential flow from previous instruction:
+      1. If the instruction does not have stack height bounds recorded (visited for the first time), record the instruction stack height bound as the value computed in 2.3.
+      2. Otherwise instruction was already visited (by previously seen forward jump). Update this instruction's recorded stack height bounds so that they contain the bounds computed in 2.3, i.e. `target_stack_min = min(target_stack_min, current_stack_min)` and `target_stack_max = max(target_stack_max, current_stack_min)`, where `(target_stack_min, target_stack_max)` are successor bounds and `(current_stack_min, current_stack_max)` are bounds computed in 2.3.
+   3. If the successor is reached via backwards jump, **check** if the recorded stack height bounds equal the value computed in 2.3. Validation fails if they are not equal, i.e. we see backwards jump to a different stack height.
 
-When *worklist* is empty:
+After all instructions are visited, determine the function maximum operand stack height:
 
-1. **Check** if all instruction were reached by the analysis.
-2. Determine the function maximum operand stack height:
-   1. Compute the maximum stack height as the maximum of all recorded stack heights.
+   1. Compute the maximum stack height as the maximum of all recorded stack height upper bounds.
    2. **Check** if the maximum stack height does not exceed the limit of 1023 (`0x3FF`).
    3. **Check** if the maximum stack height matches the corresponding code section's `max_stack_height` within the type section.
 
 The computational and space complexity of this pass is *O(len(code))*. Each instruction is visited at most once.
 
 ### Execution
 
-Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the `CALLF` performing operand stack overflow check for the entire called function.
+Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the `CALLF` and `JUMPF` performing operand stack overflow check for the entire called function.
 
 ## Rationale
 
-### Stack overflow check only in CALLF
+### Properties of validated code
+
+Any code section validated according to operand stack validation has the following properties:
+
+1. There are no unreachable instructions
+2. There are no instructions reachable only via backwards jump
+3. Operand stack underflow cannot happen.
+4. Operand stack overflow can only happen at `CALLF` or `JUMPF` instruction.
+5. Multiple forward jump instructions executing at different stack heights may target the same instruction; the stack of target basic block is validated for all possible heights.
+6. Any backwards jump instruction can only target an instruction that is executed with equal stack height; this prevents deployment of the loops with unbounded stack pushing or popping.
+7. Final instruction in the code section is either terminating instruction or `RJUMP`.
+
+### Stack overflow check only in CALLF/JUMPF
 
-In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in `CALLF` instruction using the called function's `max_stack_height` information. This decreases flexibility of an EVM program because `max_stack_height` corresponds to the worst-case control-flow path in the function.
+In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in `CALLF` and `JUMPF` instructions using the called function's `max_stack_height` information. This decreases flexibility of an EVM program because `max_stack_height` corresponds to the worst-case control-flow path in the function.
 
 ### Unreachable code
 
@@ -114,6 +140,20 @@ Otherwise, the `RETF` semantic would be more complicated. For `n` function outpu
 However, lifting the requirement and modifying the `RETF` semantic as described above is backward
 compatible and can be easily introduced in the future.
 
+### More restrictive stack validation
+
+Originally another variant of stack validation was proposed, where instead of linear scan of the code section, all code paths were examined by following the target(s) of every jump instruction in a breadth-first-search manner, tracking stack height for each visited instruction and checking that for every possible code path to a particular instruction its stack height remains constant.
+
+The advantage of this variant would be somewhat simpler algorithm (we would not need to track stack height bounds, but only a single stack height value for each instruction) and no extra requirement for ordering of code basic blocks (see below).
+
+However compiler teams opposed to such restrictive stack height requirements. One prominent pattern used by compilers which wouldn't be possible is jumping to terminating helpers (code blocks ending with `RETURN` or `REVERT`) from different stack heights. This is common for example for a series of `assert` statements, each one compiled to a `RJUMPI` into a shared terminating helper. Enforcing constant stack requirement would mean that before jumping to such helper, extra items on the stack have to be popped, and this noticably increases code size and consumed gas, and would defeat the purpose of extracting these common terminating sequences into a helper.
+
+### Ordering of basic blocks
+
+The prerequisite to stack validation algorithm is ordering of code basic blocks in a way that no block is referenced only by backwards jump.
+
+This is required to make it possible to examine each instruction in one linear pass over the code section. Forward pass over the code section allows for the algorithm to "expand" each forward jump target's stack height bounds and still keep the complexity linear. Trying to do jump target stack bounds expansion while scanning the code in the breadth-first-search manner would require to re-examine entire code path after its stack height bounds are expanded, which would result in quadratic complexity.
+
 ## Backwards Compatibility
 
 This change requires a "network upgrade", since it modifies consensus rules.