Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update EIP-5450: Align with EOF megaspec #8388

Merged
merged 10 commits into from
Apr 16, 2024
92 changes: 66 additions & 26 deletions EIPS/eip-5450.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,14 +27,14 @@ and preventing the execution and deployment of any invalid code.
The operand stack validation provides several benefits:

- removes the run-time stack underflow check for all instructions,
- removes the run-time stack overflow check for all instruction except `CALLF`,
- removes the run-time stack overflow check for all instructions except `CALLF` and `JUMPF` (`JUMPF` introduced in a separate EIP) ,
- ensures that execution terminates with one of the terminating instructions,
- prevents deployment of code with unreachable instructions, thereby discouraging the use of code sections for data storage.

It also has some disadvantages:

- adds constraints to the code structure (similar to JVM, CPython bytecode, WebAssembly and others); however, these constraints can be lifted in a backward-compatible manner if they are shown to be user-unfriendly,
- adds second validation pass; however, validation's computational and space complexity remains linear.
- it is natural to implement stack validation as a second validation pass; however, it is not strictly required and validation's computational and space complexity remains linear in any implementation variant.

The guarantees created by these validation rules also improve the feasabiliy of Ahead-Of-Time and Just-In-Time compilation of EVM code. Single pass transpilation passes can be safely executed with the code validation and advanced stack/register handling can be applied with the stack height validaitons. While not as impactful to a mainnet validator node that is bound mostly by storage state sizes, these can significantly speed up witness validation and other non-mainnet use cases.

Expand All @@ -58,50 +58,76 @@ In the second validation phase control-flow analysis is performed on the code.

*Terminating instructions* refers to the instructions either:

- ending function execution: `RETF`, or
- ending whole EVM execution: `STOP`, `RETURN`, `REVERT`, `INVALID`.
- ending function execution: `RETF`, `JUMPF`, or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"separate EIP" again, the following ones are OK to not have it, but at least once for the Spec section is good to have

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should RETURNCONTRACT mention have the same then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh boy, I've missed RETURNCONTRACT popping up here... my top-level opinion doesn't really change (should not mention at all), but if we mention, we should consistent in how we mention :(. We cannot leave the reader puzzled whether RETURNCONTRACT is something in legacy EVM, a typo, copy-pasta or leftovers from prev revision or whatnot.

But once more, I really think these EIPs should be as independent as possible, rather than trying to be the megaspec and cover all aspects in each thoroughly

- ending whole EVM execution: `STOP`, `RETURN`, `RETURNCONTRACT`, `REVERT`, `INVALID`.

For each reachable instruction in the code the operand stack height is recorded.
The first instruction has the recorded stack height equal to the number of inputs to the function type matching the code (`type[code_section_index].inputs`).
*note: `JUMPF` and `RETURNCONTRACT` are introduced in separate EIPs.*

The FIFO queue *worklist* is used for tracking "to be inspected" instructions. Initially, it contains the first instruction.
*Forward jump* refers to any of `RJUMP`/`RJUMPI`/`RJUMPV` instruction with relative offset greater than or equal to 0. *Backwards jump* refers to any of `RJUMP`/`RJUMPI`/`RJUMPV` instruction with relative offset less than 0, including jumps to the same jump instruction.

While *worklist* is not empty, dequeue an instruction and:

1. Determine the effect the instruction has on the operand stack:
1. **Check** if the recorded stack height satisfies the instruction requirements. Specifically:
- for `CALLF` instruction the recorded stack height must be at least the number of inputs of the called function according to its type defined in the type section.
- for `RETF` instruction the recorded stack height must be exactly the number of outputs of the function matching the code.
2. Compute new stack height after the instruction execution.
2. Determine the list of successor instructions that can follow the current instructions:
Instructions in the code are scanned in a single linear pass over the code. For each instruction the operand stack height bounds are recorded as `stack_height_min` and `stack_height_max`.

The first instruction's recorded stack height bounds are initialized to be equal to the number of inputs to the function type matching the code (`stack_height_min = stack_height_max = type[code_section_index].inputs`).

For each instruction:

1. **Check** if this instruction has recorded stack height bounds. If it does not, it means it was neither referenced by previous forward jump, nor is part of sequential instruction flow, and this code fails validation.

- It is a prerequisite to validation algorithm, and code generators are required to order code basic blocks in a way that no block is referenced only by backwards jump.

2. Determine the effect the instruction has on the operand stack:
1. **Check** if the recorded stack height bounds satisfy the instruction requirements. Specifically:
- for `CALLF` instruction the recorded stack height lower bound must be at least the number of inputs of the called function according to its type defined in the type section,
- for `RETF` instruction both the recorded lower and upper bound must be equal and must be exactly the number of outputs of the function matching the code,
- for `JUMPF` into returning function both the recorded lower and upper bound must equal exactly `type[current_section_index].outputs + type[target_section_index].inputs - type[target_section_index].outputs`,
- for `JUMPF` into non-returning function the recorded stack height lower bound must be at least the number of inputs of the target function according to its type defined in the type section,
- for any other instruction the recodrded stack height lower bound must be at least the number of inputs required by instruction,
- there is no additional check for terminating instructions other than `RETF` and `JUMPF`, this implies that extra items left on stack at instruction ending EVM execution are allowed.
2. For `CALLF` and `JUMPF` **check** for possible stack overflow: if recorded stack height upper bound is greater than `1024 - types[target_section_index].max_stack_height + types[target_section_index].inputs`, validation fails.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be inside 2.1 check list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.1 are the checks that guarantee no stack underflow,
2.2 is the check that prevents (some) stack overflows.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

3. Compute new stack height bounds after the instruction execution. Upper and lower bound are updated by the same value:
- after `CALLF` stack height bounds are adjusted by adding `types[target_section_index].outputs - types[target_section_index].inputs`,
- after any other non-terminating instruction stack height bounds are adjusted by subtracting the number of instruction inputs and adding the number of instruction outputs,
- terminating instructions do not need to update stack height bounds.
3. Determine the list of successor instructions that can follow the current instructions:
1. The next instruction for all instructions other than terminating instructions and unconditional jump.
2. All targets of a conditional or unconditional jump.
3. For each successor instruction:
4. For each successor instruction:
1. **Check** if the instruction is present in the code (i.e. execution must not "fall off" the code).
2. If the instruction does not have stack height recorded (visited for the first time):
1. Record the instruction stack height as the value computed in 1.2.
2. Add the instruction to the *worklist*.
3. Otherwise, **check** if the recorded stack height equals the value computed in 1.2.
2. If the successor is reached via forwards jump or sequential flow from previous instruction:
1. If the instruction does not have stack height bounds recorded (visited for the first time), record the instruction stack height bound as the value computed in 2.3.
2. Otherwise instruction was already visited (by previously seen forward jump). Update this instruction's recorded stack height bounds so that they contain the bounds computed in 2.3, i.e. `target_stack_min = min(target_stack_min, current_stack_min)` and `target_stack_max = max(target_stack_max, current_stack_min)`, where `(target_stack_min, target_stack_max)` are successor bounds and `(current_stack_min, current_stack_max)` are bounds computed in 2.3.
3. If the successor is reached via backwards jump, **check** if the recorded stack height bounds equal the value computed in 2.3. Validation fails if they are not equal, i.e. we see backwards jump to a different stack height.

When *worklist* is empty:
After all instructions are visited, determine the function maximum operand stack height:

1. **Check** if all instruction were reached by the analysis.
2. Determine the function maximum operand stack height:
1. Compute the maximum stack height as the maximum of all recorded stack heights.
1. Compute the maximum stack height as the maximum of all recorded stack height upper bounds.
2. **Check** if the maximum stack height does not exceed the limit of 1023 (`0x3FF`).
3. **Check** if the maximum stack height matches the corresponding code section's `max_stack_height` within the type section.

The computational and space complexity of this pass is *O(len(code))*. Each instruction is visited at most once.

### Execution

Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the `CALLF` performing operand stack overflow check for the entire called function.
Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the `CALLF` and `JUMPF` performing operand stack overflow check for the entire called function.

## Rationale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we're done with the specifications part, needs adding justification of going down the relaxed validation path

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in Rationale.


### Stack overflow check only in CALLF
### Properties of validated code

Any code section validated according to operand stack validation has the following properties:

1. There are no unreachable instructions
2. There are no instructions reachable only via backwards jump
3. Operand stack underflow cannot happen.
4. Operand stack overflow can only happen at `CALLF` or `JUMPF` instruction.
5. Multiple forward jump instructions executing at different stack heights may target the same instruction; the stack of target basic block is validated for all possible heights.
6. Any backwards jump instruction can only target an instruction that is executed with equal stack height; this prevents deployment of the loops with unbounded stack pushing or popping.
7. Final instruction in the code section is either terminating instruction or `RJUMP`.

### Stack overflow check only in CALLF/JUMPF

In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in `CALLF` instruction using the called function's `max_stack_height` information. This decreases flexibility of an EVM program because `max_stack_height` corresponds to the worst-case control-flow path in the function.
In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in `CALLF` and `JUMPF` instructions using the called function's `max_stack_height` information. This decreases flexibility of an EVM program because `max_stack_height` corresponds to the worst-case control-flow path in the function.

### Unreachable code

Expand All @@ -114,6 +140,20 @@ Otherwise, the `RETF` semantic would be more complicated. For `n` function outpu
However, lifting the requirement and modifying the `RETF` semantic as described above is backward
compatible and can be easily introduced in the future.

### More restrictive stack validation

Originally another variant of stack validation was proposed, where instead of linear scan of the code section, all code paths were examined by following the target(s) of every jump instruction in a breadth-first-search manner, tracking stack height for each visited instruction and checking that for every possible code path to a particular instruction its stack height remains constant.

The advantage of this variant would be somewhat simpler algorithm (we would not need to track stack height bounds, but only a single stack height value for each instruction) and no extra requirement for ordering of code basic blocks (see below).

However compiler teams opposed to such restrictive stack height requirements. One prominent pattern used by compilers which wouldn't be possible is jumping to terminating helpers (code blocks ending with `RETURN` or `REVERT`) from different stack heights. This is common for example for a series of `assert` statements, each one compiled to a `RJUMPI` into a shared terminating helper. Enforcing constant stack requirement would mean that before jumping to such helper, extra items on the stack have to be popped, and this noticably increases code size and consumed gas, and would defeat the purpose of extracting these common terminating sequences into a helper.

### Ordering of basic blocks

The prerequisite to stack validation algorithm is ordering of code basic blocks in a way that no block is referenced only by backwards jump.

This is required to make it possible to examine each instruction in one linear pass over the code section. Forward pass over the code section allows for the algorithm to "expand" each forward jump target's stack height bounds and still keep the complexity linear. Trying to do jump target stack bounds expansion while scanning the code in the breadth-first-search manner would require to re-examine entire code path after its stack height bounds are expanded, which would result in quadratic complexity.

## Backwards Compatibility

This change requires a "network upgrade", since it modifies consensus rules.
Expand Down
Loading