Skip to content

Commit

Permalink
Add EIP: Separate Metadata Section for EOF
Browse files Browse the repository at this point in the history
Merged by EIP-Bot.
  • Loading branch information
kuzdogan authored Dec 17, 2024
1 parent 943a192 commit 821ffeb
Showing 1 changed file with 92 additions and 0 deletions.
92 changes: 92 additions & 0 deletions EIPS/eip-7834.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
---
eip: 7834
title: Separate Metadata Section for EOF
description: Introduce a new separate metadata section to the EOF
author: Kaan Uzdogan (@kuzdogan), Marco Castignoli (@marcocastignoli), Manuel Wedler (@manuelwedler)
discussions-to: https://ethereum-magicians.org/t/eip-7834-separate-metadata-section-for-eof/22138
status: Draft
type: Standards Track
category: Core
created: 2024-12-06
requires: 3540
---

## Abstract

Introduce a new separate metadata section to the Ethereum Object Format (EOF) that is unreachable by the code, and any changes to which does not affect the code.

## Motivation

It is desirable to include metadata in contract's bytecode for various reasons. For instance, both the Solidity and Vyper compilers by default include the language and compiler version used to compile. Vyper (with 0.4.1) appends an integrity hash to the initcode in CBOR encoding. Solidity additionally includes the IPFS or the Swarm hash of the Solidity contract metadata.json file, and the experimental Solidity flag. The current (pre-EOF) practice is to append this CBOR encoded metadata section in the contract's runtime bytecode, followed by the 2 bytes length of the CBOR encoded bytes.

```
Solidity ┌──────────────────────────────────────────0x0033 bytes──────────────────────────────────────────────┐
...7265206c656e677468a2646970667358221220dceca8706b29e917dacf25fceef95acac8d90d765ac926663ce4096195952b6164736f6c634300060b0033
```

This poses a problem for source code verification where the onchain bytecode is compared to the compiled bytecode of the given source code. During a contract verification, metadata sections, in particular the IPFS hash, need to be ignored and only the executional bytecode should be compared. Since pre-EOF bytecode is not structured, it is not possible to distinguish the metadata section from the executional bytecode easily. This gets even trickier in the case of factory contracts with multiple nested bytecodes, each having their own metadata sections. Verifiers need to implement their own heuristics and workarounds to find the metadata sections and ignore it.

The EOF brings structure to the bytecode by separating the code from the data, and placing the code of each contract in their respective containers. In its current form, this makes it possible to find the data easier than the pre-EOF bytecode. However, the current spec also does not describe a metadata section. Compilers currently need to place the contract metadata inside the data section which poses several problems:

1. It is not straightforward to distinguish the metadata part in the `data_section`, which poses the same problem as the pre-EOF bytecode.
2. Any change to the metadata's size within the data section will change the executional bytecode, e.g. through shifting `DATALOADN` offsets. With that, two identical contracts with different metadata sizes will not match during source code, since the code will be different.
3. The metadata can theoretically be reached by the code, e.g. via manipulating the `DATALOADN` instructions.

## Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.

Extending the format introduced in [EIP-3540](./eip-3540.md), this EIP proposes to add a new OPTIONAL section in the body called `metadata_section` before the `data_section`, and to add two new OPTIONAL fields `kind_metadata` (value: `0x05`) and `metadata_size` to the header before the `kind_data` and `data_size` fields.

```
container := header, body
header :=
magic, version,
kind_type, type_size,
kind_code, num_code_sections, code_size+,
[kind_container, num_container_sections, container_size+,]
[kind_metadata, metadata_size,]
kind_data, data_size,
terminator
body := types_section, code_section+, container_section*, [metadata_section], data_section
types_section := (inputs, outputs, max_stack_height)+
```

### Header

| name | length | value | description |
| ------------- | ------- | ------------- | --------------------------------------------------------------------------------------- |
| ... | ... | ... | ... |
| kind_metadata | 1 byte | 0x05 | kind marker for metadata size section |
| metadata_size | 2 bytes | 0x0001-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the metadata section content |
| kind_data | 1 byte | 0x04 | kind marker for data size section |
| data_size | 2 bytes | 0x0000-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the data section content (\*) |
| terminator | 1 byte | 0x00 | marks the end of the header |

### Body

| name | length | value | description |
| ---------------- | -------- | ----- | --------------------------- |
| ... | ... | ... | ... |
| metadata_section | variable | n/a | arbitrary sequence of bytes |
| data_section | variable | n/a | arbitrary sequence of bytes |

The strucure and the encoding of the `metadata_section` is not defined by this EIP. It is left to the compilers, tooling, or the contract developers to define the encoding and the content. The current practice by the Solidity and Vyper compilers is to use CBOR encoding.

## Rationale

The `metadata_section` in the `body`, as well as the `kind_metadata` and `metadata_size` fields in the `header`, are OPTIONAL. This way, the compilers can avoid additional bytes in the container if they don't want to write any metadata. The `data_section` can change in its size and content during deployment, therefore it needs to be REQUIRED, even if the data is empty. The `metadata_section` is not expected to change during the deployment.

The reason for placing the `metadata_section` before the `data_section`, and assigning `kind_metadata` the value `0x05` (and not `0x04`) is to make it easier for the existing EOF tooling adapt the changes. Additionally, if the `metadata_section` was placed after the `data_section`, changes to the `data_section` in deploy time would cause the `metadata_section` to shift. By placing the `metadata_section` before, this could be mitigated.

## Backwards Compatibility

No backward compatibility issues are expected since [EIP-3540](./eip-3540.md) is not implemented yet.

## Security Considerations

No security considerations as this section is meant not to be executed.

## Copyright

Copyright and related rights waived via [CC0](../LICENSE.md).

0 comments on commit 821ffeb

Please sign in to comment.