This repository has been archived by the owner on Mar 24, 2023. It is now read-only.

Investigate canonical encoding scheme #35

Closed
adlerjohn opened this issue Jun 11, 2020 · 7 comments
Labels: documentation, enhancement, investigation, serialization

Comments

@adlerjohn
Member

adlerjohn commented Jun 11, 2020

Currently, a canonical variant of protobuf3 is used to define data structures and to do serialization/deserialization. This is great for portability but isn't ideal for blockchain applications. For example, there are no fixed-size byte arrays (32-byte hashes are used all over the place in blockchains) and no optional fields. Robust protobuf implementations for embedded devices (e.g. hardware wallets) or blockchain smart contracts (e.g. Solidity) are non-existent and would be prohibitive to develop.

The crux is that protobuf was designed for client-server communication between different software versions, so both forwards and backwards compatibility are key features. This is unneeded for core blockchain data structures (e.g. blocks, votes, or transactions), but may be good for node-to-node communication (e.g. messages that wrap around block data, vote data, or transaction data).

We should investigate the feasibility of using a simpler serialization scheme for core data structures.

Desiderata:

  • fully deterministic (specifically, bijective)
  • binary, not text
  • native support for basic blockchain data types (esp. fixed-sized arrays)
  • typedefs / type aliases (i.e. zero-cost abstractions)
  • no requirements on backwards or forwards compatibility
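To make the desiderata concrete, here is a minimal Python sketch of a fixed-layout binary encoding for a hypothetical `Header` holding a uint64 height and a 32-byte hash. It is not any of the schemes compared below; the type, field names, and layout are assumptions chosen for illustration. Because there are no tags, length prefixes, or optional fields, and the decoder accepts only inputs of exactly the right size, each valid value has exactly one encoding and each 40-byte string decodes to exactly one value.

```python
import struct
from dataclasses import dataclass
from typing import NewType

# Zero-cost type alias: at runtime Hash(x) simply returns x,
# while a type checker treats Hash as distinct from plain bytes.
Hash = NewType("Hash", bytes)

HASH_SIZE = 32
# Hypothetical layout: big-endian uint64 followed by exactly 32 raw bytes.
_LAYOUT = struct.Struct(">Q32s")


@dataclass(frozen=True)
class Header:
    """Hypothetical core data structure, used only for illustration."""
    height: int
    data_hash: Hash


def encode(h: Header) -> bytes:
    # struct's "32s" silently pads short input and truncates long input,
    # so enforce the exact size here to keep the encoding injective.
    if len(h.data_hash) != HASH_SIZE:
        raise ValueError("data_hash must be exactly 32 bytes")
    # Raises struct.error if height is outside the uint64 range.
    return _LAYOUT.pack(h.height, h.data_hash)


def decode(buf: bytes) -> Header:
    # Accept only inputs that are exactly one encoding, so decoding
    # is the inverse of encoding (no trailing or missing bytes).
    if len(buf) != _LAYOUT.size:
        raise ValueError(f"expected {_LAYOUT.size} bytes, got {len(buf)}")
    height, data_hash = _LAYOUT.unpack(buf)
    return Header(height=height, data_hash=Hash(data_hash))
```

Round-tripping in both directions (`decode(encode(h)) == h` and `encode(decode(b)) == b`) is what "bijective" buys here; the price is that adding a field changes the layout, which matches the "no backwards or forwards compatibility" item.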

Comparison of Different Schemes

Protobuf

https://developers.google.com/protocol-buffers/docs/overview

https://github.com/lazyledger/protobuf3-solidity-lib

https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-027-deterministic-protobuf-serialization.md

cosmos/cosmos-sdk#7488

protocolbuffers/protobuf#3521

XDR

https://tools.ietf.org/html/rfc4506

https://developers.stellar.org/docs/glossary/xdr

Veriform

https://github.com/iqlusioninc/veriform

SimpleSerialize (SSZ)

https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md


adlerjohn added the enhancement and investigation labels Jun 11, 2020
adlerjohn added this to the Spec 1.0.0-alpha milestone Jun 11, 2020
adlerjohn added the documentation and serialization labels Jun 11, 2020
adlerjohn changed the title from "Investigate binary encoding" to "Investigate canonical encoding scheme" Jun 18, 2020
@liamsi
Member

liamsi commented Jun 24, 2020

One such scheme is Veriform, which can compute a canonical commitment to some data

My understanding is that the hashing can easily be made optional there. Other than that, the encoding part of Veriform shares a lot with vanilla protobuf encoding. The only changes I can see are restricting it to fewer types (and specifying the string encoding) to achieve determinism. Additionally, it squeezes in a critical bit to indicate that a field is not allowed to be missing, and it uses a different varint encoding. It actually tries to enable backwards (and forward?) compatibility, or "schema evolution". The main motivation for developing Veriform instead of using protobuf seems to be the lack of a (Rust) implementation that can be used in a heavily restricted environment.

I feel like I should try to set aside all the concerns I share with @adlerjohn and write a defence of simply using protobuf with additional rules sometime soon.

I think protobuf with additional rules (almost) ticks all boxes of:

Desiderata:

  • fully deterministic
  • binary, not text
  • native support for basic blockchain data types (esp. fixed-sized arrays)
  • typedefs / type aliases
  • no requirements on backwards or forwards compatibility
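As an illustration of what one "additional rule" could look like, here is a minimal Python sketch assuming the rule is "varints must use the minimal number of bytes". Protobuf's varint stores 7 bits per byte, least-significant group first, with the high bit marking continuation; lenient decoders typically also accept padded forms such as the two bytes 0x80 0x00 for 0, which would give one value several encodings. The helper names `encode_uvarint` and `decode_uvarint_canonical` are hypothetical, not part of any protobuf library.

```python
def encode_uvarint(n: int) -> bytes:
    """Minimal base-128 varint: 7 bits per byte, least-significant group first."""
    if not 0 <= n < 1 << 64:
        raise ValueError("value out of uint64 range")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more groups follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint_canonical(buf: bytes) -> tuple[int, int]:
    """Decode a varint at the start of buf and return (value, bytes consumed).

    Unlike a lenient decoder, this rejects non-minimal encodings such as
    0x80 0x00 (an overlong way of writing 0), so every uint64 has exactly
    one accepted encoding.
    """
    value = 0
    for i, byte in enumerate(buf[:10]):  # a uint64 needs at most 10 bytes
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            # A final group of zero (after the first byte) adds nothing,
            # so the encoding is longer than necessary.
            if i > 0 and byte == 0:
                raise ValueError("non-minimal varint encoding")
            if value >= 1 << 64:
                raise ValueError("varint overflows uint64")
            return value, i + 1
    raise ValueError("truncated or overlong varint")
```

For example, 300 encodes as the two bytes 0xAC 0x02 and decodes back to `(300, 2)`, while 0x80 0x00 is rejected instead of being read as 0.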

BTW, while typedefs are really useful in the spec, I don't see why they should be a hard requirement for the serialization format. Regarding fixed-size arrays, I feel like checking the length could be done by the core types and not by the generated proto types.
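A sketch of the split suggested above, assuming Python dataclasses stand in for both layers: `ProtoHeader` plays the role of a generated protobuf message, and `Header` plays the core type. Both names and `from_proto`/`to_proto` are hypothetical. The generated type keeps a variable-length `bytes` field (proto3 has no fixed-size byte arrays), and the core type enforces the 32-byte length whenever it is constructed.

```python
from dataclasses import dataclass

HASH_SIZE = 32


@dataclass
class ProtoHeader:
    """Hypothetical stand-in for a protobuf-generated message.

    The hash is a plain variable-length bytes field, as it would be in
    proto3; nothing here checks its length.
    """
    height: int = 0
    data_hash: bytes = b""


@dataclass(frozen=True)
class Header:
    """Hypothetical core type that owns the invariants."""
    height: int
    data_hash: bytes

    def __post_init__(self) -> None:
        # The fixed-size check lives in the core type, not in generated code.
        if len(self.data_hash) != HASH_SIZE:
            raise ValueError(
                f"data_hash must be {HASH_SIZE} bytes, got {len(self.data_hash)}"
            )

    @classmethod
    def from_proto(cls, pb: ProtoHeader) -> "Header":
        # Validation happens once, at the boundary between the two layers.
        return cls(height=pb.height, data_hash=bytes(pb.data_hash))

    def to_proto(self) -> ProtoHeader:
        return ProtoHeader(height=self.height, data_hash=self.data_hash)
```

The trade-off is that the wire format itself still admits wrong-length hashes; they are only rejected once converted into core types, so every decoding path has to go through that conversion.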

@Wondertan
Member

Worth considering: https://google.github.io/flatbuffers/

@liamsi
Member

liamsi commented May 28, 2021

Also related: cosmos/cosmos-sdk#5444

@liamsi
Member

liamsi commented May 28, 2021

@liamsi
Member

liamsi commented Dec 8, 2021

I think this is the wrong focus, as there are more urgent and relevant discussions. That said, IF we want to do anything about the canonical encoding here, then we should do it before the incentivized testnet.

Note that this is not about writing specs now but about making and documenting a decision. We can also move this issue into the appropriate repositories where this can be captured in an ADR close to the code this might touch.

@adlerjohn
Member Author

I don't think there's really much left to debate, given that we're going to use deterministic protobuf? Maybe this issue can be closed?

@liamsi
Member

liamsi commented Dec 12, 2021

Let's close then. For reference BTW: tendermint/tendermint#7427

liamsi closed this as completed Dec 12, 2021
liamsi moved this to Done in Celestia Node Dec 12, 2021