Investigate canonical encoding scheme #35

adlerjohn · 2020-06-11T17:50:58Z

Currently, a canonical variant of protobuf3 is used to define data structures and do serialization/deserialization. This is great for portability, but isn't ideal for blockchain applications. For example, there are no fixed-size byte arrays (32-byte hashes are used all over the place in blockchains) or optional fields. Robust protobuf implementations for embedded devices (e.g. hardware wallets) or blockchain smart contracts (e.g. Solidity) are non-existent and would be prohibitive to develop.

The crux is that protobuf was designed for client-server communication between peers running different versions (i.e. both forwards and backwards compatibility are key features). This is unneeded for core blockchain data structures (e.g. blocks, votes, or transactions), but may be good for node-to-node communication (e.g. messages that wrap around block data, vote data, or transaction data).

We should investigate the feasibility of using a simpler serialization scheme for core data structures.

Desiderata:

fully deterministic (specifically, bijective)
binary, not text
native support for basic blockchain data types (esp. fixed-sized arrays)
typedefs / type aliases (i.e. zero-cost abstractions)
no requirements on backwards or forwards compatibility
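
To make the desiderata above concrete, here is a minimal sketch of what a fixed-layout, bijective encoding could look like. The `Vote` type, its two fields, and the byte layout (8-byte big-endian height followed by a 32-byte hash) are assumptions made up for illustration; they are not taken from any spec or from the schemes compared in this issue.

```python
import struct
from dataclasses import dataclass

HASH_SIZE = 32  # fixed-size digest length, e.g. a SHA-256 output

# Hypothetical layout: 8-byte big-endian height, then a 32-byte hash.
_LAYOUT = struct.Struct(">Q32s")


@dataclass(frozen=True)
class Vote:
    """Illustrative core data structure (not from the spec)."""
    height: int
    block_hash: bytes

    def encode(self) -> bytes:
        # struct would silently zero-pad a short hash, so check explicitly.
        if len(self.block_hash) != HASH_SIZE:
            raise ValueError("block_hash must be exactly 32 bytes")
        return _LAYOUT.pack(self.height, self.block_hash)

    @classmethod
    def decode(cls, data: bytes) -> "Vote":
        # Accept only inputs of exactly the expected size, so each value
        # has one encoding and each accepted encoding maps to one value.
        if len(data) != _LAYOUT.size:
            raise ValueError("unexpected encoding length")
        height, block_hash = _LAYOUT.unpack(data)
        return cls(height, block_hash)
```

With no tags, no length prefixes, and no optional fields, there is nothing for two encoders to disagree on, which is the property "bijective" asks for. The price is that the layout cannot evolve without a new type.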

Comparison of Different Schemes

Protobuf

https://developers.google.com/protocol-buffers/docs/overview

https://github.com/lazyledger/protobuf3-solidity-lib

https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-027-deterministic-protobuf-serialization.md

cosmos/cosmos-sdk#7488

protocolbuffers/protobuf#3521

XDR

https://tools.ietf.org/html/rfc4506

https://developers.stellar.org/docs/glossary/xdr

Veriform

https://github.com/iqlusioninc/veriform

SimpleSerialize (SSZ)

https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md


liamsi · 2020-06-24T09:07:16Z

One such scheme is Veriform, which can compute a canonical commitment to some data

My understanding is that the hashing can easily be made optional there. Other than that, the encoding part of Veriform shares a lot with the vanilla protobuf encoding. The only changes I can see are restricting it to fewer types (and specifying string encoding) to achieve determinism. Additionally, it squeezes in a critical bit to indicate that a field is not allowed to be missing, and it uses a different varint encoding. It actually tries to enable backwards (and forwards?) compatibility or "schema evolution". The main motivation for not using protobuf and developing Veriform instead seems to be the lack of a (Rust) implementation that can be used in a heavily restricted environment.
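
As a side note on why varints matter for determinism, here is a rough sketch of the standard protobuf base-128 varint together with a strict decoder. It does not show Veriform's varint encoding; the function names are made up for this example, and only unsigned values are handled.

```python
def encode_varint(n: int) -> bytes:
    """Encode an unsigned int as a minimal protobuf-style varint."""
    if n < 0:
        raise ValueError("only unsigned values are handled in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits go out first
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint_strict(data: bytes) -> int:
    """Decode a varint, rejecting padded (non-minimal) encodings."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:      # last byte of the varint
            if i != len(data) - 1:
                raise ValueError("trailing bytes after varint")
            if byte == 0 and i > 0:
                raise ValueError("non-minimal varint encoding")
            return result
    raise ValueError("truncated varint")
```

A lenient decoder would read both `b"\xac\x02"` and the padded `b"\xac\x82\x00"` as 300, so two different byte strings would stand for the same value. Rejecting the padded form is one of the "additional rules" a deterministic protobuf profile needs.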

I feel like I should try to ignore all concerns I share with @adlerjohn and try writing a defence for just using protobuf with additional rules some time soon.

I think protobuf with additional rules (almost) ticks all boxes of:

Desiderata:

fully deterministic

binary, not text

native support for basic blockchain data types (esp. fixed-sized arrays)

typedefs / type aliases

no requirements on backwards or forwards compatibility

BTW, while typedefs are really useful in the spec, I don't see why they should be a hard requirement for the serialization format. Regarding the fixed-size arrays, I feel like checking the length could be done by the core types and not by the generated proto types.
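
A rough sketch of that last idea, with the length check living in a core type that wraps the generated message. Both `HeaderProto` (a stand-in for a generated protobuf class) and `Header` are hypothetical names invented for this example, not types from the codebase.

```python
from dataclasses import dataclass

HASH_SIZE = 32


@dataclass
class HeaderProto:
    """Stand-in for a generated proto type: `bytes` has no fixed size."""
    last_block_hash: bytes = b""


@dataclass(frozen=True)
class Header:
    """Hypothetical core type that enforces the fixed-size invariant."""
    last_block_hash: bytes

    def __post_init__(self) -> None:
        # The check runs on every construction, including from_proto.
        if len(self.last_block_hash) != HASH_SIZE:
            raise ValueError(
                f"last_block_hash must be {HASH_SIZE} bytes, "
                f"got {len(self.last_block_hash)}"
            )

    @classmethod
    def from_proto(cls, msg: HeaderProto) -> "Header":
        return cls(bytes(msg.last_block_hash))

    def to_proto(self) -> HeaderProto:
        return HeaderProto(last_block_hash=self.last_block_hash)
```

The wire format stays plain protobuf, and anything that only ever handles `Header` values can rely on the hash being 32 bytes. The trade-off is that the invariant is invisible in the `.proto` schema, so every implementation has to re-apply it by hand.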

Wondertan · 2021-05-28T20:18:35Z

Worth considering: https://google.github.io/flatbuffers/

liamsi · 2021-05-28T20:31:12Z

Also related: cosmos/cosmos-sdk#5444

liamsi · 2021-05-28T20:33:42Z

Also a must-read:

Also, this is by the guy who brought you protobufv2 and Cap'nProto: https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html

liamsi · 2021-12-08T18:53:53Z

I think this is the wrong focus, as there are more urgent and relevant discussions. That said, IF we want to do anything about the canonical encoding here, then we should do it before the incentivized testnet.

Note that this is not about writing specs now but about making and documenting a decision. We can also move this issue into the appropriate repositories where this can be captured in an ADR close to the code this might touch.

adlerjohn · 2021-12-10T00:45:29Z

I don't think there's really much left for debate in that we're going to use deterministic protobuf? Maybe this issue can be closed?

liamsi · 2021-12-12T20:17:35Z

Let's close then. For reference BTW: tendermint/tendermint#7427

adlerjohn added the enhancement (New feature or request) and investigation (Investigation required) labels Jun 11, 2020

adlerjohn added this to the Spec 1.0.0-alpha milestone Jun 11, 2020

adlerjohn added the documentation (Improvements or additions to documentation) and serialization (Serialization definitions) labels Jun 11, 2020

adlerjohn changed the title ~~Investigate binary encoding~~ Investigate canonical encoding scheme Jun 18, 2020

adlerjohn mentioned this issue Jun 22, 2020

Remove protobuf use for core data structures #38

Merged

adlerjohn self-assigned this Jan 22, 2021

liamsi mentioned this issue Mar 31, 2021

Reconsider serializing block data into shares #152

Closed

liamsi unassigned adlerjohn May 28, 2021

liamsi added this to Celestia Node Dec 8, 2021

liamsi closed this as completed Dec 12, 2021

liamsi moved this to Done in Celestia Node Dec 12, 2021
