VRFs - what are they and how do they work in Symbol
This article is split into three parts:
- background on HMACs - keyed-hash message authentication codes, if you're familiar with the topic, you can skip to next part
- VRFs - Verifiable Random Functions
- VRFs in Symbol
Reader is expected to know following basic crypto terms:
- (cryptographic) hash function
- public key cryptography
- shared secret i.e. using Diffie–Hellman key exchange
HMAC is a general construction that allows turning any hash function into keyed hashing function that can be used for message authentication. That all sounds fancy, so time to split it into digestible pieces1:
-
keyed hashing function means that the function takes two arguments, one is a key and since the function does hashing the other argument is a message, therefore $ \text{HMAC}(\textit{key}, \textit{message}) $
-
message authentication means that:
- message has not been modified in transport and
- receiver can verify source of the message
Wait, receiver, where did he come from? Glad you ask.
- HMAC is not used on its own it's appended to a message send between two parties.
- The key isn't some random key, it's a shared secret, that is known to both parties (and only if the secret is known only to both parties, hmac provides message authentication).
Trivia:
authors of HMAC are Bellare, Canetti, Krawczyk.
Bellare is the same Bellare, that - among other things - is known for Bellare-Miner forward secure signature scheme. (joint work with Miner).
Krawczyk is known - among other things - for his work on ZKPs: On the composition of zero-knowledge proof systems (joint work with Goldreich).
In case of HMAC only parties that share a secret can calculate hmac value.
VRF is also keyed hash but based on public-key cryptography. What this means is that:
- only private key owner can calculate VRF hash.
- everyone who has public key can validate such hash.
Owner of the private key is usually called a prover. Besides VRF hash, prover provides a proof, that the hash is valid.
Before diving in, it's crucial to understand the elements.
Prover calculates:
proof = VRF_prove(secret_key, message)
hash = VRF_hashing(proof)
Note, that prover needs to publish both proof and hash.
Verifier can validate the hash:
VRF_verify(public_key, message, proof)
If successful, VRF_verify
will return hash.
VRFs have some interesting properties.
- proof uniqueness - even when having ability to set public key and message the attacker cannot create two different proofs, for which call to VRF_verify returns success and returned hashes are different2.
- collision resistance - even when having ability to select secret key the adverary is not able to create two different messages, which would produce same hash2.
- pseudorandomness (in the sense of PRF)
- random-oracle-like unpredictability - if the message cannot be predicted then the hash is indistinguishable from uniform - this property is used in Symbol, I will get back to that in last part.
There is Internet Research Task Force (IRTF), that is working on standarizing VRF with various parameters. Construction is very generic, one of the reasons is that there are multiple public-key cryptosystems that can be used. Two biggest "groups" are:
- RSA based VRFs, and
- elliptic curve based VRFs
Here, I will focus only on EC-based variant, as that is what we use in Symbol.
I'll try not to go into details.
Both prover and verifier know:
B
- ed25519 base point- prover's public key -
Y = x*B
(or will refer to Y simply as public key) H
- point on an elliptic curve derived from prover's public key and message
Prover knows secret x
(secret scalar derived from his secret key) and calculates:
Gamma = x*H
- since H is a point and x is a scalar, Gamma is also a pointk
- nonce (random value) derived using hash function from secret key and H (therefore konwn only to prover)c
- magic coefficient derived from Gamma,k*B
andk*H
- also called "verification hash"s = (k + c*x) mod q
- "s" for scalar
The proof is a triplet: (Gamma, c, s)
Last part, might look easy, however:
q
is a prime number (>250 bits prime)- calculating
x
out of Gamma would require solving DLP on elliptic curve - finding out value of
k
is equivalent to finding pre-image of used hash function
Therefore proof cannot be forged without knowledge of x
.
After proof is calculated, VRF_hashing()
is basically hashing a Gamma part of a proof triplet.
Now more interesting part - the verifier. There are some tech peculiarities, but ultimately, verifier gets triplet (Gamma, c, s) and calculates:
U = s*B - c*Y
=(k*B + c*x*B) - c*Y
=(k*B + c*Y) - c*Y
=k*B
V = s*H - c*Gamma
=(k*H + c*x*H) - c*Gamma
=(k*H + c*Gamma) - c*Gamma
=k*H
Now verifier has Gamma, k*B
and k*H
, therefore verifier can calculate c'
the same way that prover calculates c
.
If c' matches c provided in a proof, proof is valid and verifier can calculate VRF hash by running VRF_hashing
on whole proof.
Note: all of this might look a bit like a high school math, in reality however things are slightly bit more complicated. Part of the calculations happen on elliptic curve and part in a finite field.
I've skimmed a bit over some details, but I can't continue without them.
H
in previous section is calculated using ECVRF_hash_to_curve
function. Hash to curve3 is important part of crypto, and I don't
think it's an overstatement if I say that it has become a discipline of its own (especially in the world of ZK-crypto).
Currently symbol uses so called ECVRF_hash_to_curve_try_and_increment
method, however, the implementation currently used is based on previous
version of a draft (v6) and is missing additional zero byte. We will address that in near future.
Full name of the VRF implementation used in Symbol is ECVRF-EDWARDS25519-SHA512-TAI
(TAI comes from try and increment)
Every harvester in Symbol has associated VRF public key (using vrf key link transaction).
Half of a VRF hash (hashed Gamma) is used as what we call generation hash. As mentioned earlier, VRF hash can act like random-oracle if message is unpredictable.
In case of Symbol, message passed to VRF calculation is generation hash of a previous block.
// Harvester.cpp
unlockedAccountsView.forEach([&context, &hitContext, &hitPredicate, &pHarvesterKeyPair, &vrfProof](const auto& descriptor) {
hitContext.Signer = descriptor.signingKeyPair().publicKey();
vrfProof = crypto::GenerateVrfProof(context.ParentContext.GenerationHash, descriptor.vrfKeyPair());
First vrf proof was generated by nemesis account, message passed to VRF calculation of the first block (genesis block) is called generation hash seed it is specified via entry in config-network.properties
.4
Without VRF (i.e. in NEM) and with perfect information, it is possible to guess who will be harvester of a next block - at height H
.
If generation hash is dependent solely on harvester's public key, the attacker that could predict harvester of a block at height H
could also calculate generation hash that block and then premine block at height H+1
.
With VRF in place, it's no longer possible to predict harvester, because hit value depends on generation hash.
You can find whole VRF draft in IETF datatracker: https://datatracker.ietf.org/doc/draft-irtf-cfrg-vrf/10/
VRF Trivia:
all authors involved in VRF draft are involved in DNSSEC improvements: Goldberg, Papadopoulos, Reyzin, Vcelak
here you can find more information about NSCE5
EC-VRFs are fundamental piece of nsec5 redesign proposal, more details in Making NSEC5 Practical for DNSSEC paper
<style class="fallback">body{visibility:hidden}</style><script>markdeepOptions={tocStyle:'long'};</script> <script src="./markdeep.min.js" charset="utf-8"></script>
Footnotes
-
HMAC wasn't first attempt to create MAC around hashing function, but it has one more feature that beat competition: construction isn't susceptible to length extension attacks ↩
-
generation hash seed in Symbol is a sha3-256 hash of the same quote that was used as a base for generation hash in NEM. ↩