feat: de-duplicate payloads from persisted beacon blocks #6029

Closed
wants to merge 52 commits into from

Conversation

matthewkeil
Member

@matthewkeil matthewkeil commented Oct 9, 2023

NOTE: The Sim Merge Test is not going to pass. The container in which it runs one of its tests needs to be updated. @g11tech is going to look for the Dockerfile and I will help get it updated and published so the test will pass. The image is based on a pre-Shanghai image that does not have engine_getPayloadBodiesByHashV1 available. This is the image:
https://hub.docker.com/r/g11tech/mergemock
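
For context, engine_getPayloadBodiesByHashV1 is the Engine API method (introduced with Shanghai) that returns the transactions and withdrawals for a list of block hashes. Below is a minimal sketch of the request shape. The helper names `buildGetPayloadBodiesByHashRequest` and `isMethodNotFound` are hypothetical and not taken from Lodestar; only the method name and the body fields follow the Engine API specification. How the mergemock image actually responds has not been verified here; a client lacking the method would typically answer with the standard JSON-RPC "Method not found" error.

```typescript
// Sketch only: helper names are assumptions, not Lodestar code.

/** Withdrawal as encoded in the Engine API (hex strings). */
interface WithdrawalV1 {
  index: string;
  validatorIndex: string;
  address: string;
  amount: string;
}

/** One entry of the engine_getPayloadBodiesByHashV1 result array. */
interface ExecutionPayloadBodyV1 {
  transactions: string[]; // hex-encoded transaction bytes
  withdrawals: WithdrawalV1[] | null; // null for pre-Shanghai payloads
}

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown[];
}

/** Builds the JSON-RPC request; the single parameter is the array of block hashes. */
function buildGetPayloadBodiesByHashRequest(blockHashes: string[], id = 1): JsonRpcRequest {
  return {jsonrpc: "2.0", id, method: "engine_getPayloadBodiesByHashV1", params: [blockHashes]};
}

/** True when a response carries the JSON-RPC 2.0 "Method not found" code (-32601). */
function isMethodNotFound(response: {error?: {code: number}}): boolean {
  return response.error !== undefined && response.error.code === -32601;
}
```

The result is an array aligned with the requested hashes, where an entry is null if the execution client does not know the block.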

The following items still need to be checked off before moving to ready:

  • double check that getBlock works as expected
  • get a valid deneb block (ask @g11tech how to generate one with valid data; perhaps it can be pulled from devnet 9?)
  • turn on deneb block unit tests; this needs a value for the fork epoch and spoofed valid slots in the mocks
  • convert the fixtures to .ssz format to reduce the diff
  • prove with a benchmark the need for serialized conversion in packages/beacon-node/src/util/fullOrBlindedBlock.ts
  • convert the generator-based serialized conversion to a promise and re-test perf as a promise
  • remove the excess codepath based on the results above
  • fix the sim-test eth1 engine mock to support engine_getPayloadBodiesByHashV1

Motivation

Lodestar is saving data that is also saved in the execution client database. In particular we are persisting transactions and withdrawals in the block and blockArchive databases.

Description

Stores blinded blocks in both the hot and cold db. Modifies the data-retrieval calls that require the full block (ReqResp and API) to splice in the missing transactions and withdrawals.
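
A minimal sketch of the splice, assuming heavily simplified block types. Real blocks are SSZ containers with many more fields, and every name here (`BlindedBlock`, `PayloadBody`, `blindedToFull`, `loadFullBlock`) is hypothetical rather than the code in this PR. The plain object keyed by block hash stands in for the execution client lookup.

```typescript
// Sketch only: simplified, hypothetical types and helpers.
type Hex = string;

interface Withdrawal {
  index: number;
  validatorIndex: number;
  address: Hex;
  amount: string;
}

/** What the beacon db keeps: roots instead of the transaction and withdrawal lists. */
interface ExecutionPayloadHeader {
  blockHash: Hex;
  transactionsRoot: Hex;
  withdrawalsRoot: Hex;
}

/** What the execution client already stores for the same block hash. */
interface PayloadBody {
  transactions: Hex[];
  withdrawals: Withdrawal[];
}

interface BlindedBlock {
  slot: number;
  executionPayloadHeader: ExecutionPayloadHeader;
}

interface FullBlock {
  slot: number;
  executionPayload: {blockHash: Hex; transactions: Hex[]; withdrawals: Withdrawal[]};
}

/** Rebuilds a full block by splicing the payload body into the blinded block. */
function blindedToFull(blinded: BlindedBlock, body: PayloadBody): FullBlock {
  return {
    slot: blinded.slot,
    executionPayload: {
      blockHash: blinded.executionPayloadHeader.blockHash,
      transactions: body.transactions,
      withdrawals: body.withdrawals,
    },
  };
}

/** Looks up the body by block hash and fails loudly if it is missing. */
function loadFullBlock(blinded: BlindedBlock, bodiesByHash: Record<Hex, PayloadBody>): FullBlock {
  const hash = blinded.executionPayloadHeader.blockHash;
  const body: PayloadBody | undefined = bodiesByHash[hash];
  if (body === undefined) {
    throw new Error("payload body not found for block hash " + hash);
  }
  return blindedToFull(blinded, body);
}
```

Failing on a missing body rather than returning a partial block keeps a ReqResp or API caller from serving a block whose roots no longer match its contents.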

Closes #5671

How to test

Extensive unit and perf tests were run to verify that this works correctly.

yarn test:unit
yarn benchmark:files packages/beacon-node/test/perf/util/fullOrBlindedBlock.test.ts

@github-actions
Contributor

github-actions bot commented Oct 9, 2023

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: d785597 Previous: d101913 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 943.84 us/op 632.42 us/op 1.49
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 45.049 us/op 50.915 us/op 0.88
BLS verify - blst-native 1.2082 ms/op 1.2338 ms/op 0.98
BLS verifyMultipleSignatures 3 - blst-native 2.5710 ms/op 2.6268 ms/op 0.98
BLS verifyMultipleSignatures 8 - blst-native 5.6732 ms/op 5.8084 ms/op 0.98
BLS verifyMultipleSignatures 32 - blst-native 20.863 ms/op 21.383 ms/op 0.98
BLS verifyMultipleSignatures 64 - blst-native 41.062 ms/op 41.880 ms/op 0.98
BLS verifyMultipleSignatures 128 - blst-native 81.492 ms/op 82.789 ms/op 0.98
BLS deserializing 10000 signatures 841.64 ms/op 857.65 ms/op 0.98
BLS deserializing 100000 signatures 8.9130 s/op 8.6906 s/op 1.03
BLS verifyMultipleSignatures - same message - 3 - blst-native 1.2758 ms/op 1.2706 ms/op 1.00
BLS verifyMultipleSignatures - same message - 8 - blst-native 1.5539 ms/op 1.4656 ms/op 1.06
BLS verifyMultipleSignatures - same message - 32 - blst-native 2.4270 ms/op 2.2055 ms/op 1.10
BLS verifyMultipleSignatures - same message - 64 - blst-native 3.6527 ms/op 3.5955 ms/op 1.02
BLS verifyMultipleSignatures - same message - 128 - blst-native 6.4698 ms/op 5.4965 ms/op 1.18
BLS aggregatePubkeys 32 - blst-native 27.294 us/op 25.631 us/op 1.06
BLS aggregatePubkeys 128 - blst-native 100.99 us/op 99.743 us/op 1.01
notSeenSlots=1 numMissedVotes=1 numBadVotes=10 80.532 ms/op 56.216 ms/op 1.43
notSeenSlots=1 numMissedVotes=0 numBadVotes=4 82.047 ms/op 60.362 ms/op 1.36
notSeenSlots=2 numMissedVotes=1 numBadVotes=10 37.605 ms/op 34.134 ms/op 1.10
getSlashingsAndExits - default max 92.231 us/op 100.05 us/op 0.92
getSlashingsAndExits - 2k 282.82 us/op 321.21 us/op 0.88
proposeBlockBody type=full, size=empty 5.6370 ms/op 5.8358 ms/op 0.97
isKnown best case - 1 super set check 301.00 ns/op 292.00 ns/op 1.03
isKnown normal case - 2 super set checks 286.00 ns/op 287.00 ns/op 1.00
isKnown worse case - 16 super set checks 279.00 ns/op 284.00 ns/op 0.98
InMemoryCheckpointStateCache - add get delete 4.3430 us/op 5.2560 us/op 0.83
validate api signedAggregateAndProof - struct 2.6131 ms/op 2.6488 ms/op 0.99
validate gossip signedAggregateAndProof - struct 2.6111 ms/op 2.6483 ms/op 0.99
validate gossip attestation - vc 640000 1.2525 ms/op 1.2889 ms/op 0.97
batch validate gossip attestation - vc 640000 - chunk 32 148.19 us/op 153.61 us/op 0.96
batch validate gossip attestation - vc 640000 - chunk 64 131.60 us/op 135.74 us/op 0.97
batch validate gossip attestation - vc 640000 - chunk 128 121.66 us/op 125.75 us/op 0.97
batch validate gossip attestation - vc 640000 - chunk 256 119.09 us/op 121.66 us/op 0.98
pickEth1Vote - no votes 1.1814 ms/op 1.1127 ms/op 1.06
pickEth1Vote - max votes 8.0572 ms/op 10.394 ms/op 0.78
pickEth1Vote - Eth1Data hashTreeRoot value x2048 13.389 ms/op 17.171 ms/op 0.78
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 19.460 ms/op 22.704 ms/op 0.86
pickEth1Vote - Eth1Data fastSerialize value x2048 549.66 us/op 497.75 us/op 1.10
pickEth1Vote - Eth1Data fastSerialize tree x2048 3.8498 ms/op 6.1644 ms/op 0.62
bytes32 toHexString 429.00 ns/op 452.00 ns/op 0.95
bytes32 Buffer.toString(hex) 260.00 ns/op 242.00 ns/op 1.07
bytes32 Buffer.toString(hex) from Uint8Array 398.00 ns/op 366.00 ns/op 1.09
bytes32 Buffer.toString(hex) + 0x 266.00 ns/op 246.00 ns/op 1.08
Object access 1 prop 0.13800 ns/op 0.14900 ns/op 0.93
Map access 1 prop 0.12900 ns/op 0.13500 ns/op 0.96
Object get x1000 6.3110 ns/op 6.0250 ns/op 1.05
Map get x1000 6.4870 ns/op 6.3990 ns/op 1.01
Object set x1000 35.504 ns/op 35.661 ns/op 1.00
Map set x1000 23.501 ns/op 25.018 ns/op 0.94
Return object 10000 times 0.29710 ns/op 0.29430 ns/op 1.01
Throw Error 10000 times 3.5486 us/op 3.3924 us/op 1.05
fastMsgIdFn sha256 / 200 bytes 2.2810 us/op 2.1990 us/op 1.04
fastMsgIdFn h32 xxhash / 200 bytes 229.00 ns/op 255.00 ns/op 0.90
fastMsgIdFn h64 xxhash / 200 bytes 269.00 ns/op 262.00 ns/op 1.03
fastMsgIdFn sha256 / 1000 bytes 7.3140 us/op 7.3030 us/op 1.00
fastMsgIdFn h32 xxhash / 1000 bytes 361.00 ns/op 404.00 ns/op 0.89
fastMsgIdFn h64 xxhash / 1000 bytes 365.00 ns/op 343.00 ns/op 1.06
fastMsgIdFn sha256 / 10000 bytes 65.812 us/op 63.949 us/op 1.03
fastMsgIdFn h32 xxhash / 10000 bytes 1.8820 us/op 1.9180 us/op 0.98
fastMsgIdFn h64 xxhash / 10000 bytes 1.2400 us/op 1.2380 us/op 1.00
send data - 1000 256B messages 12.593 ms/op 13.496 ms/op 0.93
send data - 1000 512B messages 17.344 ms/op 18.800 ms/op 0.92
send data - 1000 1024B messages 26.622 ms/op 28.123 ms/op 0.95
send data - 1000 1200B messages 28.342 ms/op 27.052 ms/op 1.05
send data - 1000 2048B messages 33.115 ms/op 32.031 ms/op 1.03
send data - 1000 4096B messages 28.918 ms/op 33.492 ms/op 0.86
send data - 1000 16384B messages 84.916 ms/op 71.270 ms/op 1.19
send data - 1000 65536B messages 227.90 ms/op 216.51 ms/op 1.05
enrSubnets - fastDeserialize 64 bits 1.2620 us/op 1.1870 us/op 1.06
enrSubnets - ssz BitVector 64 bits 470.00 ns/op 369.00 ns/op 1.27
enrSubnets - fastDeserialize 4 bits 198.00 ns/op 147.00 ns/op 1.35
enrSubnets - ssz BitVector 4 bits 474.00 ns/op 371.00 ns/op 1.28
prioritizePeers score -10:0 att 32-0.1 sync 2-0 185.80 us/op 153.55 us/op 1.21
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 208.38 us/op 152.67 us/op 1.36
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 335.97 us/op 299.42 us/op 1.12
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 494.58 us/op 404.97 us/op 1.22
prioritizePeers score 0:0 att 64-1 sync 4-1 891.77 us/op 669.98 us/op 1.33
array of 16000 items push then shift 1.7440 us/op 1.6326 us/op 1.07
LinkedList of 16000 items push then shift 8.0550 ns/op 7.4540 ns/op 1.08
array of 16000 items push then pop 137.73 ns/op 114.36 ns/op 1.20
LinkedList of 16000 items push then pop 7.6050 ns/op 7.2750 ns/op 1.05
array of 24000 items push then shift 2.5663 us/op 2.4287 us/op 1.06
LinkedList of 24000 items push then shift 8.4580 ns/op 7.5770 ns/op 1.12
array of 24000 items push then pop 179.72 ns/op 138.12 ns/op 1.30
LinkedList of 24000 items push then pop 8.1230 ns/op 7.1980 ns/op 1.13
intersect bitArray bitLen 8 6.9560 ns/op 6.4460 ns/op 1.08
intersect array and set length 8 64.743 ns/op 47.194 ns/op 1.37
intersect bitArray bitLen 128 30.757 ns/op 30.072 ns/op 1.02
intersect array and set length 128 802.75 ns/op 679.57 ns/op 1.18
bitArray.getTrueBitIndexes() bitLen 128 2.4550 us/op 1.7080 us/op 1.44
bitArray.getTrueBitIndexes() bitLen 248 4.2290 us/op 3.1610 us/op 1.34
bitArray.getTrueBitIndexes() bitLen 512 9.4870 us/op 7.0580 us/op 1.34
Buffer.concat 32 items 1.0650 us/op 909.00 ns/op 1.17
Uint8Array.set 32 items 1.9920 us/op 1.4770 us/op 1.35
Buffer.copy 2.1080 us/op 1.6880 us/op 1.25
Uint8Array.set - with subarray 3.6560 us/op 2.5010 us/op 1.46
Uint8Array.set - without subarray 2.1140 us/op 1.5920 us/op 1.33
Set add up to 64 items then delete first 2.6916 us/op 2.1820 us/op 1.23
OrderedSet add up to 64 items then delete first 3.8006 us/op 3.4041 us/op 1.12
Set add up to 64 items then delete last 2.9646 us/op 2.5261 us/op 1.17
OrderedSet add up to 64 items then delete last 4.3362 us/op 3.6502 us/op 1.19
Set add up to 64 items then delete middle 2.6937 us/op 2.5029 us/op 1.08
OrderedSet add up to 64 items then delete middle 6.0736 us/op 5.3732 us/op 1.13
Set add up to 128 items then delete first 5.8609 us/op 5.0796 us/op 1.15
OrderedSet add up to 128 items then delete first 9.9335 us/op 7.8385 us/op 1.27
Set add up to 128 items then delete last 6.1668 us/op 4.9379 us/op 1.25
OrderedSet add up to 128 items then delete last 8.6801 us/op 7.2522 us/op 1.20
Set add up to 128 items then delete middle 6.2366 us/op 4.8451 us/op 1.29
OrderedSet add up to 128 items then delete middle 16.903 us/op 13.782 us/op 1.23
Set add up to 256 items then delete first 12.904 us/op 10.419 us/op 1.24
OrderedSet add up to 256 items then delete first 21.257 us/op 16.039 us/op 1.33
Set add up to 256 items then delete last 12.229 us/op 9.5360 us/op 1.28
OrderedSet add up to 256 items then delete last 20.080 us/op 14.479 us/op 1.39
Set add up to 256 items then delete middle 11.901 us/op 9.6046 us/op 1.24
OrderedSet add up to 256 items then delete middle 47.581 us/op 40.645 us/op 1.17
transfer serialized Status (84 B) 1.3840 us/op 1.3450 us/op 1.03
copy serialized Status (84 B) 1.2380 us/op 1.0980 us/op 1.13
transfer serialized SignedVoluntaryExit (112 B) 1.4400 us/op 1.4940 us/op 0.96
copy serialized SignedVoluntaryExit (112 B) 1.3350 us/op 1.1340 us/op 1.18
transfer serialized ProposerSlashing (416 B) 2.2970 us/op 1.9170 us/op 1.20
copy serialized ProposerSlashing (416 B) 1.8050 us/op 1.6330 us/op 1.11
transfer serialized Attestation (485 B) 1.9820 us/op 1.7140 us/op 1.16
copy serialized Attestation (485 B) 1.7960 us/op 1.6130 us/op 1.11
transfer serialized AttesterSlashing (33232 B) 2.9190 us/op 2.0600 us/op 1.42
copy serialized AttesterSlashing (33232 B) 8.1620 us/op 5.4770 us/op 1.49
transfer serialized Small SignedBeaconBlock (128000 B) 3.7500 us/op 2.7850 us/op 1.35
copy serialized Small SignedBeaconBlock (128000 B) 23.264 us/op 18.057 us/op 1.29
transfer serialized Avg SignedBeaconBlock (200000 B) 3.8430 us/op 3.2540 us/op 1.18
copy serialized Avg SignedBeaconBlock (200000 B) 40.328 us/op 21.577 us/op 1.87
transfer serialized BlobsSidecar (524380 B) 4.2520 us/op 2.8890 us/op 1.47
copy serialized BlobsSidecar (524380 B) 128.74 us/op 82.921 us/op 1.55
transfer serialized Big SignedBeaconBlock (1000000 B) 6.0400 us/op 2.8670 us/op 2.11
copy serialized Big SignedBeaconBlock (1000000 B) 218.91 us/op 356.47 us/op 0.61
pass gossip attestations to forkchoice per slot 3.2148 ms/op 2.9920 ms/op 1.07
forkChoice updateHead vc 100000 bc 64 eq 0 611.20 us/op 485.99 us/op 1.26
forkChoice updateHead vc 600000 bc 64 eq 0 3.6770 ms/op 3.0593 ms/op 1.20
forkChoice updateHead vc 1000000 bc 64 eq 0 5.7342 ms/op 5.2110 ms/op 1.10
forkChoice updateHead vc 600000 bc 320 eq 0 3.3552 ms/op 2.9976 ms/op 1.12
forkChoice updateHead vc 600000 bc 1200 eq 0 3.5021 ms/op 3.0552 ms/op 1.15
forkChoice updateHead vc 600000 bc 7200 eq 0 4.2344 ms/op 3.6342 ms/op 1.17
forkChoice updateHead vc 600000 bc 64 eq 1000 11.182 ms/op 10.294 ms/op 1.09
forkChoice updateHead vc 600000 bc 64 eq 10000 11.830 ms/op 10.465 ms/op 1.13
forkChoice updateHead vc 600000 bc 64 eq 300000 15.193 ms/op 14.539 ms/op 1.05
computeDeltas 500000 validators 300 proto nodes 3.7164 ms/op 3.4336 ms/op 1.08
computeDeltas 500000 validators 1200 proto nodes 3.7307 ms/op 3.4665 ms/op 1.08
computeDeltas 500000 validators 7200 proto nodes 3.7619 ms/op 3.4664 ms/op 1.09
computeDeltas 750000 validators 300 proto nodes 5.5508 ms/op 5.1807 ms/op 1.07
computeDeltas 750000 validators 1200 proto nodes 5.6313 ms/op 5.0873 ms/op 1.11
computeDeltas 750000 validators 7200 proto nodes 5.4781 ms/op 5.0935 ms/op 1.08
computeDeltas 1400000 validators 300 proto nodes 10.282 ms/op 9.5580 ms/op 1.08
computeDeltas 1400000 validators 1200 proto nodes 11.032 ms/op 9.7230 ms/op 1.13
computeDeltas 1400000 validators 7200 proto nodes 10.548 ms/op 9.8400 ms/op 1.07
computeDeltas 2100000 validators 300 proto nodes 16.196 ms/op 14.989 ms/op 1.08
computeDeltas 2100000 validators 1200 proto nodes 15.549 ms/op 14.864 ms/op 1.05
computeDeltas 2100000 validators 7200 proto nodes 17.202 ms/op 14.546 ms/op 1.18
altair processAttestation - 250000 vs - 7PWei normalcase 2.1589 ms/op 1.7173 ms/op 1.26
altair processAttestation - 250000 vs - 7PWei worstcase 3.1440 ms/op 2.5362 ms/op 1.24
altair processAttestation - setStatus - 1/6 committees join 99.522 us/op 91.024 us/op 1.09
altair processAttestation - setStatus - 1/3 committees join 206.40 us/op 183.54 us/op 1.12
altair processAttestation - setStatus - 1/2 committees join 290.60 us/op 245.81 us/op 1.18
altair processAttestation - setStatus - 2/3 committees join 363.73 us/op 320.30 us/op 1.14
altair processAttestation - setStatus - 4/5 committees join 541.82 us/op 458.45 us/op 1.18
altair processAttestation - setStatus - 100% committees join 614.12 us/op 556.05 us/op 1.10
altair processBlock - 250000 vs - 7PWei normalcase 4.4411 ms/op 4.1500 ms/op 1.07
altair processBlock - 250000 vs - 7PWei normalcase hashState 28.537 ms/op 28.161 ms/op 1.01
altair processBlock - 250000 vs - 7PWei worstcase 45.369 ms/op 41.986 ms/op 1.08
altair processBlock - 250000 vs - 7PWei worstcase hashState 90.603 ms/op 86.696 ms/op 1.05
phase0 processBlock - 250000 vs - 7PWei normalcase 2.2799 ms/op 1.8991 ms/op 1.20
phase0 processBlock - 250000 vs - 7PWei worstcase 29.429 ms/op 27.194 ms/op 1.08
altair processEth1Data - 250000 vs - 7PWei normalcase 317.67 us/op 360.58 us/op 0.88
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 6.4490 us/op 6.5540 us/op 0.98
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 27.746 us/op 27.891 us/op 0.99
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 9.0720 us/op 8.5540 us/op 1.06
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 6.7240 us/op 6.6750 us/op 1.01
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 129.40 us/op 109.92 us/op 1.18
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 789.73 us/op 719.33 us/op 1.10
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 1.0959 ms/op 919.83 us/op 1.19
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 1.0279 ms/op 930.43 us/op 1.10
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 2.6935 ms/op 2.4609 ms/op 1.09
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 2.0697 ms/op 1.6174 ms/op 1.28
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 4.2650 ms/op 3.9515 ms/op 1.08
Tree 40 250000 create 261.80 ms/op 215.63 ms/op 1.21
Tree 40 250000 get(125000) 164.44 ns/op 149.16 ns/op 1.10
Tree 40 250000 set(125000) 892.45 ns/op 698.49 ns/op 1.28
Tree 40 250000 toArray() 22.077 ms/op 15.977 ms/op 1.38
Tree 40 250000 iterate all - toArray() + loop 22.695 ms/op 15.772 ms/op 1.44
Tree 40 250000 iterate all - get(i) 64.106 ms/op 50.730 ms/op 1.26
MutableVector 250000 create 13.801 ms/op 7.5876 ms/op 1.82
MutableVector 250000 get(125000) 6.6360 ns/op 6.1610 ns/op 1.08
MutableVector 250000 set(125000) 222.85 ns/op 213.41 ns/op 1.04
MutableVector 250000 toArray() 4.1718 ms/op 4.2385 ms/op 0.98
MutableVector 250000 iterate all - toArray() + loop 5.1076 ms/op 4.2508 ms/op 1.20
MutableVector 250000 iterate all - get(i) 1.6664 ms/op 1.6684 ms/op 1.00
Array 250000 create 3.4600 ms/op 3.7989 ms/op 0.91
Array 250000 clone - spread 1.5147 ms/op 1.5782 ms/op 0.96
Array 250000 get(125000) 0.41700 ns/op 0.42600 ns/op 0.98
Array 250000 set(125000) 0.45400 ns/op 0.45200 ns/op 1.00
Array 250000 iterate all - loop 108.94 us/op 90.879 us/op 1.20
effectiveBalanceIncrements clone Uint8Array 300000 33.316 us/op 34.793 us/op 0.96
effectiveBalanceIncrements clone MutableVector 300000 121.00 ns/op 130.00 ns/op 0.93
effectiveBalanceIncrements rw all Uint8Array 300000 206.13 us/op 200.93 us/op 1.03
effectiveBalanceIncrements rw all MutableVector 300000 73.397 ms/op 69.118 ms/op 1.06
phase0 afterProcessEpoch - 250000 vs - 7PWei 91.285 ms/op 88.973 ms/op 1.03
phase0 beforeProcessEpoch - 250000 vs - 7PWei 42.548 ms/op 43.390 ms/op 0.98
altair processEpoch - mainnet_e81889 383.57 ms/op 410.16 ms/op 0.94
mainnet_e81889 - altair beforeProcessEpoch 66.526 ms/op 67.186 ms/op 0.99
mainnet_e81889 - altair processJustificationAndFinalization 18.692 us/op 11.710 us/op 1.60
mainnet_e81889 - altair processInactivityUpdates 6.1379 ms/op 7.3146 ms/op 0.84
mainnet_e81889 - altair processRewardsAndPenalties 54.878 ms/op 38.295 ms/op 1.43
mainnet_e81889 - altair processRegistryUpdates 1.8640 us/op 1.9430 us/op 0.96
mainnet_e81889 - altair processSlashings 415.00 ns/op 362.00 ns/op 1.15
mainnet_e81889 - altair processEth1DataReset 420.00 ns/op 341.00 ns/op 1.23
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.8844 ms/op 1.1126 ms/op 1.69
mainnet_e81889 - altair processSlashingsReset 5.7360 us/op 3.6770 us/op 1.56
mainnet_e81889 - altair processRandaoMixesReset 6.5670 us/op 3.9610 us/op 1.66
mainnet_e81889 - altair processHistoricalRootsUpdate 817.00 ns/op 289.00 ns/op 2.83
mainnet_e81889 - altair processParticipationFlagUpdates 4.3190 us/op 2.1940 us/op 1.97
mainnet_e81889 - altair processSyncCommitteeUpdates 877.00 ns/op 569.00 ns/op 1.54
mainnet_e81889 - altair afterProcessEpoch 93.924 ms/op 94.800 ms/op 0.99
capella processEpoch - mainnet_e217614 1.3224 s/op 1.3244 s/op 1.00
mainnet_e217614 - capella beforeProcessEpoch 258.04 ms/op 240.16 ms/op 1.07
mainnet_e217614 - capella processJustificationAndFinalization 14.495 us/op 13.958 us/op 1.04
mainnet_e217614 - capella processInactivityUpdates 18.303 ms/op 17.253 ms/op 1.06
mainnet_e217614 - capella processRewardsAndPenalties 263.85 ms/op 230.95 ms/op 1.14
mainnet_e217614 - capella processRegistryUpdates 19.234 us/op 12.714 us/op 1.51
mainnet_e217614 - capella processSlashings 418.00 ns/op 402.00 ns/op 1.04
mainnet_e217614 - capella processEth1DataReset 387.00 ns/op 312.00 ns/op 1.24
mainnet_e217614 - capella processEffectiveBalanceUpdates 3.6225 ms/op 4.7881 ms/op 0.76
mainnet_e217614 - capella processSlashingsReset 2.9040 us/op 3.6190 us/op 0.80
mainnet_e217614 - capella processRandaoMixesReset 4.5390 us/op 3.8950 us/op 1.17
mainnet_e217614 - capella processHistoricalRootsUpdate 1.5550 us/op 1.0520 us/op 1.48
mainnet_e217614 - capella processParticipationFlagUpdates 2.2210 us/op 1.7210 us/op 1.29
mainnet_e217614 - capella afterProcessEpoch 299.65 ms/op 238.46 ms/op 1.26
phase0 processEpoch - mainnet_e58758 348.99 ms/op 358.57 ms/op 0.97
mainnet_e58758 - phase0 beforeProcessEpoch 92.110 ms/op 90.663 ms/op 1.02
mainnet_e58758 - phase0 processJustificationAndFinalization 14.626 us/op 13.615 us/op 1.07
mainnet_e58758 - phase0 processRewardsAndPenalties 28.674 ms/op 22.172 ms/op 1.29
mainnet_e58758 - phase0 processRegistryUpdates 13.667 us/op 7.3750 us/op 1.85
mainnet_e58758 - phase0 processSlashings 689.00 ns/op 355.00 ns/op 1.94
mainnet_e58758 - phase0 processEth1DataReset 468.00 ns/op 294.00 ns/op 1.59
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.1137 ms/op 952.03 us/op 1.17
mainnet_e58758 - phase0 processSlashingsReset 3.8340 us/op 3.2950 us/op 1.16
mainnet_e58758 - phase0 processRandaoMixesReset 8.4020 us/op 3.5160 us/op 2.39
mainnet_e58758 - phase0 processHistoricalRootsUpdate 898.00 ns/op 320.00 ns/op 2.81
mainnet_e58758 - phase0 processParticipationRecordUpdates 3.5040 us/op 2.6340 us/op 1.33
mainnet_e58758 - phase0 afterProcessEpoch 83.169 ms/op 78.061 ms/op 1.07
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.5008 ms/op 1.1497 ms/op 1.31
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 3.4401 ms/op 1.9327 ms/op 1.78
altair processInactivityUpdates - 250000 normalcase 21.756 ms/op 15.542 ms/op 1.40
altair processInactivityUpdates - 250000 worstcase 19.168 ms/op 15.077 ms/op 1.27
phase0 processRegistryUpdates - 250000 normalcase 8.1030 us/op 5.9300 us/op 1.37
phase0 processRegistryUpdates - 250000 badcase_full_deposits 455.18 us/op 268.81 us/op 1.69
phase0 processRegistryUpdates - 250000 worstcase 0.5 136.41 ms/op 105.03 ms/op 1.30
altair processRewardsAndPenalties - 250000 normalcase 46.786 ms/op 32.632 ms/op 1.43
altair processRewardsAndPenalties - 250000 worstcase 44.760 ms/op 36.039 ms/op 1.24
phase0 getAttestationDeltas - 250000 normalcase 9.8761 ms/op 6.9205 ms/op 1.43
phase0 getAttestationDeltas - 250000 worstcase 9.7662 ms/op 7.9507 ms/op 1.23
phase0 processSlashings - 250000 worstcase 113.28 us/op 75.082 us/op 1.51
altair processSyncCommitteeUpdates - 250000 124.40 ms/op 124.91 ms/op 1.00
BeaconState.hashTreeRoot - No change 310.00 ns/op 261.00 ns/op 1.19
BeaconState.hashTreeRoot - 1 full validator 135.34 us/op 97.207 us/op 1.39
BeaconState.hashTreeRoot - 32 full validator 1.5683 ms/op 1.3545 ms/op 1.16
BeaconState.hashTreeRoot - 512 full validator 12.818 ms/op 12.527 ms/op 1.02
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 127.80 us/op 136.29 us/op 0.94
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.8159 ms/op 1.7706 ms/op 1.03
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 28.026 ms/op 20.373 ms/op 1.38
BeaconState.hashTreeRoot - 1 balances 128.22 us/op 102.84 us/op 1.25
BeaconState.hashTreeRoot - 32 balances 1.2565 ms/op 1.0377 ms/op 1.21
BeaconState.hashTreeRoot - 512 balances 7.8890 ms/op 10.968 ms/op 0.72
BeaconState.hashTreeRoot - 250000 balances 173.45 ms/op 164.12 ms/op 1.06
aggregationBits - 2048 els - zipIndexesInBitList 28.043 us/op 22.649 us/op 1.24
byteArrayEquals 32 54.865 ns/op 51.648 ns/op 1.06
Buffer.compare 32 47.813 ns/op 46.074 ns/op 1.04
byteArrayEquals 1024 1.6179 us/op 1.5252 us/op 1.06
Buffer.compare 1024 60.247 ns/op 56.708 ns/op 1.06
byteArrayEquals 16384 25.958 us/op 24.259 us/op 1.07
Buffer.compare 16384 241.36 ns/op 234.24 ns/op 1.03
byteArrayEquals 123687377 201.52 ms/op 181.81 ms/op 1.11
Buffer.compare 123687377 11.028 ms/op 6.1576 ms/op 1.79
byteArrayEquals 32 - diff last byte 57.430 ns/op 50.423 ns/op 1.14
Buffer.compare 32 - diff last byte 52.086 ns/op 45.004 ns/op 1.16
byteArrayEquals 1024 - diff last byte 1.6476 us/op 1.5121 us/op 1.09
Buffer.compare 1024 - diff last byte 61.072 ns/op 58.290 ns/op 1.05
byteArrayEquals 16384 - diff last byte 26.642 us/op 24.072 us/op 1.11
Buffer.compare 16384 - diff last byte 253.27 ns/op 292.37 ns/op 0.87
byteArrayEquals 123687377 - diff last byte 202.77 ms/op 185.26 ms/op 1.09
Buffer.compare 123687377 - diff last byte 8.2278 ms/op 6.1964 ms/op 1.33
byteArrayEquals 32 - random bytes 5.3920 ns/op 5.6750 ns/op 0.95
Buffer.compare 32 - random bytes 51.818 ns/op 49.465 ns/op 1.05
byteArrayEquals 1024 - random bytes 5.3830 ns/op 5.0430 ns/op 1.07
Buffer.compare 1024 - random bytes 49.623 ns/op 48.402 ns/op 1.03
byteArrayEquals 16384 - random bytes 5.3580 ns/op 5.0330 ns/op 1.06
Buffer.compare 16384 - random bytes 49.447 ns/op 48.526 ns/op 1.02
byteArrayEquals 123687377 - random bytes 6.6100 ns/op 6.3800 ns/op 1.04
Buffer.compare 123687377 - random bytes 51.730 ns/op 49.950 ns/op 1.04
regular array get 100000 times 40.078 us/op 36.644 us/op 1.09
wrappedArray get 100000 times 34.156 us/op 32.797 us/op 1.04
arrayWithProxy get 100000 times 13.282 ms/op 13.024 ms/op 1.02
ssz.Root.equals 47.159 ns/op 45.348 ns/op 1.04
byteArrayEquals 46.374 ns/op 44.715 ns/op 1.04
Buffer.compare 10.872 ns/op 10.807 ns/op 1.01
shuffle list - 16384 els 6.7439 ms/op 6.2941 ms/op 1.07
shuffle list - 250000 els 94.708 ms/op 93.269 ms/op 1.02
processSlot - 1 slots 15.077 us/op 11.701 us/op 1.29
processSlot - 32 slots 2.7405 ms/op 2.0810 ms/op 1.32
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 37.363 ms/op 37.857 ms/op 0.99
getCommitteeAssignments - req 1 vs - 250000 vc 2.2092 ms/op 2.1239 ms/op 1.04
getCommitteeAssignments - req 100 vs - 250000 vc 4.2180 ms/op 4.0570 ms/op 1.04
getCommitteeAssignments - req 1000 vs - 250000 vc 4.5283 ms/op 4.4321 ms/op 1.02
findModifiedValidators - 10000 modified validators 264.05 ms/op 247.07 ms/op 1.07
findModifiedValidators - 1000 modified validators 186.58 ms/op 178.95 ms/op 1.04
findModifiedValidators - 100 modified validators 204.19 ms/op 172.88 ms/op 1.18
findModifiedValidators - 10 modified validators 210.04 ms/op 188.09 ms/op 1.12
findModifiedValidators - 1 modified validators 176.99 ms/op 170.27 ms/op 1.04
findModifiedValidators - no difference 183.23 ms/op 170.26 ms/op 1.08
compare ViewDUs 3.3602 s/op 3.1558 s/op 1.06
compare each validator Uint8Array 1.4793 s/op 1.5755 s/op 0.94
compare ViewDU to Uint8Array 1.4754 s/op 1.1476 s/op 1.29
migrate state 1000000 validators, 24 modified, 0 new 703.25 ms/op 564.41 ms/op 1.25
migrate state 1000000 validators, 1700 modified, 1000 new 817.33 ms/op 846.89 ms/op 0.97
migrate state 1000000 validators, 3400 modified, 2000 new 954.23 ms/op 987.94 ms/op 0.97
migrate state 1500000 validators, 24 modified, 0 new 585.58 ms/op 606.76 ms/op 0.97
migrate state 1500000 validators, 1700 modified, 1000 new 847.81 ms/op 807.60 ms/op 1.05
migrate state 1500000 validators, 3400 modified, 2000 new 1.1703 s/op 997.52 ms/op 1.17
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 5.2500 ns/op 4.6700 ns/op 1.12
state getBlockRootAtSlot - 250000 vs - 7PWei 686.72 ns/op 700.92 ns/op 0.98
computeProposers - vc 250000 7.7363 ms/op 7.6927 ms/op 1.01
computeEpochShuffling - vc 250000 100.38 ms/op 93.413 ms/op 1.07
getNextSyncCommittee - vc 250000 134.57 ms/op 124.08 ms/op 1.08
computeSigningRoot for AttestationData 25.617 us/op 21.532 us/op 1.19
hash AttestationData serialized data then Buffer.toString(base64) 1.7549 us/op 1.5348 us/op 1.14
toHexString serialized data 1.2573 us/op 883.47 ns/op 1.42
Buffer.toString(base64) 226.82 ns/op 175.86 ns/op 1.29

by benchmarkbot/action

@dapplion
Contributor

Some todos:

  • Prove with a benchmark the need for packages/beacon-node/src/util/fullOrBlindedBlock.ts. Compare the two strategies below and, unless there's a massive difference, just use the simpler strategy of merging structs. After doing the benchmarks, persist the results in code, add a new comment to this PR with the results, and delete the losing codepath.
    • deserialize, merge structs, serialize
    • serialize exec payload, merge as bytes
  • convert the fixtures to .ssz format to reduce the diff

@matthewkeil
Member Author

matthewkeil commented Oct 11, 2023

  • convert the fixtures to .ssz format to reduce the diff

@dapplion I am working on that conversion now. When I did it this evening the unit tests for capella broke. I had logic in the mock loading file to convert the mainnet mocks to work with the minimal testing preset. Tonight I manually converted them to the minimal config before saving them serialized, but something needs debugging. I was modifying the raw JSON before using @lodestar/types because of how LODESTAR_PRESET flows into the ssz types, but when I hand-converted them something was not converted correctly. I'll find it in the AM and push the changes.

  • Prove with a benchmark the need for packages/beacon-node/src/util/fullOrBlindedBlock.ts. Compare the two strategies below and, unless there's a massive difference, just use the simpler strategy of merging structs. After doing the benchmarks, persist the results in code, add a new comment to this PR with the results, and delete the losing codepath.

    • deserialize, merge structs, serialize
    • serialize exec payload, merge as bytes

I remembered chatting with you about this a couple weeks ago and got it ready for you :) Apologies, I should have brought this up when we spoke before standup. I forgot with all the other stuff we chatted about.

I posted those results from the perf test on the issue:
#5671 (comment)

I copied the results in that comment below so they are part of this PR too for visibility.

The test file is in this commit of this PR:
4112724

The results suggest that converting the serialized bytes is the way to go, since it is a couple of orders of magnitude faster, but I would love to get your opinion before I delete the losing codepath. The perf test is in the commit linked above so you can check the methodology. I will leave the perf test as part of this PR if serialize is how you want to go.

I was thinking about removing the generator function and just returning a promise after our discussion before standup. I will rerun the perf tests like that to compare and post them in a comment below tomorrow once I sort out the mock serialization bug.

  fullOrBlindedBlock
    BlindedOrFull to full
      phase0
        ✔ phase0 to full - deserialize first                                  9646.737 ops/s    103.6620 us/op        -       4856 runs  0.617 s
        ✔ phase0 to full - convert serialized                                  2865330 ops/s    349.0000 ns/op        -    1740003 runs  0.909 s
      altair
        ✔ altair to full - deserialize first                                  5352.431 ops/s    186.8310 us/op        -       2699 runs  0.697 s
        ✔ altair to full - convert serialized                                  2967359 ops/s    337.0000 ns/op        -    1598138 runs  0.808 s
      bellatrix
        ✔ bellatrix to full - deserialize first                               3991.474 ops/s    250.5340 us/op        -       1208 runs  0.553 s
        ✔ bellatrix to full - convert serialized                               2463054 ops/s    406.0000 ns/op        -     879455 runs  0.505 s
      capella
        ✔ capella to full - deserialize first                                 3660.175 ops/s    273.2110 us/op        -       1846 runs  0.783 s
        ✔ capella to full - convert serialized                                 2364066 ops/s    423.0000 ns/op        -    2012155 runs   1.21 s
      deneb
        ✔ deneb to full - deserialize first                                   3621.915 ops/s    276.0970 us/op        -       1827 runs  0.806 s
        ✔ deneb to full - convert serialized                                   2398082 ops/s    417.0000 ns/op        -     506726 runs  0.303 s
    BlindedOrFull to blinded
      phase0
        ✔ phase0 to blinded - deserialize first                               12937.95 ops/s    77.29200 us/op        -       4230 runs  0.404 s
        ✔ phase0 to blinded - convert serialized                           1.000000e+7 ops/s    100.0000 ns/op        -    3120420 runs  0.606 s
      altair
        ✔ altair to blinded - deserialize first                               7185.198 ops/s    139.1750 us/op        -       2170 runs  0.439 s
        ✔ altair to blinded - convert serialized                               9900990 ops/s    101.0000 ns/op        -    2588758 runs  0.505 s
      bellatrix
        ✔ bellatrix to blinded - deserialize first                            100.1679 ops/s    9.983241 ms/op        -         76 runs   1.26 s
        ✔ bellatrix to blinded - convert serialized                           92.22430 ops/s    10.84313 ms/op        -        117 runs   1.77 s
      capella
        ✔ capella to blinded - deserialize first                              45.29530 ops/s    22.07735 ms/op        -         48 runs   1.58 s
        ✔ capella to blinded - convert serialized                             43.09465 ops/s    23.20474 ms/op        -         50 runs   1.66 s
      deneb
        ✔ deneb to blinded - deserialize first                                45.42834 ops/s    22.01269 ms/op        -         51 runs   1.63 s
        ✔ deneb to blinded - convert serialized                               46.20545 ops/s    21.64247 ms/op        -         46 runs   1.50 s

@matthewkeil
Member Author

matthewkeil commented Oct 11, 2023

Some todos:

  • Prove with a benchmark whether packages/beacon-node/src/util/fullOrBlindedBlock.ts is needed. Compare the two strategies below and, unless there's a massive difference, just use the simpler one (merging structs). After doing the benchmarks, persist the results in code, add a new comment to this PR with the results, and delete the losing code path.

    • deserialize, merge structs, serialize
    • serialize exec payload, merge as bytes
  • convert the fixtures to .ssz format to reduce the diff

btw @dapplion I added these, and one for converting from generator and retesting perf, to the checklist above
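
For reference, the simpler "merge structs" strategy from the todo list could look roughly like the sketch below. The types and the `blindedToFull` helper are hypothetical, heavily simplified stand-ins rather than the real Lodestar SSZ types, and the payload body is assumed to have been fetched already from the execution client (for example via `engine_getPayloadBodiesByHashV1`, mentioned in the PR description).

```typescript
// Hypothetical, simplified shapes for illustration only; the real Lodestar
// SSZ types carry many more fields.
interface Withdrawal {
  index: number;
  amount: number;
}

// The data that a blinded block leaves out and the execution client keeps.
interface PayloadBody {
  transactions: Uint8Array[];
  withdrawals?: Withdrawal[]; // only present from capella onwards
}

interface BlindedBlock {
  slot: number;
  body: {
    executionPayloadHeader: {
      blockHash: string;
      transactionsRoot: string;
      withdrawalsRoot?: string;
    };
  };
}

interface FullBlock {
  slot: number;
  body: {executionPayload: {blockHash: string} & PayloadBody};
}

// Merge at the struct level: copy the shared fields from the header and
// splice in the transactions and withdrawals in place of their roots.
function blindedToFull(blinded: BlindedBlock, payloadBody: PayloadBody): FullBlock {
  const {blockHash} = blinded.body.executionPayloadHeader;
  const executionPayload: FullBlock["body"]["executionPayload"] = {
    blockHash,
    transactions: payloadBody.transactions,
  };
  if (payloadBody.withdrawals !== undefined) {
    executionPayload.withdrawals = payloadBody.withdrawals;
  }
  return {slot: blinded.slot, body: {executionPayload}};
}

const blinded: BlindedBlock = {
  slot: 42,
  body: {
    executionPayloadHeader: {blockHash: "0xabc", transactionsRoot: "0x01", withdrawalsRoot: "0x02"},
  },
};

const full = blindedToFull(blinded, {
  transactions: [Uint8Array.of(1, 2, 3)],
  withdrawals: [{index: 0, amount: 1}],
});
```

The byte-level alternative avoids deserializing the block at all, at the cost of manual offset handling in the serialized container, which is the complexity the benchmark is meant to justify or rule out.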

@dapplion
Contributor

@matthewkeil thanks! the differences in performance do not justify doing the complex byte manipulation IMO. Just merge structs.

Contributor

@g11tech g11tech left a comment

just blocking right now for a deeper review as it might affect some of the critical paths i want to double check + with the produceblockv3 PR types and helpers...

will also dig into the mergemock requirements

@matthewkeil
Member Author

⚠️ Performance Alert ⚠️

Possible performance regression was detected for some benchmarks. Benchmark result of this commit is worse than the previous benchmark result exceeding threshold.

| Benchmark suite | Current: 49ab90f | Previous: 3a6702e | Ratio |
| --- | --- | --- | --- |
| forkChoice updateHead vc 600000 bc 64 eq 300000 | 72.026 ms/op | 18.857 ms/op | 3.82 |
by benchmarkbot/action

@dapplion there is a benchmark regression after removing the serialized blinding/unblinding. There is not a big difference in time for the blinding process and the increase in the updateHead test seems higher than it should be.
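
The ratio in the alert is the current time per op divided by the previous one. A minimal sketch of that check follows; the function names are made up for illustration, and the threshold of 3 is a hypothetical value, since the bot's configured threshold is not stated in this thread.

```typescript
// Ratio of current to previous time per op; values above 1 mean slower.
function regressionRatio(currentMsPerOp: number, previousMsPerOp: number): number {
  return currentMsPerOp / previousMsPerOp;
}

// Flags a regression when the ratio exceeds the given threshold.
// The threshold is a parameter because the real bot setting is unknown here.
function isRegression(currentMsPerOp: number, previousMsPerOp: number, threshold: number): boolean {
  return regressionRatio(currentMsPerOp, previousMsPerOp) > threshold;
}

// Numbers from the alert: 72.026 ms/op now versus 18.857 ms/op before.
const updateHeadRatio = regressionRatio(72.026, 18.857);
const updateHeadRatioRounded = updateHeadRatio.toFixed(2); // "3.82"
const flagged = isRegression(72.026, 18.857, 3); // hypothetical threshold of 3
```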

@matthewkeil matthewkeil force-pushed the mkeil/dedup-beacon-block-2 branch from 697870f to 158cb40 Compare October 22, 2023 22:13
@g11tech g11tech force-pushed the mkeil/dedup-beacon-block-2 branch from 8d804ea to 2fb8f48 Compare October 30, 2023 07:49
@g11tech g11tech changed the base branch from unstable to deneb-builder October 30, 2023 07:50
return firstByte - readExtraDataOffsetAt > 92;
}

// same as isBlindedSignedBeaconBlock but without type narrowing
Contributor

what is the issue with type narrowing?
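
For context on the question, here is a small TypeScript sketch of the difference between a type predicate and a plain boolean check. The body shapes and function names are hypothetical, not the ones in this PR, and the sketch does not claim to capture the author's reason for avoiding narrowing.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface FullBody {
  executionPayload: {transactions: Uint8Array[]};
}
interface BlindedBody {
  executionPayloadHeader: {transactionsRoot: string};
}
type AnyBody = FullBody | BlindedBody;

// Type predicate: the compiler narrows `body` in both branches of an `if`.
function isBlindedBody(body: AnyBody): body is BlindedBody {
  return "executionPayloadHeader" in body;
}

// Same runtime check, but the plain boolean return type narrows nothing.
function isBlindedBodyNoNarrowing(body: AnyBody): boolean {
  return "executionPayloadHeader" in body;
}

function summarizeBody(body: AnyBody): string {
  if (isBlindedBody(body)) {
    // Narrowed to BlindedBody here.
    return `blinded, root ${body.executionPayloadHeader.transactionsRoot}`;
  }
  // Narrowed to FullBody here, because the predicate returned false.
  return `full, ${body.executionPayload.transactions.length} transactions`;
}

function countBlinded(bodies: AnyBody[]): number {
  // No narrowing needed: only the boolean answer is used.
  return bodies.filter((b) => isBlindedBodyNoNarrowing(b)).length;
}
```

With a predicate, the false branch is narrowed as well, so the declared union has to cover exactly the cases the check distinguishes; the boolean variant leaves the static type untouched.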

canonical,
header: {
message: blockToHeader(config, block.message),
Contributor

It's cleaner to extend blockToHeader to accept full or blinded blocks.

Then the root above can also be calculated as the hash tree root of the returned block header. That should be more efficient, since the body won't be merkleized twice.
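
A rough sketch of that suggestion, under stated assumptions: the types are simplified, the `Hashers` interface is a hypothetical stand-in for the real SSZ hash tree root calls, and `headerAndRoot` is an invented helper name. It relies on the SSZ property that a block and its header share the same hash tree root, because the header only replaces the body with the body's root.

```typescript
type Root = string;

interface BeaconBlockHeader {
  slot: number;
  proposerIndex: number;
  parentRoot: Root;
  stateRoot: Root;
  bodyRoot: Root;
}

// Hypothetical, simplified bodies.
interface FullBody {
  executionPayload: {transactions: string[]};
}
interface BlindedBody {
  executionPayloadHeader: {transactionsRoot: Root};
}

interface FullOrBlindedBlock {
  slot: number;
  proposerIndex: number;
  parentRoot: Root;
  stateRoot: Root;
  body: FullBody | BlindedBody;
}

// Stand-ins for the real SSZ hashTreeRoot functions (assumption).
interface Hashers {
  fullBody(body: FullBody): Root;
  blindedBody(body: BlindedBody): Root;
  header(header: BeaconBlockHeader): Root;
}

// Accepts either block flavor and merkleizes the body exactly once.
function blockToHeader(hashers: Hashers, block: FullOrBlindedBlock): BeaconBlockHeader {
  const body = block.body;
  const bodyRoot =
    "executionPayloadHeader" in body ? hashers.blindedBody(body) : hashers.fullBody(body);
  return {
    slot: block.slot,
    proposerIndex: block.proposerIndex,
    parentRoot: block.parentRoot,
    stateRoot: block.stateRoot,
    bodyRoot,
  };
}

// Derives the block root from the header instead of hashing the block again.
function headerAndRoot(
  hashers: Hashers,
  block: FullOrBlindedBlock
): {header: BeaconBlockHeader; root: Root} {
  const header = blockToHeader(hashers, block);
  return {header, root: hashers.header(header)};
}
```

Hashing the five-field header is cheap compared with merkleizing a body full of transactions, which is where the saving would come from.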

Base automatically changed from deneb-builder to unstable October 30, 2023 13:42
@g11tech g11tech force-pushed the mkeil/dedup-beacon-block-2 branch from 2fb8f48 to adfa2f5 Compare October 30, 2023 13:45
@codecov

codecov bot commented Oct 30, 2023

Codecov Report

Attention: Patch coverage is 7.69231% with 24 lines in your changes missing coverage. Please review.

Project coverage is 62.73%. Comparing base (e6c559f) to head (c36a0c9).

Current head c36a0c9 differs from pull request most recent head d5a07e6

Please upload reports for the commit d5a07e6 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #6029      +/-   ##
============================================
+ Coverage     62.52%   62.73%   +0.21%     
============================================
  Files           575      578       +3     
  Lines         60985    61367     +382     
  Branches       2125     2114      -11     
============================================
+ Hits          38130    38500     +370     
- Misses        22816    22829      +13     
+ Partials         39       38       -1     

@matthewkeil matthewkeil force-pushed the mkeil/dedup-beacon-block-2 branch from adfa2f5 to c36a0c9 Compare June 14, 2024 16:02
@@ -29,6 +29,7 @@ import {
isBlindedBeaconBlock,
BeaconBlock,
SignedBeaconBlock,
FullOrBlindedSignedBeaconBlock,
Member

Why did we reintroduce this type? Based on the discussion in the type refactor, we don't want this type.

@matthewkeil
Member Author

Closed in favor of #7034

Development

Successfully merging this pull request may close these issues.

De-duplicate payload from persisted beacon blocks
5 participants