Measure coverage #25

Closed
ligi opened this issue Jun 11, 2018 · 26 comments

ligi (Member) commented Jun 11, 2018

Based on the list by @holiman in #23.

ligi added a commit that referenced this issue Jun 11, 2018
ligi closed this as completed in bd810d9 Jun 11, 2018
ligi (Member Author) commented Jun 11, 2018

Unfortunately coverage is really bad:
found 1420 - missed 291518

cc @holiman @pipermerriam

holiman (Contributor) commented Jun 11, 2018

So, how did you count?

  1. The number of unique signatures that are found, or
  2. The odds that a given transaction matches a known signature, i.e. taking the popularity of each signature into account.

I'm more interested in the second figure. From it, we should also be able to derive a top list of most-wanted signatures: the ones with the highest priority to discover in order to increase coverage.
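The two ways of counting can be sketched as follows. This assumes the trace has been reduced to a map from 4-byte selector to call count and the directory contents to a set of known selectors; both data shapes and the helper names uniqueCoverage and weightedCoverage are hypothetical, not the tooling actually used in this thread.

```javascript
// Sketch: two coverage metrics for 4-byte selectors.
// Assumes callCounts is a Map<selector, number of calls> built from a trace
// and known is a Set of selectors present in the signature directory.
function uniqueCoverage(callCounts, known) {
  // 1) Fraction of distinct selectors that are known, ignoring popularity.
  let found = 0;
  for (const selector of callCounts.keys()) {
    if (known.has(selector)) found++;
  }
  return callCounts.size === 0 ? 0 : found / callCounts.size;
}

function weightedCoverage(callCounts, known) {
  // 2) Probability that a randomly chosen call hits a known selector.
  let found = 0;
  let total = 0;
  for (const [selector, count] of callCounts) {
    total += count;
    if (known.has(selector)) found += count;
  }
  return total === 0 ? 0 : found / total;
}

// Hypothetical sample: one popular known selector, two rare unknown ones.
const counts = new Map([
  ["0xa9059cbb", 90],
  ["0xb9b8af0b", 5],
  ["0xb269681d", 5],
]);
const known = new Set(["0xa9059cbb"]);
console.log(uniqueCoverage(counts, known).toFixed(3)); // prints 0.333 (1 of 3 selectors)
console.log(weightedCoverage(counts, known).toFixed(3)); // prints 0.900 (90 of 100 calls)
```

The gap between the two numbers is the point of the second metric: a handful of very popular known selectors can dominate the call volume even when most distinct selectors are unknown.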

holiman (Contributor) commented Jun 11, 2018

Also, not every invocation is an ABI call. There could be invocations of precompiles in there, which do e.g. sha256 and identity. Those will likely create a lot of low-ranking signatures, which we can ignore.

holiman (Contributor) commented Jun 11, 2018

And by 2), I mean coverage like this: pipermerriam/ethereum-function-signature-registry#30 (comment)

ligi (Member Author) commented Jun 11, 2018

Ah, makes sense. I was just counting how many of the signatures in your list are in the 4-byte directory; I think that is 1). I will also do 2), it sounds useful to have.

ligi (Member Author) commented Jun 11, 2018

Here are the top 20 signatures not found (selector, number of calls):

0xb9b8af0b 26091
0xb269681d 26091
0x97dc97cb 26077
0x8d12a197 18287
0x14c9035e 8410
0x5e5144eb 8090
0x488725a0 6881
0x1d6456c4 6525
0xcc135813 5948
0x1801fbe5 5678
0x14cba002 5654
0x40c10f19 4160
0xc281d19e 4127
0x2cf86006 3447
0x688abbf7 3444
0xa90cf6f4 3339
0xddf9d613 3338
0x01000000 3133
0x2d923501 2988
0x392a4fa2 2963

Also interesting: 0xb9b8af0b and 0xb269681d have the same count. I wonder whether they are always called as a pair or whether something else is behind both.

ligi (Member Author) commented Jun 11, 2018

@holiman When weighting by call counts we get better results: there is a 74% chance that a called signature is found in the database (found 0.7391538, missed 0.2608462).

ligi (Member Author) commented Jun 11, 2018

If we find the top 20, coverage increases to 79.82%.
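A projection like this amounts to moving the largest missing call counts into the "found" bucket and recomputing the weighted share. A minimal sketch under the same assumed data shapes (a selector-to-count map and a set of known selectors); projectedCoverage is a hypothetical helper and the sample numbers are made up:

```javascript
// Sketch: projected weighted coverage if the n most-called unknown
// selectors were added to the directory.
// Assumes callCounts: Map<selector, calls> and known: Set<selector>.
function projectedCoverage(callCounts, known, n) {
  let total = 0;
  let found = 0;
  const missing = [];
  for (const [selector, count] of callCounts) {
    total += count;
    if (known.has(selector)) {
      found += count;
    } else {
      missing.push(count);
    }
  }
  // Resolving the most-called unknown selectors first gives the largest gain.
  missing.sort((a, b) => b - a);
  for (const count of missing.slice(0, n)) {
    found += count;
  }
  return total === 0 ? 0 : found / total;
}

// Hypothetical sample: 60 known calls plus three unknown selectors.
const counts = new Map([
  ["0xa9059cbb", 60],
  ["0xb9b8af0b", 20],
  ["0xb269681d", 10],
  ["0x97dc97cb", 10],
]);
const known = new Set(["0xa9059cbb"]);
console.log(projectedCoverage(counts, known, 0)); // prints 0.6
console.log(projectedCoverage(counts, known, 1)); // prints 0.8
```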

holiman (Contributor) commented Jun 11, 2018

Let's aim for 90% coverage before end of July. Which ones do we need to find for that?

holiman (Contributor) commented Jun 11, 2018

Maybe I should redo the trace and ignore calls to precompiles though...

holiman (Contributor) commented Jun 11, 2018

Could also dump the to-address, so we can more easily look up verified code...

ligi (Member Author) commented Jun 11, 2018

> Let's aim for 90% coverage before end of July. Which ones do we need to find for that?

Unfortunately, this is a huge list.

holiman (Contributor) commented Jun 11, 2018

Hm, unfortunately the tracer already filtered out precompiles :(. However, it appears that I discovered many of the top hits long ago; they suffer from the overactive regexp, which does not allow "int" within the function name. I think I have now found most of the top 20 with some Google searches. I submitted 11 .sol files to 4byte, but that produced a database error, so that needs to be fixed.

ligi (Member Author) commented Jun 11, 2018

Can you send me the 11 sol files?

holiman (Contributor) commented Jun 11, 2018

Here you go: solfiles.tar.gz

chfast commented Jun 11, 2018

You can google them :)

{"constant":true,"inputs":[],"name":"finalized","outputs":[{"name":"","type":"bool"}],"payable":false,"type":"function","signature":"0xb3f05b97"}
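An ABI JSON entry like the one above is enough to rebuild the canonical signature text that a 4-byte selector is derived from: the function name followed by the comma-separated input types, with no parameter names, spaces, or return types. The selector itself is the first four bytes of the Keccak-256 hash of that string; hashing needs a library and is left out of this sketch. canonicalSignature is a hypothetical helper, and tuple (struct) parameters are deliberately not handled.

```javascript
// Sketch: rebuild the canonical signature string from a Solidity ABI JSON entry.
// Tuple (struct) parameters would need their components expanded; not handled here.
function canonicalSignature(entry) {
  const type = entry.type || "function"; // older ABI descriptions may omit "type"
  if (type !== "function") {
    throw new Error(`not a function entry: ${type}`);
  }
  // Only the input types matter: no parameter names, no spaces, no return types.
  const types = (entry.inputs || []).map((input) => input.type);
  return `${entry.name}(${types.join(",")})`;
}

const finalized = {
  constant: true,
  inputs: [],
  name: "finalized",
  outputs: [{ name: "", type: "bool" }],
  payable: false,
  type: "function",
  signature: "0xb3f05b97",
};
console.log(canonicalSignature(finalized)); // prints finalized()
```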

holiman (Contributor) commented Jun 12, 2018

> unfortunately this is a huge list

It can't be that huge. Can you get me the full list of unknowns, preferably including the call data length?

ligi (Member Author) commented Jun 12, 2018

> It can't be that huge. Can you get me the full list of unknowns, preferably including call data length?

It was about 11k entries, if I remember correctly. The problem is that the call counts become insignificant very quickly as you go down the list, so you need a lot of entries to reach 90%.

holiman (Contributor) commented Jun 12, 2018

Ah, there's an error here: https://github.com/ethereum/go-ethereum/blob/master/eth/tracers/internal/tracers/4byte_tracer.js#L63
It should be:

if (isPrecompiled(toAddress(log.stack.peek(1).toString(16)))) {

So all those one-offs are false positives 👍
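The corrected line relies on the EVM stack layout of the call opcodes: for CALL, CALLCODE, DELEGATECALL and STATICCALL the top stack item (peek(0)) is the gas and the item beneath it (peek(1)) is the callee address. The sketch below models this with a plain array of BigInt standing in for the tracer's stack and treats addresses 0x01 to 0x08 as precompiles, which matches the Byzantium rule set in force in mid-2018. The peek, isPrecompiledAddress and callTargetsPrecompile helpers are illustrative only, not the geth tracer API.

```javascript
// Sketch: does a call opcode target a precompiled contract?
// The EVM stack is modelled as an array of BigInt with the top of stack LAST.
// Byzantium precompiles live at addresses 0x01 through 0x08.
const LAST_PRECOMPILE = 8n;
const ADDRESS_MASK = (1n << 160n) - 1n; // an address is the low 160 bits of a word

function peek(stack, n) {
  // n = 0 is the top of the stack.
  return stack[stack.length - 1 - n];
}

function isPrecompiledAddress(word) {
  const addr = word & ADDRESS_MASK;
  return addr >= 1n && addr <= LAST_PRECOMPILE;
}

function callTargetsPrecompile(stack) {
  // For CALL, CALLCODE, DELEGATECALL and STATICCALL the gas is at peek(0)
  // and the callee address is at peek(1).
  return isPrecompiledAddress(peek(stack, 1));
}

// CALL to the identity precompile (0x04), laid out bottom to top as
// [retLength, retOffset, argsLength, argsOffset, value, address, gas].
const callStack = [0n, 0n, 64n, 0n, 0n, 4n, 50000n];
console.log(callTargetsPrecompile(callStack)); // prints true
```

Reading the wrong slot would classify calls by their gas value instead of their target, which is one way ordinary contract calls could slip through or be dropped.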

holiman (Contributor) commented Jun 12, 2018

A new trace is in progress.

holiman (Contributor) commented Jun 12, 2018

Also, if we remove all signatures where the supplied data (minus the four selector bytes) is not a multiple of 32 bytes, roughly half of the entries go away.
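This filter rests on the ABI encoding rule that every argument is encoded in whole 32-byte words (dynamic data is padded too), so a well-formed call is a 4-byte selector followed by a whole number of 32-byte words. A minimal sketch, assuming the calldata is available as a 0x-prefixed hex string; looksAbiEncoded is a hypothetical helper:

```javascript
// Sketch: keep only calls whose argument data is a whole number of 32-byte words.
// calldataHex is assumed to be a 0x-prefixed hex string of the full call input.
function looksAbiEncoded(calldataHex) {
  const hex = calldataHex.startsWith("0x") ? calldataHex.slice(2) : calldataHex;
  if (hex.length % 2 !== 0) return false; // not whole bytes
  const byteLength = hex.length / 2;
  if (byteLength < 4) return false; // no room for a selector
  return (byteLength - 4) % 32 === 0;
}

// transfer(address,uint256): 4-byte selector + 2 words = 68 bytes.
const transferCall = "0xa9059cbb" + "00".repeat(64);
console.log(looksAbiEncoded(transferCall)); // prints true
console.log(looksAbiEncoded("0xa9059cbb" + "00".repeat(33))); // prints false
```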

holiman (Contributor) commented Jun 12, 2018

Preliminary result (it's still tracing). These signatures should be sufficient to get 90% coverage (I have not filtered out which ones already exist in the db)

222368 0xa9059cbb-64 46.94% (46.94% aggregate)
22495 0xef343588-576 4.75% (51.69% aggregate)
10977 0x23b872dd-96 2.32% (54.01% aggregate)
7676 0x095ea7b3-64 1.62% (55.63% aggregate)
7058 0xf2c298be-128 1.49% (57.12% aggregate)
6446 0x70a08231-32 1.36% (58.48% aggregate)
6367 0x1a695230-32 1.34% (59.82% aggregate)
6295 0x2295115b-256 1.33% (61.15% aggregate)
5953 0x88c2a0bf-32 1.26% (62.41% aggregate)
5860 0x8da5cb5b-0 1.24% (63.64% aggregate)
5435 0xb9b8af0b-0 1.15% (64.79% aggregate)
5435 0xb269681d-0 1.15% (65.94% aggregate)
5435 0x6ea056a9-64 1.15% (67.09% aggregate)
5429 0x97dc97cb-0 1.15% (68.23% aggregate)
5429 0x3c18d318-32 1.15% (69.38% aggregate)
4509 0x338b5dea-64 0.95% (70.33% aggregate)
4355 0x28090abb-128 0.92% (71.25% aggregate)
3152 0xf7d8c883-64 0.67% (71.91% aggregate)
2885 0x16c72721-0 0.61% (72.52% aggregate)
2798 0x0a19b14a-352 0.59% (73.11% aggregate)
2573 0x0f2c9329-64 0.54% (73.66% aggregate)
2407 0xf088d547-32 0.51% (74.17% aggregate)
2172 0xdd62ed3e-64 0.46% (74.62% aggregate)
2148 0x688abbf7-32 0.45% (75.08% aggregate)
2105 0x3ccfd60b-0 0.44% (75.52% aggregate)
1951 0xc31e0547-0 0.41% (75.93% aggregate)
1900 0x40c10f19-64 0.40% (76.33% aggregate)
1834 0x0d9f5aed-96 0.39% (76.72% aggregate)
1806 0x18160ddd-0 0.38% (77.10% aggregate)
1686 0x38cc4831-0 0.36% (77.46% aggregate)
1550 0x1134269a-544 0.33% (77.79% aggregate)
1447 0x313ce567-0 0.31% (78.09% aggregate)
1406 0x867904b4-64 0.30% (78.39% aggregate)
1402 0x00000001-32 0.30% (78.68% aggregate)
1400 0x4b75f54f-0 0.30% (78.98% aggregate)
1394 0x5e5144eb-128 0.29% (79.27% aggregate)
1376 0x29a00e7c-128 0.29% (79.56% aggregate)
1373 0x49f9b0f7-128 0.29% (79.85% aggregate)
1353 0xa24835d1-64 0.29% (80.14% aggregate)
1341 0x14c9035e-512 0.28% (80.42% aggregate)
1316 0x2e1a7d4d-32 0.28% (80.70% aggregate)
1292 0xd0679d34-64 0.27% (80.97% aggregate)
1258 0x9e281a98-64 0.27% (81.24% aggregate)
1227 0xcc135813-32 0.26% (81.50% aggregate)
1122 0x278b8c0e-288 0.24% (81.73% aggregate)
1096 0x454a2ab3-32 0.23% (81.97% aggregate)
1091 0x39125215-384 0.23% (82.20% aggregate)
1078 0x764358e6-96 0.23% (82.42% aggregate)
942 0x488725a0-32 0.20% (82.62% aggregate)
932 0x4a393149-96 0.20% (82.82% aggregate)
932 0x379607f5-32 0.20% (83.02% aggregate)
925 0x6352211e-32 0.20% (83.21% aggregate)
883 0xbeabacc8-96 0.19% (83.40% aggregate)
882 0x3452f51d-64 0.19% (83.58% aggregate)
880 0xb0c80972-64 0.19% (83.77% aggregate)
872 0xd0e30db0-0 0.18% (83.95% aggregate)
870 0xc281d19e-0 0.18% (84.14% aggregate)
853 0x47872b42-96 0.18% (84.32% aggregate)
850 0x3fa4f245-0 0.18% (84.50% aggregate)
848 0x05b34410-0 0.18% (84.68% aggregate)
793 0x2ef3accc-128 0.17% (84.84% aggregate)
792 0x64887334-416 0.17% (85.01% aggregate)
763 0x515c1457-192 0.16% (85.17% aggregate)
761 0x23de6651-96 0.16% (85.33% aggregate)
691 0x161ff662-224 0.15% (85.48% aggregate)
679 0x14cba002-192 0.14% (85.62% aggregate)
673 0xcd905dff-0 0.14% (85.76% aggregate)
673 0x163f7522-32 0.14% (85.91% aggregate)
673 0x103aeda7-32 0.14% (86.05% aggregate)
639 0x929268b8-160 0.13% (86.18% aggregate)
639 0x6ace6dc8-96 0.13% (86.32% aggregate)
627 0x1801fbe5-64 0.13% (86.45% aggregate)
614 0x02571be3-32 0.13% (86.58% aggregate)
609 0x38bbfa50-256 0.13% (86.71% aggregate)
607 0xddf9d613-64 0.13% (86.84% aggregate)
607 0xa90cf6f4-128 0.13% (86.96% aggregate)
594 0x27ebe40a-160 0.13% (87.09% aggregate)
578 0xed1b71ea-96 0.12% (87.21% aggregate)
558 0xda14c723-64 0.12% (87.33% aggregate)
558 0x4989b0b6-96 0.12% (87.45% aggregate)
558 0x0cb1a3a9-160 0.12% (87.57% aggregate)
530 0x1962df71-160 0.11% (87.68% aggregate)
528 0x15dacbea-128 0.11% (87.79% aggregate)
508 0x3ec862a8-32 0.11% (87.90% aggregate)
506 0x205c2878-64 0.11% (88.00% aggregate)
498 0x2d923501-128 0.11% (88.11% aggregate)
491 0x392a4fa2-160 0.10% (88.21% aggregate)
481 0xfebefd61-128 0.10% (88.31% aggregate)
476 0x963f2334-96 0.10% (88.41% aggregate)
476 0x7da40b65-32 0.10% (88.51% aggregate)
476 0x2a54d313-192 0.10% (88.61% aggregate)
468 0xe31c71c4-64 0.10% (88.71% aggregate)
468 0x4e30a66c-64 0.10% (88.81% aggregate)
461 0xdc6dd152-32 0.10% (88.91% aggregate)
461 0xc51be90f-672 0.10% (89.01% aggregate)
414 0xcbf9b84b-64 0.09% (89.09% aggregate)
385 0xe0cb3aa0-64 0.08% (89.18% aggregate)
381 0x01c6adc3-64 0.08% (89.26% aggregate)
368 0xc8ee0c6b-192 0.08% (89.33% aggregate)
368 0xa1add510-96 0.08% (89.41% aggregate)
361 0xc8fea2fb-96 0.08% (89.49% aggregate)
348 0xb193b516-992 0.07% (89.56% aggregate)
328 0x394c21e7-384 0.07% (89.63% aggregate)
317 0x3d7d3f5a-128 0.07% (89.70% aggregate)
316 0x5643a711-64 0.07% (89.76% aggregate)
311 0x9468d6eb-224 0.07% (89.83% aggregate)
308 0xc7704d6a-224 0.07% (89.89% aggregate)
307 0xf4f3bdc1-64 0.06% (89.96% aggregate)
300 0x19774d43-64 0.06% (90.02% aggregate)
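Each line of the list reads as: number of calls, selector and data length joined by a dash, share of all traced calls, and cumulative share. The number after the dash appears to be the argument data length in bytes, i.e. calldata minus the 4-byte selector (0x8da5cb5b-0 for the argument-less owner() and 0xa9059cbb-64 for the two-word transfer(address,uint256) fit that reading). The aggregate column then corresponds to a greedy walk down the list sorted by call count, stopping once the target share is reached. A sketch of both steps; parseLine and entriesForTarget are hypothetical helpers, and the trace total has to be passed in because the list stops at the 90% mark.

```javascript
// Sketch: parse lines of the list above and pick the most-called entries
// until a target share of all calls is covered.

// "222368 0xa9059cbb-64 46.94% (46.94% aggregate)" -> structured record.
function parseLine(line) {
  const [count, key] = line.trim().split(/\s+/);
  const [selector, argBytes] = key.split("-");
  return { key, selector, argBytes: Number(argBytes), count: Number(count) };
}

// Greedy walk: highest call counts first, stop once `target` is reached.
// totalCalls is the size of the whole trace, not just of the listed entries.
function entriesForTarget(entries, totalCalls, target) {
  const sorted = [...entries].sort((a, b) => b.count - a.count);
  const picked = [];
  let covered = 0;
  for (const entry of sorted) {
    if (covered / totalCalls >= target) break;
    covered += entry.count;
    picked.push({ ...entry, aggregate: covered / totalCalls });
  }
  return picked;
}

const sample = [
  "222368 0xa9059cbb-64 46.94% (46.94% aggregate)",
  "22495 0xef343588-576 4.75% (51.69% aggregate)",
  "10977 0x23b872dd-96 2.32% (54.01% aggregate)",
].map(parseLine);
console.log(sample[0].selector, sample[0].argBytes); // prints 0xa9059cbb 64
// Treating the three sample lines as the whole trace (255840 calls):
console.log(entriesForTarget(sample, 255840, 0.9).length); // prints 2
```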

holiman (Contributor) commented Jun 12, 2018

Wow, 46% of all chain action is the ERC20 transfer method. I think that the actual coverage is quite high (probably already above 90%), once the false positives have been removed.

holiman (Contributor) commented Jun 13, 2018

Ok, here's the new list, where I have not yet weeded out what is already in the db.
4byteresp.jsonl_final.txt.gz

holiman (Contributor) commented Jun 13, 2018

And the final results are in: https://gist.github.com/holiman/563da876c4ce15629f57ffdc4046383b
The good news is that we're at almost 90% coverage!

holiman (Contributor) commented Jun 13, 2018

I located a few more, we're now at 90.13%. Good enough for now :)


3 participants