Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seg: speed up rebuildSegments (Alternative) #12245

Closed
wants to merge 3 commits into from

Conversation

stevemilk
Copy link
Contributor

This is alternative to #12114

Add a subdir "/snapshots/commitment" to store commitment files for .seg/.idx files that are fully downloaded.
In this way we know if downloading is finished and if it's ok to open downloaded files.

  • do not reopen .seg/.idx files if already opened during rebuildSegments
  • accelerate scanning files by avoiding doubly nested loop

@stevemilk stevemilk changed the title turbo: speed up rebuildSegments turbo: speed up rebuildSegments (Alternative) Oct 8, 2024
@AskAlexSharov AskAlexSharov self-requested a review October 8, 2024 12:01
@AskAlexSharov
Copy link
Collaborator

i'm on small vacation - will review next week. thank you.

@mriccobene
Copy link
Member

This PR breaks 353 tests on RPC interface, please check! See details on "QA - RPC Integration Tests"

@stevemilk
Copy link
Contributor Author

stevemilk commented Oct 9, 2024

This PR breaks 353 tests on RPC interface, please check! See details on "QA - RPC Integration Tests"

checking.

I try to start a rpc daemon on my machine and call it to reproduce. Is this the right way?
I don't have much experience on erigon's integration tests right now, will appreciate it if any guidance/documents regarding it @mriccobene thanks!

@mriccobene
Copy link
Member

mriccobene commented Oct 11, 2024

I try to start a rpc daemon on my machine and call it to reproduce. Is this the right way? I don't have much experience on erigon's integration tests right now, will appreciate it if any guidance/documents regarding it

I am very sorry Steve, I missed this message, I'll fix it now.

If you look at the failed test log here and open the "Run RPC integration tests" section you will see the failed calls. For example see:

0168. erigon_getBlockByTimestamp/test_02.json Failed

The rpc tests we run are here: https://github.com/erigontech/rpc-tests/tree/main/integration/mainnet

If we go to the erigon_getBlockByTimestamp folder and look at the test_02.json file we can see the request that is made and the expected result. It's this:
https://github.com/erigontech/rpc-tests/blob/main/integration/mainnet/erigon_getBlockByTimestamp/test_02.json

To reproduce it you just need to use CURL to Erigon with a request like this

$ curl --location 'http://localhost:8545' \
   --header 'Content-Type: application/json' \
   --data '{
         "jsonrpc":"2.0",
         "method":"erigon_getBlockByTimestamp",
         "params":["3272722465", false],
         "id":1
   },'

There is no need to use external rpcdaemon, you can use Erigon provided that you enabled proper JSON RPC interfaces on command line, with for example --http.api admin,debug,eth,parity,erigon,trace,web3,txpool,ots,net

According to the test you should expect a response like this:

"response": {
   "jsonrpc":"2.0",
   "id":1,
   "result": null
}

@stevemilk
Copy link
Contributor Author

stevemilk commented Oct 11, 2024

Thank you so much!
Luckily I followed the same way that your suggested to test these days, but the problem is my machine doesn't have enough storage to run a mainnet node to test. I tried CURL before sync completed, failed with no surprise.
Is there another way to test the code locally? for example integration tests prepared for testnet.
If it's the only way, I guess I need buy an external drive.

@mriccobene
Copy link
Member

[...] for example integration tests prepared for testnet. If it's the only way, I guess I need buy an external drive.

Unfortunately these tests require a db close to the chain tip and there is no testnet version

@stevemilk
Copy link
Contributor Author

Hi @AskAlexSharov
Could you please help with this question: in what scenarios will we open an incomplete file?
One scenario I know is:

  • Abort before download completes, then restart. Before downloading again, will ReopenFolder

Then I cannot find reopen being called any where before download complete. After download completed, files are all complete so can skip integrity check and open only once.

Any corner cases I miss? I'm reviewing the lifecycle of snapshots and trying to improve rebuild and avoid race, hope not to miss any cases before drafting a design.

@AskAlexSharov
Copy link
Collaborator

Here is 3 features:

  • Download happening only once. All other files are generated by Erigon itself.
  • RPCDaemon can be started without erigon (erigon will never start) - in this case RPCDaemon need “optimisticaly open” what it can open - and serve RPC requests (usually all will just work).
  • Erigon also can be started with —no-downloader flag (or with external downloader - which can be down) - in this case no way to call any downloader API methods. And Erigon also does Optimistically open files by this reason.

Corner case:

  • we don’t have any indicator of “download is complete or not”. In the past i used prohibit_new_downloads.lock file for this - but now it created before download completion (and it’s fine). so, maybe we can create another file.

In my head:

  • if download is not done - then it’s ok to not open any files. Then we never need Re-Open all - only need open new (generated) files.

@AskAlexSharov
Copy link
Collaborator

One nice to have thing is:

  • our big users are Node Providers (RPC API providers - b2b). and we trying to minimize RPC downtime for them on restart.

@AskAlexSharov AskAlexSharov changed the title turbo: speed up rebuildSegments (Alternative) seg: speed up rebuildSegments (Alternative) Oct 16, 2024
@AskAlexSharov
Copy link
Collaborator

Seems we have marker of download completeness: #12332

@stevemilk
Copy link
Contributor Author

Seems we have marker of download completeness: #12332

Fantastic! You're right! i hadn't figured out this part of logic before.
It gonna be much simpler to build on top of your commit.

@stevemilk
Copy link
Contributor Author

closing in favor of #12332

@stevemilk stevemilk closed this Oct 19, 2024
@stevemilk
Copy link
Contributor Author

closing in favor of #12332

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants