
Problem: usecase/syncstrategy/window.go one failure API request fails all API requests #639

Open
ysong42 opened this issue Dec 7, 2021 · 2 comments

ysong42 commented Dec 7, 2021

@allthatjazzleo reported that sometimes our explorer lags behind.

Sometimes, some of our internal nodes (the block data source for the indexing server) lag behind. The indexing server then fails to pull the latest blocks, so the information shown on the explorer is not the latest.

At the same time, some users were directed to the explorer to check their transaction status (possibly through the DeFi Wallet), and they could not find their transactions there, which is expected given that the indexing server lagged behind.

Problem

When we use the Window SyncStrategy (usecase/syncstrategy/window.go), we pull block data in batches.

However, if any of the blockchain nodes we are requesting lags behind, the pulling gets stuck.

Imagine we have two blockchain nodes behind the load balancer:

  • Node-1 is healthy: API requests to it always return 200.
  • Node-2 lags behind: API requests to it return 5XX errors.

Say the window size is 3. Then one round of indexing will try to fetch 3 blocks.

It is possible that some requests are directed to the lagging blockchain node, which returns a 5XX error:

  • Request-to-block-0: sent to Node-1, 200
  • Request-to-block-1: sent to Node-2, 5XX
  • Request-to-block-2: sent to Node-1, 200

Although block-0 and block-2 are fetched successfully, the Window strategy ignores them and returns an error.

In our current setting, we have 3 blockchain nodes and the window size is 50. When one blockchain node is down or lagging behind, it is very likely that at least one of the 50 requests will be sent to the lagging node, so the pulling of block data, and therefore the projection, gets blocked.

Sometimes it can take a few minutes for the nodes to come back. During that time, users are unable to see the latest tx data on our explorer; their transactions may even show up as 404.
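
For illustration, here is a minimal sketch (in Go) of the all-or-nothing behavior described above. The pullWindow, fetchBlock, and Block names are placeholders made up for this example, not the actual window.go code:

```go
package sketch

import "sync"

// Block is a placeholder for the indexer's block model, not the real
// chain-indexing type.
type Block struct {
	Height int64
}

// pullWindow is a hypothetical illustration of the current all-or-nothing
// behavior: every height in the window is fetched concurrently, but a single
// failed request discards the whole batch. fetchBlock stands in for the real
// RPC call to a blockchain node.
func pullWindow(start, windowSize int64, fetchBlock func(height int64) (*Block, error)) ([]*Block, error) {
	blocks := make([]*Block, windowSize)
	errs := make([]error, windowSize)

	var wg sync.WaitGroup
	for i := int64(0); i < windowSize; i++ {
		wg.Add(1)
		go func(i int64) {
			defer wg.Done()
			blocks[i], errs[i] = fetchBlock(start + i)
		}(i)
	}
	wg.Wait()

	// One 5XX from a lagging node is enough to drop block-0 and block-2,
	// even though they were fetched successfully.
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return blocks, nil
}
```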

Proposal

Maybe we could change the implementation in usecase/syncstrategy/window.go.

Still using the above example: if block-1 fails to be retrieved, we return only block-0.

If we have 50 requests and the block-10 request fails, we return the data for block-0 ~ block-9.
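
A minimal sketch of the proposed change, reusing the placeholder Block and fetchBlock from the sketch above (again hypothetical names, not the actual window.go signatures). The window is still fetched concurrently, but instead of failing the whole round on the first error, we return the longest contiguous prefix of successfully fetched blocks:

```go
// getContiguousBlocks sketches the proposed behavior: fetch the window
// concurrently as before, then return the longest contiguous run of
// successfully fetched blocks starting from the head of the window.
// If block-10 fails, block-0 ~ block-9 are still returned and indexed.
func getContiguousBlocks(start, windowSize int64, fetchBlock func(height int64) (*Block, error)) ([]*Block, error) {
	blocks := make([]*Block, windowSize)
	errs := make([]error, windowSize)

	var wg sync.WaitGroup
	for i := int64(0); i < windowSize; i++ {
		wg.Add(1)
		go func(i int64) {
			defer wg.Done()
			blocks[i], errs[i] = fetchBlock(start + i)
		}(i)
	}
	wg.Wait()

	// Cut the window at the first failure instead of discarding everything.
	for i := int64(0); i < windowSize; i++ {
		if errs[i] != nil {
			if i == 0 {
				// Nothing usable this round; surface the error so the
				// caller can retry the same window.
				return nil, errs[i]
			}
			return blocks[:i], nil
		}
	}
	return blocks, nil
}
```

The next round of syncing could then start from the first failed height, so a lagging node only delays the blocks it actually failed to serve.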

ysong42 changed the title from "Problem: usecase/syncstrategy/window.go one failure fails all API request" to "Problem: usecase/syncstrategy/window.go one failure API request fails all API requests" on Dec 8, 2021

tomtau commented Dec 8, 2021

Not sure if it helps with this issue, but one potential thing to consider is the "new" PostgreSQL option in the Tendermint configuration: https://github.com/tendermint/tendermint/blob/db6e031a16e25f9f957c03618bfb5b4b98b42c0c/docs/app-dev/indexing-transactions.md#postgresql
With that, I assume the model can be more "push-based" instead of "pull-based", i.e. the full node would directly write source events into chain-indexing's DB instead of chain-indexing calling the node's JSON-RPC to retrieve them.


ysong42 commented Dec 14, 2021

After another discussion with Leo, here is more context on this issue:

This issue hasn't received any complaints from users yet. It is more of a potential issue that may cause confusion on the user side.

Leo agreed that the root cause is not in the indexing server. The root cause on the blockchain node side is still unknown at the moment. The DevOps team is now adding more machines to the internal nodes.

The interesting thing is that when using only one node, it never seems to lag behind; only when using 2 or 3 nodes do some of them lag. To bring a lagging node back, the DevOps team needs to either restart it manually (which takes a few minutes) or wait for it to recover by itself (not sure how long that takes).

Let's see how it goes. If the issue still exists after more machines are added to the internal nodes, we can have another round of discussion.
