@allthatjazzleo reported that our explorer sometimes lags behind.
Sometimes some of our internal nodes (the block data source for the indexing server) lag behind. When that happens, the indexing server fails to pull the latest blocks, so the information shown on the explorer is out of date.
At the same time, some users are directed to the explorer to check their transaction status (possibly through the DeFi Wallet) and cannot find their transactions there. This is expected, since the indexing server is lagging behind.
Problem
When we use WindowSyncStrategy (usecase/syncstrategy/window.go), we pull block data in batches.
However, if any of the blockchain nodes we send requests to is lagging behind, the pulling gets stuck.
Imagine we have three blockchain nodes behind the load balancer.
Node-1 is healthy: API requests to it always return 200.
Node-2 is lagging behind: API requests to it return a 5XX error.
Say the window size is 3. Then each indexing round tries to fetch 3 blocks.
Some of these requests may be routed to the lagging node, which returns a 5XX error:
Request-to-block-0: sent to Node-1, 200
Request-to-block-1: sent to Node-2, 5XX
Request-to-block-2: sent to Node-1, 200
Although block-0 and block-2 were fetched successfully, the window discards them and returns an error.
In our current setup, we have 3 blockchain nodes and the window size is 50. When one node is down or lagging behind, it is very likely that at least one of the 50 requests is sent to that node, so pulling block data is blocked and the projection stalls.
It can take a few minutes for a node to recover. During that time, users cannot see the latest transaction data on our explorer, and their transactions may even show up as 404.
Proposal
We could change the implementation in usecase/syncstrategy/window.go.
In the example above, if block-1 fails to be retrieved, we return only block-0.
If we send 50 requests and the request for block-10 fails, we return the data for block-0 to block-9.
ysong42 changed the title from "Problem: usecase/syncstrategy/window.go one failure fails all API request" to "Problem: usecase/syncstrategy/window.go one failure API request fails all API requests" on Dec 8, 2021.
After another discussion with Leo, here is more context on this issue:
No users have complained about this issue yet. It is more of a potential issue that may cause confusion for users.
Leo agreed that the root cause is not in the indexing server. The root cause on the blockchain node side is still unknown. The DevOps team is now adding more machines to the internal nodes.
Interestingly, when only one node is used, it never seems to lag behind. Only with 2 or 3 nodes do some of them lag behind. To bring a lagging node back, the DevOps team has to restart it manually (which takes a few minutes) or wait for it to recover on its own (it is unclear how long that takes).
Let's see how it goes. If the issue persists after more machines are added to the internal nodes, we can have another round of discussion.