Peculiar waves of disconnects on VM hosting 100 light clients. #182

Open
jordanmack opened this issue Jan 18, 2024 · 3 comments

Comments

@jordanmack

I increased the number of testnet light clients on my VM from 10 to 100. All clients run the same configuration and are started at the same time. I also put together a small monitoring program that checks for the following:

  • Check for any clients that are offline (not responding).
  • Check for any clients with fewer than 2 peers.
  • Check for any clients that report a tip lagging behind the others by more than 30 blocks.

It then sleeps for 60s before repeating.
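
The three checks above can be sketched roughly as follows. This is not the actual monitoring program; it is a minimal illustration that assumes the per-client status (online flag, peer count, tip block number) has already been collected by some other means. The `ClientStatus` type and `scan` function are hypothetical names, and "lagging behind the others" is interpreted here as lagging behind the highest tip reported by any online client.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PEERS = 2      # threshold from the description above
MAX_TIP_LAG = 30   # blocks, threshold from the description above


@dataclass
class ClientStatus:
    """Hypothetical snapshot of one light client's state."""
    client_id: int
    online: bool
    peer_count: int = 0
    tip_number: Optional[int] = None


def scan(statuses):
    """Classify clients into the three problem groups the monitor checks."""
    offline = [s.client_id for s in statuses if not s.online]
    online = [s for s in statuses if s.online]
    low_peers = [s.client_id for s in online if s.peer_count < MIN_PEERS]

    # Assumption: lag is measured against the best tip seen among online clients.
    tips = [s.tip_number for s in online if s.tip_number is not None]
    best_tip = max(tips, default=None)
    lagging = [
        s.client_id
        for s in online
        if best_tip is not None
        and s.tip_number is not None
        and best_tip - s.tip_number > MAX_TIP_LAG
    ]
    return {"offline": offline, "low_peers": low_peers, "lagging": lagging}
```

A real monitor would call `scan` once per cycle, log any non-empty group, and then sleep for 60 seconds before the next pass.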

There are a few interesting things in the logs:

  • There are three clients in particular that seem to have problems staying connected to at least two peers: 35, 83, 99
  • There are recurring waves of 10 or more clients that drop connection at the same time.

It occurs to me that the testnet only has 34 full nodes online according to the node probe. Is there any logic in the full nodes that would cause ban waves if there are too many connections coming in?

Example config file.
testnet99.toml.txt

Two days of monitor logs:
output.log

@jordanmack
Author

I've created a new network test to gather more information.

  • 4 local testnet full nodes.
  • Full nodes can access the internet normally.
  • 100 local light clients.
  • Light clients can access the local full nodes but not the internet.

@quake
Member

quake commented Jan 19, 2024

With the default configuration, one CKB full node can accept up to 125 - 8 = 117 inbound connections:
https://github.com/nervosnetwork/ckb/blob/develop/resource/ckb.toml#L83-L84

Considering the small number of online full nodes on testnet, I think it's normal for a small number of light clients to fail to connect after you've started 100 of them.
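
The arithmetic above can be written out as a short sketch. The defaults of 125 and 8 are the two values from the linked `ckb.toml` lines; the function names and the `peers_per_client` parameter are hypothetical and only there for illustration. The result is an upper bound at best, since it ignores every other peer already occupying inbound slots on the testnet nodes.

```python
def inbound_slots(max_peers=125, max_outbound_peers=8):
    """Inbound connections one full node can accept with the given limits."""
    return max_peers - max_outbound_peers


def capacity_summary(full_nodes, light_clients, peers_per_client):
    """Return (total inbound slots, slots demanded, headroom).

    Assumption: every light client connection uses exactly one inbound
    slot and no slots are taken by unrelated peers.
    """
    slots = full_nodes * inbound_slots()
    demand = light_clients * peers_per_client
    return slots, demand, slots - demand


if __name__ == "__main__":
    print(inbound_slots())  # 117
    # 34 full nodes as reported by the node probe; 2 peers per client is
    # the monitor's minimum, not necessarily what each client requests.
    print(capacity_summary(full_nodes=34, light_clients=100, peers_per_client=2))
```

How many connections each light client actually attempts, and how many slots are already in use, are not known from this thread, so the real headroom may be far smaller than this estimate.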

> Is there any logic in the full nodes that would cause ban waves if there are too many connections coming in?

The bootnode does have some logic to drop connections periodically, but judging from the logs you provided, the behavior does not quite match that drop policy. We need to investigate a bit more.

@jordanmack
Author

I am still seeing waves of disconnects in the new test environment. The 100 light clients are restricted from having internet access, but they have a perfect connection to the four local testnet full nodes since they all reside in different VMs on a single host computer.

In the log snippet below you can see each time the monitor starts a scan. The first two scans report no issues, meaning all 100 light clients have at least 2 peer connections. Then in the third scan, about a minute later, a wave of connection drops hits 68 of the 100 clients: 64 have no peers and 4 have a single peer.

20240122 14:23:04 [INFO] Scan start.
20240122 14:24:16 [INFO] Scan start.
20240122 14:25:28 [INFO] Scan start.
20240122 14:25:39 [INFO] There are 64 clients with 0 peers: 6, 14, 15, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 58, 59, 60, 61, 62, 63, 64, 66, 69, 70, 71, 72, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 95, 96
20240122 14:25:39 [INFO] There are 4 clients with 1 peer: 8, 37, 54, 99

The monitor log reflects many events like this. Resource utilization all looks normal. No firewalls are installed.
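
To tally events like this across a long monitor log, something along these lines could work. It is a rough sketch, not part of the monitor itself: the regular expressions are inferred only from the few log lines quoted above, `summarize` is a hypothetical helper, and it assumes each report sits on a single line.

```python
import re

TIMESTAMP = r"(\d{8} \d{2}:\d{2}:\d{2})"
SCAN_RE = re.compile(rf"^{TIMESTAMP} \[INFO\] Scan start\.$")
PEERS_RE = re.compile(
    rf"^{TIMESTAMP} \[INFO\] There are (\d+) clients? with (\d+) peers?: (.*)$"
)


def summarize(lines):
    """Group 'clients with N peers' reports under the scan they belong to."""
    scans = []
    for raw in lines:
        line = raw.strip()
        m = SCAN_RE.match(line)
        if m:
            scans.append({"start": m.group(1), "affected": {}})
            continue
        m = PEERS_RE.match(line)
        if m and scans:
            peer_count = int(m.group(3))
            ids = [int(x) for x in m.group(4).split(",") if x.strip()]
            # Map peer count (0, 1, ...) to the list of affected client ids.
            scans[-1]["affected"][peer_count] = ids
    return scans


if __name__ == "__main__":
    # Lines copied from the snippet above (the long 0-peer line is omitted).
    sample = [
        "20240122 14:24:16 [INFO] Scan start.",
        "20240122 14:25:28 [INFO] Scan start.",
        "20240122 14:25:39 [INFO] There are 4 clients with 1 peer: 8, 37, 54, 99",
    ]
    for scan in summarize(sample):
        total = sum(len(ids) for ids in scan["affected"].values())
        print(scan["start"], "affected clients:", total)
```

Plotting the per-scan totals over time would show whether the waves recur at a regular interval, which might help match them against any periodic logic on the full node side.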

Monitor Log:
monitor-log.tar.gz

Light Client Logs:
client-logs.tar.gz

Full Node Logs:
full-node-2.tar.gz
full-node-3.tar.gz
full-node-4.tar.gz
full-node-5.tar.gz

Config Files:
testnet-base.toml.txt
ckb.toml.txt
