Peculiar waves of disconnects on VM hosting 100 light clients. #182
I've created a new network test to gather more information.
One CKB full node can accept up to 125 - 8 = 117 connections under the default configuration. Considering the small number of full nodes online on testnet, I think it's normal for a small number of light clients to fail to connect after you've started 100 of them.
The bootnode does have some logic to drop connections periodically, but the behavior in the logs you provided does not quite match that drop policy, so we need to investigate a bit more.
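The arithmetic in the comment above can be written out as a rough capacity check. This is only a sketch: the parameter names `max_peers` and `max_outbound_peers` are assumptions about what the 125 and 8 refer to, and the calculation ignores any slots already taken by other peers.

```python
def connection_slots(max_peers: int = 125, max_outbound_peers: int = 8) -> int:
    """Slots left for incoming connections on one full node.

    The defaults mirror the 125 - 8 = 117 figure quoted in the comment;
    the parameter names are assumed, not taken from the thread.
    """
    return max_peers - max_outbound_peers


def total_slots(full_nodes: int, **kwargs) -> int:
    """Upper bound on connections that `full_nodes` nodes could accept."""
    return full_nodes * connection_slots(**kwargs)


def slots_needed(light_clients: int, peers_per_client: int) -> int:
    """Connections required if every light client holds `peers_per_client` peers."""
    return light_clients * peers_per_client


if __name__ == "__main__":
    # 34 full nodes is the testnet figure reported in the issue;
    # 2 peers per client is the monitor's "no issues" threshold.
    print(total_slots(34), "slots available vs", slots_needed(100, 2), "needed")
```

By this estimate the number of slots is an upper bound only, since real full nodes also serve other peers on the network.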
I am still seeing waves of disconnects in the new test environment. The 100 light clients are restricted from having internet access, but they have a perfect connection to the four local testnet full nodes since they all reside in different VMs on a single host computer.
In the log snippet below you can see each time the monitor starts a scan. The first two report no issues, meaning all 100 light clients have at least 2 peer connections. Then in the third scan a minute later, there is a wave of connection drops by 68 of the 100 nodes.
20240122 14:23:04 [INFO] Scan start.
20240122 14:24:16 [INFO] Scan start.
20240122 14:25:28 [INFO] Scan start.
20240122 14:25:39 [INFO] There are 64 clients with 0 peers: 6, 14, 15, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 58, 59, 60, 61, 62, 63, 64, 66, 69, 70, 71, 72, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 95, 96
20240122 14:25:39 [INFO] There are 4 clients with 1 peer: 8, 37, 54, 99
The monitor log reflects many events like this. Resource utilization all looks normal. No firewalls are installed.
Monitor Log:
Light Client Logs:
Full Node Logs:
Config Files:
I increased the number of testnet light clients on my VM from 10 to 100. All clients are running the same configuration and are started at the same time. I also put together a small monitoring program that checks for the following:
It then sleeps for 60s before repeating.
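A minimal sketch of what such a monitor loop might look like follows. It is not the actual monitoring program: `fetch_peer_counts` is a hypothetical callable that returns a mapping of client number to current peer count, and the 2-peer threshold is taken from the "no issues" description elsewhere in this thread.

```python
import time
from collections import defaultdict


def summarize_peer_counts(peer_counts, min_peers=2):
    """Return one report line per peer count below `min_peers`.

    `peer_counts` maps a client number to its current number of peers.
    """
    by_count = defaultdict(list)
    for client_id, count in peer_counts.items():
        if count < min_peers:
            by_count[count].append(client_id)

    lines = []
    for count in sorted(by_count):
        ids = sorted(by_count[count])
        noun = "peer" if count == 1 else "peers"
        id_list = ", ".join(str(i) for i in ids)
        lines.append(f"There are {len(ids)} clients with {count} {noun}: {id_list}")
    return lines


def run_monitor(fetch_peer_counts, interval_s=60, max_scans=None):
    """Scan, report under-connected clients, sleep, repeat.

    `fetch_peer_counts` is a hypothetical hook; how the peer counts are
    obtained from each light client is left open here.
    """
    scans = 0
    while max_scans is None or scans < max_scans:
        print("Scan start.")
        for line in summarize_peer_counts(fetch_peer_counts()):
            print(line)
        scans += 1
        if max_scans is None or scans < max_scans:
            time.sleep(interval_s)
```

Keeping the summary logic separate from the polling loop makes it possible to replay saved peer counts through `summarize_peer_counts` without running any clients.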
There are a few interesting things seen in the logs:
It occurs to me that the testnet only has 34 full nodes online according to the node probe. Is there any logic in the full nodes that would cause ban waves if there are too many connections coming in?
Example config file.
testnet99.toml.txt
Two days of monitor logs:
output.log