[Bug]: Validator regularly OOMs on Kusama #2276
Comments
Thanks for reporting! We have a few ideas about what to check.
Hey @Lederstrumpf, can you please check if #2278 improves the situation?
Thanks, switched to it, but #2280 blocks this atm (happens both on [...]
related: To debug #2280, I'm currently resyncing from a db backup that's trailing by 14 days atm. This also OOMs (on the same 64GB RAM node) in non-validator mode:
Bug Summary
Validator regularly runs out of memory on Kusama
Bug Description
When in the active set, the Kusama validator regularly runs out of memory.
This happens both on current master (153ed85bcc3dda1f6e1fc2cb0efcca33e8e49aa1) and on the refactor/notifications branch (b5834724b3f18b1d694051cae1afff06d1ee7652, PR #2269). It happens less often on the former (roughly every hour or two) than on the latter (roughly every 10 minutes). However, the validator also accumulates far fewer era points on the former (dozens per era) than on the latter (hundreds per era), so the difference may be due to the latter resolving networking issues and thereby enabling the validator to engage in consensus more actively.
This happens on a host with 64 GB RAM. If this is insufficient for operating a Kagome validator, please close this issue.
Steps to Reproduce
Mode: Validator
number of nodes: 1
Command:
kagome --chain kusama -d [...] --validator --listen-addr [...] --public-addr [...] --name [...] --rpc-port [...] --telemetry-url [...] --telemetry-url [...] --node-key-file [...]
Effects of the Bug
The node gets culled by the OOM killer.
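To confirm that a restart was caused by the OOM killer and to see how much memory the process held at that point, the kernel log can be scanned for kill records. The following is only a rough sketch: it assumes the record layout used by recent Linux kernels ("Killed process <pid> (<name>) ... anon-rss:<n>kB"), and the names parse_oom_kill and OomKill are made up for illustration and are not part of Kagome.

```python
import re
import sys
from typing import NamedTuple, Optional

# Assumed layout of a kernel OOM kill record, e.g.
#   "Out of memory: Killed process 1234 (kagome) total-vm:...kB, anon-rss:...kB, ..."
_OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


class OomKill(NamedTuple):
    pid: int
    comm: str          # process name as reported by the kernel
    anon_rss_kb: int   # anonymous resident memory at kill time, in kB


def parse_oom_kill(line: str) -> Optional[OomKill]:
    """Return the kill record found in a kernel log line, or None."""
    m = _OOM_RE.search(line)
    if m is None:
        return None
    return OomKill(int(m["pid"]), m["comm"], int(m["anon_rss_kb"]))


if __name__ == "__main__":
    # Reads kernel log lines from stdin and prints every kill record found.
    for log_line in sys.stdin:
        kill = parse_oom_kill(log_line)
        if kill is not None:
            print(f"{kill.comm} (pid {kill.pid}): {kill.anon_rss_kb / 1024**2:.1f} GiB anon RSS")
```

If saved as a script, it could be fed the output of dmesg or journalctl -k on stdin.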
An example log from current master (153ed85bcc3dda1f6e1fc2cb0efcca33e8e49aa1):
An example log from current refactor/notifications (b5834724b3f18b1d694051cae1afff06d1ee7652):
Expected Behavior
The validator does not OOM? ^^
System Information
NixOS 24.5 with kernel 6.11.5
Compiler: gcc 13.2.0
CMake: cmake version 3.25.3
Built using flake from #2257
Relevant for this issue: host has 64 GB RAM.
Additional Context
Issue does not occur when running without the --validator flag.
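To compare memory growth between runs with and without --validator, the resident set size of the node process could be sampled over time. This is a minimal sketch assuming a Linux host where /proc/<pid>/status exposes a VmRSS line. The helper names and the 60-second default interval are hypothetical choices, not anything provided by Kagome.

```python
import re
import time
from pathlib import Path
from typing import Iterator, Optional, Tuple


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def sample_rss(
    pid: int, interval_s: float = 60.0, samples: int = 10
) -> Iterator[Tuple[float, Optional[int]]]:
    """Yield (unix_time, rss_kib) pairs for the given process (Linux only)."""
    status = Path(f"/proc/{pid}/status")
    for _ in range(samples):
        yield time.time(), parse_vm_rss_kib(status.read_text())
        time.sleep(interval_s)
```

Iterating over sample_rss(pid) for the node's process ID and logging the pairs would give a rough growth curve to attach to the report for both modes.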