Failed config reloads in Logstash cause resource leaks #16202
Comments
Fly-by thoughts:
I think that the "easiest" way to handle this would be to also clear them as a part of starting up, but we would need to be careful to ensure that, if a pipeline has overlap during a restart, we keep ordering correct. Alternatively, we could change the default id generation algorithm to somehow fingerprint the source of the plugin.
Hey @yaauie, thank you for the input! I'm wondering if changing the shutdown method to something like:

```ruby
def shutdown(reloading: false)
  return if finished_execution? && !reloading
  # ... existing shutdown steps ...
end
```

and then calling `old_pipeline.shutdown(reloading: true)` on the reload path would work. My initial tests seem to be working with this code change, but I'm still unsure whether it can have any other impact on the reloading process. WDYT?
🤔 The comments lead me to believe that we guard on `finished_execution?` to avoid running the shutdown steps twice. I think that if we change the scope of the guard a little, we can get the behaviour we want in both cases without needing to add the `reloading` flag.
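A toy sketch of what narrowing the guard might look like (the method names `stop_inputs`, `wait_for_shutdown`, and `clear_pipeline_metrics` mirror the ones discussed in this issue, but this is an illustrative model, not Logstash source):

```ruby
# Toy model of a pipeline whose shutdown guard has been narrowed:
# instead of returning early when finished_execution? is true (which
# skips ALL cleanup), only waiting on worker threads is skipped.
class ToyPipeline
  attr_reader :inputs_stopped, :metrics_cleared

  def initialize(finished_execution)
    @finished_execution = finished_execution
    @inputs_stopped = false
    @metrics_cleared = false
  end

  def finished_execution?
    @finished_execution
  end

  def shutdown
    stop_inputs                                   # always release input resources
    wait_for_shutdown unless finished_execution?  # only wait if workers ran
    clear_pipeline_metrics                        # always drop stale metrics
  end

  private

  def stop_inputs
    @inputs_stopped = true
  end

  def wait_for_shutdown
    # would join worker threads in the real pipeline
  end

  def clear_pipeline_metrics
    @metrics_cleared = true
  end
end

# A pipeline that "finished execution" (e.g. after a failed reload)
# still releases its inputs and clears its metrics:
done = ToyPipeline.new(true)
done.shutdown
puts done.inputs_stopped   # true
puts done.metrics_cleared  # true
```

With this shape, the cleanup runs in both the normal-shutdown and the failed-reload case, without threading a `reloading` flag through the call sites.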
Hi @yaauie, if I'm not mistaken, it would only solve the pipeline metrics issue, given that the input plugins would still not be stopped.

For example, let's assume the following pipeline started successfully and both plugins are running. Then, the user changes the configuration and the reload fails. But then, the user changes it again, and the configuration is still invalid. This time, when Logstash tries to reload the pipeline configuration, the shutdown steps are skipped once more, so the leaked resources keep accumulating.
The scenario I was able to reproduce consistently has to do with resources created during the `register` method of input plugins. To reproduce it, I took simple inputs like heartbeat and generator, added a resource creation (connection) during `register`, and added a conditional exception to one of them. With this I could trigger the resource pile-up.

The proposed solution fixes this scenario, and I don't see any problems it could create. In this case we're sort of abusing `register` to acquire the resources, but that mirrors what real input plugins do.
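The pile-up can be modeled with a toy input class (the names `ToyInput`, `attempt_reload`, and the connection list are hypothetical; real plugins acquire connections in `register` and release them in their stop path):

```ruby
# Toy reproduction of the leak: each failed "reload" registers the
# inputs again, but because the failed attempt never calls stop,
# the successfully registered input's connection is never closed.
OPEN_CONNECTIONS = []

class ToyInput
  def initialize(name, fail_on_register: false)
    @name = name
    @fail_on_register = fail_on_register
  end

  def register
    raise "register failed for #{@name}" if @fail_on_register
    OPEN_CONNECTIONS << @name   # resource acquired here
  end

  def stop
    OPEN_CONNECTIONS.delete(@name)  # resource released here (never reached)
  end
end

def attempt_reload
  good = ToyInput.new("db_1")
  bad  = ToyInput.new("db_2", fail_on_register: true)
  good.register
  bad.register                  # raises; the reload attempt fails
rescue RuntimeError
  # Mirrors the bug: no stop_inputs on the failed pipeline, so
  # db_1's connection from this attempt is never closed.
end

3.times { attempt_reload }
puts OPEN_CONNECTIONS.length    # one leaked connection per failed reload: 3
```

Calling `stop` on the already-registered inputs inside the rescue path (what the proposed fix effectively does) would keep the count at zero.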
In the meantime, the fix proposed for cleaning up metrics resources is still valid and can be done regardless of the fix for the input `register` leaks.
Description of the problem including expected versus actual behavior:
Logstash is not properly handling failed configuration reloads.
When Logstash tries to reload the pipeline configuration and it fails (due to some issue, e.g. a database being down), subsequent reloads cause resource leaks, as no attempt is made to stop the input plugins. It also keeps the old pipeline's plugins in the metric store, filling memory and returning outdated resources on the stats API.
The issue seems to be in the java_pipeline.rb#shutdown logic. When the reload fails, `finished_execution?` is true, so the `stop_inputs`, `wait_for_shutdown`, and `clear_pipeline_metrics` methods are skipped, leaking those resources and increasing them linearly on every retry.

For example, consider a pipeline with inputs connecting to two databases, `db_1` and `db_2`.
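A minimal sketch of such a configuration, assuming `jdbc`-style inputs (the plugin choice, IDs, and connection strings are illustrative, not taken from the original report):

```
input {
  jdbc {
    id => "db_1_input"
    jdbc_connection_string => "jdbc:mysql://db_1:3306/app"
    # jdbc_user, driver, statement, schedule, etc.
  }
  jdbc {
    id => "db_2_input"
    jdbc_connection_string => "jdbc:mysql://db_2:3306/app"
    # jdbc_user, driver, statement, schedule, etc.
  }
}

output {
  stdout {}
}
```

Each input opens its own database connection when the plugin registers.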
If the pipeline configuration reload fails due to `db_2` being unavailable, the `db_1` connections won't be closed, and the resource usage (number of connections) will grow with every failed reload attempt. It will also keep both the "old" and "new" pipelines' plugins in the metric store, increasing the number of elements per retry.

Steps to reproduce:
Watch the database connections (e.g. with `show processlist`) and the plugin metrics:

```shell
curl -s http://localhost:9600/_node/stats | jq -r '.pipelines."<pipeline_name>".plugins.codecs | length'
```