Separate netty boss and worker groups to improve the graceful shutdown. #178
// boss group is responsible for accepting incoming connections and sending to worker loop
// process group is channel handler, see the https://github.com/netty/netty/discussions/13305
// see the https://github.com/netty/netty/discussions/11808#discussioncomment-1610918 for why separation is good
bossGroup = new NioEventLoopGroup(1, daemonThreadFactory("http-input-connector"));
review note: I think a single thread for a backlog of 128 should be okay.
Why don't we apply the same to Beats as well (logstash-plugins/logstash-input-beats#500), at the boss group creation: https://github.com/logstash-plugins/logstash-input-beats/pull/500/files#diff-f41780ec08fd274c56c0bdf129a08f4ca42214dac2beb109ef6f789a9a349144R57?
As discussed offline, I have kept the current thread config but added a TODO to follow up:
- the default socket backlog is 1024, so which default acceptor thread count suits it better?
- do we need to make it configurable or not? Exposing internal config can sometimes make the user's environment harder to manage
@@ -73,7 +80,9 @@ public void run() {
public void close() {
    try {
        // stop accepting new connections first
        processorGroup.shutdownGracefully(0, 10, TimeUnit.SECONDS).sync();
        bossGroup.shutdownGracefully().sync();
review note: As I understand it, ServerBootstrap internally uses SingleThreadEventExecutor, and with a quiet period of 0 netty may terminate the tasks directly, so let's align with the default to get a better user experience.
I've tried to generate (for now just locally) some decent load on the HTTP input using https://github.com/andsel/gatling_test, but the Logstash shutdown always terminated fast and cleanly. Could you give more details on how you set up https://vector.dev to load Logstash and check for the error?
LGTM
Tried to reproduce locally but wasn't able to. However, the changes look good to me! 👍
Description
We have figured out from the best practice with beats (PR) and tcp (source) that, when shutting down the plugin, closing the boss group is the right approach to make sure netty no longer pushes tasks into its executor event loop queue.
With this change, we separate the worker and executor event loop groups so that terminating the groups separately on shutdown has the benefit of no longer accepting connections. Although this change stops new connections from being accepted, there may still be tasks in the netty task queue, and the plugin can hit exceptional cases such as InterruptedException (the cause, as far as I can tell: unlike beats-input and tcp-input, the http-input plugin uses thread pools and waits for 5s while terminating; if we used a netty event executor group, we would get RejectedExecutionException).
How to test?
I honestly don't know how to test this better, but I have used vector.dev to send concurrent requests and reloaded the http-input pipeline. As a result, the number of InterruptedException occurrences decreased; without the change, incoming connections may still push tasks to the netty internal queue.
~/.vector
~/.vector/config/ path: https://gist.github.com/mashhurs/7890b44f3dce8a8e454d934adc669641
(vector --config ~/.vector/config/vector_http_conf.yaml) and downstream LS (bin/logstash -f your-config-path.conf)
Again, with the current change you can still see the InterruptedException (we have a similar case with input-beats, where the codec becomes unavailable), but the exception will no longer flood the logs because we stop accepting incoming connections once the boss group is shut down.
ServerBootstrap. #177
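The ordering argument in the description (stop the acceptor before the workers so that nothing new is handed to a pool that is already closing) can be illustrated without Netty. The following is a minimal sketch using only `java.util.concurrent`; the class name `ShutdownOrderSketch`, the method `rejectedHandOffs`, and the two plain executors standing in for the boss and worker groups are all hypothetical and are not taken from the plugin. It is an analogy for the hand-off pattern, not a model of Netty's event loops or of `shutdownGracefully`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownOrderSketch {

    /** Returns how many hand-offs the worker pool rejected while shutting down. */
    public static int rejectedHandOffs(boolean stopAcceptorFirst) {
        // Assumption: plain JDK executors stand in for the boss and worker groups.
        ExecutorService acceptor = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        AtomicInteger rejected = new AtomicInteger();
        CountDownLatch firstHandOff = new CountDownLatch(1);
        CountDownLatch firstRejection = new CountDownLatch(1);

        // The "accept loop": keeps handing work to the workers until interrupted.
        acceptor.execute(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    workers.execute(() -> { /* pretend to handle a request */ });
                    firstHandOff.countDown();
                } catch (RejectedExecutionException e) {
                    rejected.incrementAndGet();
                    firstRejection.countDown();
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop exits
                }
            }
        });

        try {
            firstHandOff.await(5, TimeUnit.SECONDS); // make sure traffic is flowing
            if (stopAcceptorFirst) {
                // Stop accepting first, then let the workers finish what they have.
                acceptor.shutdownNow();
                acceptor.awaitTermination(5, TimeUnit.SECONDS);
                workers.shutdown();
            } else {
                // Wrong order: the accept loop keeps pushing into a closed pool.
                workers.shutdown();
                firstRejection.await(5, TimeUnit.SECONDS);
                acceptor.shutdownNow();
                acceptor.awaitTermination(5, TimeUnit.SECONDS);
            }
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rejected.get();
    }

    public static void main(String[] args) {
        System.out.println("acceptor first: rejected = " + rejectedHandOffs(true));
        System.out.println("workers first:  rejected = " + rejectedHandOffs(false));
    }
}
```

With the acceptor stopped first, no hand-off can reach a closed pool, so the count stays at zero; with the workers stopped first, at least one hand-off is rejected before the accept loop is interrupted. This only covers the rejection side of the argument; tasks that are already queued when shutdown begins are a separate concern, as the description notes.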