
Separate netty boss and worker groups to improve the graceful shutdown. #178

Merged

Conversation

@mashhurs (Contributor) commented Aug 27, 2024

Description

We have learned from best practice with beats (PR) and tcp (source) that, when shutting down the plugin, closing the boss group is the right approach to make sure netty no longer pushes tasks into its executor event loop queue.
With this change, we separate the boss and worker event loop groups so that, during shutdown, terminating them separately gives us the benefit of no longer accepting connections. Although the change stops new connections from being accepted, there may still be tasks in the netty task queue, and the plugin can hit exceptional cases such as InterruptedException. (The cause, as I see it: unlike beats-input and tcp-input, the http-input plugin uses thread pools and waits 5s while terminating; if we used a netty event executor group instead, we would get RejectedExecutionException.)

[2024-08-26T21:35:58,056][ERROR][logstash.inputs.http     ][main][305e9313f95ee3a5a7f62cb73ee7458b785402852defd4049630e3b0d46f5f37] unable to process event. {:message=>"", :class=>"Java::JavaLang::InterruptedException", :backtrace=>["java.base/java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly(ReentrantLock.java:159)", "java.base/java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:372)", "java.base/java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:367)", "org.logstash.ext.JrubyMemoryWriteClientExt.doPush(JrubyMemoryWriteClientExt.java:61)", "org.logstash.ext.JRubyWrappedWriteClientExt.lambda$push$0(JRubyWrappedWriteClientExt.java:113)", "org.logstash.instrument.metrics.timer.ConcurrentLiveTimerMetric.time(ConcurrentLiveTimerMetric.java:47)", "org.logstash.ext.JRubyWrappedWriteClientExt.lambda$executeWithTimers$2(JRubyWrappedWriteClientExt.java:148)", "org.logstash.instrument.metrics.timer.ConcurrentLiveTimerMetric.time(ConcurrentLiveTimerMetric.java:47)", "org.logstash.ext.JRubyWrappedWriteClientExt.lambda$executeWithTimers$3(JRubyWrappedWriteClientExt.java:148)", "org.logstash.instrument.metrics.timer.ConcurrentLiveTimerMetric.time(ConcurrentLiveTimerMetric.java:47)", "org.logstash.ext.JRubyWrappedWriteClientExt.executeWithTimers(JRubyWrappedWriteClientExt.java:148)", "org.logstash.ext.JRubyWrappedWriteClientExt.push(JRubyWrappedWriteClientExt.java:113)", "Users.mashhur.Dev.elastic.ls_minus_plugins.logstash_minus_input_minus_http.lib.logstash.inputs.http.RUBY$method$push_decoded_event$0(/Users/mashhur/Dev/elastic/ls-plugins/logstash-input-http/lib/logstash/inputs/http.rb:261)", "Users.mashhur.Dev.elastic.ls_minus_plugins.logstash_minus_input_minus_http.lib.logstash.inputs.http.RUBY$block$decode_body$0(/Users/mashhur/Dev/elastic/ls-plugins/logstash-input-http/lib/logstash/inputs/http.rb:243)", "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:151)", "org.jruby.runtime.MixedModeIRBlockBody.yieldDirect(MixedModeIRBlockBody.java:111)", "org.jruby.runtime.BlockBody.yield(BlockBody.java:106)", "org.jruby.runtime.Block.yield(Block.java:189)", "org.jruby.ir.runtime.IRRuntimeHelpers.yield(IRRuntimeHelpers.java:494)", "org.jruby.ir.instructions.YieldInstr.interpret(YieldInstr.java:96)", "org.jruby.ir.interpreter.StartupInterpreterEngine.processOtherOp(StartupInterpreterEngine.java:155)", "org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:98)", "org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:82)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:201)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:188)", "org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:471)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:259)", "org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:338)", "org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66)", "org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:82)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:201)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:188)", "org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:471)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:259)", 
"org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:270)", "org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:337)", "org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:128)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:115)", "org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:446)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:92)", "org.jruby.ir.instructions.CallBase.interpret(CallBase.java:548)", "org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:363)", "org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:128)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:115)", "org.jruby.gen.LogStash$$Inputs$$Http$$MessageHandler_1858188193.onNewMessage(org/jruby/gen/LogStash$$Inputs$$Http$$MessageHandler_1858188193.gen:13)", "org.logstash.plugins.inputs.http.MessageProcessor.processMessage(MessageProcessor.java:77)", "org.logstash.plugins.inputs.http.MessageProcessor.run(MessageProcessor.java:62)", "java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)", "java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)", "java.base/java.lang.Thread.run(Thread.java:833)"]}

How to test?

I honestly don't know how to test this better, but I used vector.dev to send concurrent requests and reloaded the http-input pipeline. As a result, InterruptedException occurrences decreased; without the change, incoming connections may still push tasks into netty's internal queue.

Again, with the current change you can still see the InterruptedException (we have a similar case with input-beats where the codec becomes unavailable), but the logs will no longer be flooded with it because we stop accepting incoming connections once the boss group is shut down.

// the boss group is responsible for accepting incoming connections and handing them to the worker loop
// the process group is the channel handler group, see https://github.com/netty/netty/discussions/13305
// see https://github.com/netty/netty/discussions/11808#discussioncomment-1610918 for why the separation is good
bossGroup = new NioEventLoopGroup(1, daemonThreadFactory("http-input-connector"));
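For context on the review note below: SO_BACKLOG bounds the kernel queue of completed connections waiting to be accept()ed, and the boss thread only has to drain that queue. A hedged sketch of the wiring, assuming a backlog of 128 as in the note; this is illustrative, not the plugin's exact bootstrap code.

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

class BootstrapWiring {
    static ServerBootstrap wire(EventLoopGroup bossGroup, EventLoopGroup processorGroup) {
        return new ServerBootstrap()
                .group(bossGroup, processorGroup)   // boss accepts, processor group runs the pipeline
                .channel(NioServerSocketChannel.class)
                // cap on the kernel accept queue; accepting is cheap, so one
                // boss thread can keep up with a backlog of this size
                .option(ChannelOption.SO_BACKLOG, 128)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // HTTP decoding and plugin handlers would be configured here
                    }
                });
    }
}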
mashhurs (Contributor Author) commented:

review note: I think a single thread for a 128 backlog should be okay.


mashhurs (Contributor Author) commented:

As discussed offline, I have kept the current thread config but added a TODO to follow up:

  • the default socket backlog is 1024, so which default acceptor thread count is the better fit?
  • do we need to make it configurable or not? Sometimes exposing internal config also makes the user's environment harder to manage. (A sketch of what such a knob might look like follows.)
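Purely to illustrate the second TODO item, a sketch of such a knob; acceptorThreads is a hypothetical parameter, not an existing http-input setting.

import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.util.concurrent.DefaultThreadFactory;

class AcceptorConfig {
    // hypothetical knob, not an existing plugin option
    static NioEventLoopGroup bossGroup(int acceptorThreads) {
        // with the default socket backlog of 1024, extra acceptor threads only
        // help if one thread cannot drain the accept queue fast enough
        return new NioEventLoopGroup(acceptorThreads,
                new DefaultThreadFactory("http-input-connector", true));
    }
}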

@@ -73,7 +80,9 @@ public void run() {
    public void close() {
        try {
            // stop accepting new connections first
            processorGroup.shutdownGracefully(0, 10, TimeUnit.SECONDS).sync();
            bossGroup.shutdownGracefully().sync();
mashhurs (Contributor Author) commented:

review note: As I understand it, ServerBootstrap internally uses SingleThreadEventExecutor, and with a 0 quiet period netty may terminate the pending tasks directly, so let's align with the defaults for a better user experience.
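To make the quiet-period distinction concrete, a sketch contrasting the two calls; the 2s/15s figures are netty's documented defaults for the no-arg shutdownGracefully().

import java.util.concurrent.TimeUnit;
import io.netty.channel.EventLoopGroup;

class ShutdownStyles {
    static void close(EventLoopGroup bossGroup, EventLoopGroup processorGroup)
            throws InterruptedException {
        // no-arg form: 2s quiet period, 15s timeout (netty's defaults); the
        // group terminates only after 2s pass with no new task submissions
        bossGroup.shutdownGracefully().sync();
        // explicit 0 quiet period: netty does not wait for a task-free window,
        // so it may terminate with tasks still queued
        processorGroup.shutdownGracefully(0, 10, TimeUnit.SECONDS).sync();
    }
}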

@andsel (Contributor) left a comment:

I've tried to generate (at the moment just locally) some decent load against the HTTP input using https://github.com/andsel/gatling_test, but the Logstash shutdown always terminated fast and cleanly. Could you give more details on how you set up https://vector.dev to load Logstash and observe the error?

@andsel (Contributor) left a comment:

LGTM

Tried to reproduce locally but wasn't able to. However, the changes look good to me! 👍

@mashhurs merged commit aba42b3 into logstash-plugins:main Aug 28, 2024
2 checks passed
Successfully merging this pull request may close these issues:

Separate boss and worker group in ServerBootstrap.