-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance: avoid contention on scheduler execution #103
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the ideas LGTM, have left only a (maybe) stylistic concern about the usage of nil
as special case param.
|
||
# @overload | ||
def work_threads(query = :all) | ||
if query.nil? # our special case from JobDecorator#start_work_thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are using the nil
value as a special value, we are assigning to it a meaning. work_threads(nil)
is invoked only by the patched methods in JobDecorator
, could we use something really specific and explicit for this?
Maybe adding a new label :os_list_threads
or something similar instead of nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, only used from JobDecorator
- otherwise it's usually called with no args or a selector.
I wanted to have smt of a different type than the :symbol
it supports, will try smt better than nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
What we're after here is to avoid
Thread.list
blocking operation, which in JRuby means walking a synchronized Map.A lot of Ruby threads in the system trying to
Thread.list
will lead to thread contending (blocked) for execution ...Rufus::Scheduler
usesThread.list
when figuring out which are the worker/scheduler threads it started.This is working but a bit unfortunate and might be the cause for issues such as #88 (if there are multiple scheduler instances started from plugins - e.g. multiple jdbc inputs or http_poller or elasticsearch inputs with a
schedule => ...
).Implementation wise there's 3 changes that improve performance:
less scheduler thread ticks (sleep for 1.0 instead of 0.3)
avoid timeout_jobs doing a
Thread.list
if the timeout feature is not used (the plugin does not use timeouts)NOTE: this accounts for most of the improvements ...
finally there's still a
list
operation whenever the job is about to be executedNOTE: the reason we re-def
worker_threads
and cache them - there's no more locking operations (the remaining Random contention is due to usingKernel.rand
in a local testing scenario and does not happen in production).The recording screenshots are from a local emulated run, which is a bit far fetched: 600 schedulers with a 10s interval cron schedule. Still do not have a good enough explanation for a complete scheduler stop, what I was able to accomplish locally is some schedulers skipping one, two, sometimes three four (10 second) runs.
For now we're not sure whether there is another bug with
Rufus::Scheduler
(3.0.9) that would cause #88.The plan is to have the patch tested with potentially many JDBC inputs, if the input stopping issue persist we might need to investigate further. Upon succession the
< Scheduler
sub-class should be extracted into it's own mixin to be usable by all plugins (not just JDBC), ideally the rufus-scheduler would also be updated to latest version.The overrides should be iterated upon to avoid using
Thread.list
in the library completely and submitting a patch upstream.