Latency profile of broadcast vs mpmc #27
Can you post the entire code? This is somewhat expected on an oversubscribed system, due to the differences in notification method between broadcast and mpmc. The broadcast queue is mostly built for systems where the consumers have dedicated cores and are busy-waiting.
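For context on that usage pattern, a busy-waiting consumer pinned to a dedicated core might look roughly like the sketch below. This is only a sketch: it assumes multiqueue's `try_recv` and uses the separate `core_affinity` crate for the pinning, neither of which comes from this thread.

```rust
use multiqueue::BroadcastReceiver;

// Spin on try_recv from a thread pinned to its own core, instead of
// parking/blocking. Pinning here uses the `core_affinity` crate.
fn busy_consume(stream: BroadcastReceiver<u64>, core: core_affinity::CoreId) {
    core_affinity::set_for_current(core);
    loop {
        match stream.try_recv() {
            Ok(val) => {
                let _ = val; // handle the value, e.g. record a latency sample
            }
            // Queue momentarily empty (or senders gone): keep spinning.
            // A production loop would also detect disconnection and break.
            Err(_) => std::hint::spin_loop(),
        }
    }
}
```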
Yes, will do. It will take a day or so, since I need to pull out some proprietary code. In the meantime, I have done a lot more testing, and it's fair to say the results are all over the map. These are results for the same two approaches, but on a VPS with 4 "dedicated" cores (8 with hyperthreads) and a lowlatency kernel; overall, slightly better performance from broadcast. Both had cataclysmic interrupts on one occasion each over two hours (3ms and 9ms), which aren't included; I selected a calm hour for each (I figure the system is at fault there).

One big change: initially I had the queue size at 4096. I read in an LMAX FAQ that a smaller size allows the entire buffer to stay in cache, and the results above are for a queue size of 8. It might be a good idea to include some guidelines on this in the docs. I think your examples use a size of 4, but it didn't occur to me at the time that I should really set it in that range.

But the biggest takeaway is that these tests are very sensitive to the machine they are run on. It's pretty finicky. More tests = more doubt. I would be interested in any input you can provide on best practices. I'll post the test code when I can untangle it a bit. Thanks!
I think this helps explain it! The thing with the broadcast queue is that no writer can ever write past a fallen-behind subscriber (the existence of destructors makes that trickier). With a small queue size, that makes it much easier for a reader to fall behind and block the writers.
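A small demonstration of that blocking behavior, assuming multiqueue's `broadcast_queue` / `add_stream` / `try_send` / `try_recv` API: with a tiny capacity and one stream that never reads, the writer stalls as soon as it would have to overwrite a slot that stream hasn't seen.

```rust
fn main() {
    // Tiny queue, two streams; only one of them actually reads.
    let (tx, rx) = multiqueue::broadcast_queue::<u64>(2);
    let active = rx.add_stream();
    let lagging = rx.add_stream();
    rx.unsubscribe(); // the original receiver is never read

    // Fill the ring buffer.
    while tx.try_send(0).is_ok() {}

    // The active stream drains everything it can see...
    while active.try_recv().is_ok() {}

    // ...but the writer is still stuck: it cannot overwrite entries that
    // the lagging stream has not read yet.
    println!("send while a stream lags: ok = {}", tx.try_send(1).is_ok());

    // Once the slow stream catches up (or unsubscribes), writes proceed.
    while lagging.try_recv().is_ok() {}
    println!("send after catching up:   ok = {}", tx.try_send(1).is_ok());
}
```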
Here is a working version of the benchmark code: https://gist.github.com/jonathanstrong/d88d474f5b0ea147e5bc98d9df0910a2
@jonathanstrong Any chance you can run the latency benchmarks on a system with a real-time (PREEMPT_RT) kernel?
@ProfFan Apologies, I should have mentioned it: the tests were run on the lowlatency kernel, Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-128-lowlatency x86_64). If there are any other system-level improvements you would suggest, I would definitely be interested; good information is hard to find.
@jonathanstrong Debian has a PREEMPT_RT kernel package. I think this would be pretty interesting, especially for robotics and embedded developers (like me 😄).
In latency benchmarks for scenarios with many producers and 4-12 consumers, I found the broadcast queue to have much worse performance at the tail end than separate mpmc queues that the producers send to consecutively.
This "parallel mpmc" pattern works like this (rough example):
Broadcast followed the doc examples, including calling `unsubscribe` on the unused `BroadcastReceiver`, but consumers are each reading a new stream from the same queue instead of their own personal queue.

In a 40 producer, 6 consumer test, the two setups have nearly identical latency profiles up to the 99.9%ile. After that, the broadcast queue fares much worse.
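For comparison, a sketch of the broadcast setup as described above, again assuming the `multiqueue` crate's API (`broadcast_queue`, `add_stream`, `unsubscribe`) and the same placeholder counts:

```rust
use std::thread;

fn main() {
    const CONSUMERS: usize = 6;
    const PRODUCERS: usize = 40;
    const CAPACITY: u64 = 8;

    let (tx, rx) = multiqueue::broadcast_queue::<u64>(CAPACITY);

    // Every consumer gets its own stream onto the *same* queue, so each
    // value is delivered to all consumers.
    let consumers: Vec<_> = (0..CONSUMERS)
        .map(|_| {
            let stream = rx.add_stream();
            thread::spawn(move || {
                while let Ok(val) = stream.recv() {
                    let _ = val; // record receive timestamp / latency here
                }
            })
        })
        .collect();

    // The original receiver is never read from, so unsubscribe it; otherwise
    // it counts as a (permanently stalled) subscriber and blocks the writers.
    rx.unsubscribe();

    let producers: Vec<_> = (0..PRODUCERS)
        .map(|p| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..1_000u64 {
                    // try_send can fail while any stream is still full.
                    let _ = tx.try_send(p as u64 * 1_000 + i);
                }
            })
        })
        .collect();

    for h in producers {
        h.join().unwrap();
    }
    drop(tx); // close the queue so the streams' recv() returns Err
    for h in consumers {
        h.join().unwrap();
    }
}
```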
Caveats:
- Latencies were recorded with the `hdrhistogram` crate.

Questions:
- Is this kind of tail-latency difference expected for the broadcast queue, or does it suggest I'm using it incorrectly?
Thanks!