Producer.produceAsync performance #531
Thanks for your feedback @lukestephenson. I'd say dropping the …
@svroonland FYI, I was asking questions on this blocking subject here: #492
@guizmaii Ah, that …
@adamgfraser Perhaps you could comment on the performance of …
@lukestephenson Ran your benchmark; below are the results before and after removing `attemptBlocking`.

Before: …

After: …
Thanks for looking into this. Those results are similar to what we saw when removing …
I took a further look into performance after switching to … It appears to be related to the performance of the Promise implementation. Both the zio-kafka and Monix implementations rely on a Promise-like contract (different implementations) to signal when the work has completed.

For ZIO it looks like: …

In terms of performance, the Monix implementation averages 656ms, while the ZIO implementation takes 3100ms. While the comparison is a bit naive / limited, it roughly correlates with the difference I saw in my original timings between the Kafka producing performance for Monix and ZIO.
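For readers following along, here is a minimal sketch of the kind of Promise round trip such a micro-benchmark might measure. This is a hypothetical reconstruction, not the code referenced above; the iteration count and the `PromiseRoundTrip` name are made up:

```scala
import zio._

// Hypothetical sketch: create a Promise, complete it from another fiber,
// and await it, one million times, timing the whole loop.
object PromiseRoundTrip extends ZIOAppDefault {
  val run =
    ZIO
      .foreachDiscard(1 to 1000000) { i =>
        for {
          p <- Promise.make[Nothing, Int] // one Promise per message, as in the producer
          _ <- p.succeed(i).fork          // completed by another fiber
          _ <- p.await                    // caller waits for the result
        } yield ()
      }
      .timed
      .flatMap { case (took, _) => Console.printLine(s"took ${took.toMillis} ms") }
}
```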
I think @adamgfraser would want to see this Promise performance comparison.
FYI zio/zio#7557
@lukestephenson Could you maybe re-run your perf tests with this version of ZIO: `2.0.4+19-064d1cd7-SNAPSHOT`? To use a snapshot release you need the following in your sbt build:

```scala
ThisBuild / resolvers ++= Resolver.sonatypeOssRepos("snapshots")

libraryDependencies += "dev.zio" %% "zio" % "2.0.4+19-064d1cd7-SNAPSHOT"
```

This is the version released with the Promise optimisation fix I mentioned previously (#531 (comment)).
The Promise performance on … The Kafka performance test has seen a similar reduction in time with the improvements to Promise. So with the improvements to Promise and removing the use of `attemptBlocking` …
@svroonland I think the solution implemented in #555 is wrong. Is it safe to drop blocking given …
What do you think about using a dedicated thread pool? Using the blocking one isn't appropriate, as it can be slow. WDYT?
Ah, that's really annoying that methods which are explicitly documented to be asynchronous can still block, but you're right, it looks like there's potential blocking while waiting for metadata. I believe the actual shifting from the current ZIO thread to another thread (from the blocking pool in this case) is causing the slowdown, not using the blocking thread pool per se. So using a dedicated thread pool won't help, I think. Perhaps we can build on your earlier idea of having a ZIO buffer (…
Why aren't we just running this whole thing on the blocking thread pool? Generally the biggest cost is in shifting between executors. Running a whole bunch of operations on the blocking executor and shifting there and back around the whole thing is dramatically faster than shifting there and back around every individual workflow.
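To make that cost model concrete, here is a hedged sketch of the two approaches against a raw `KafkaProducer`; the object and method names are illustrative:

```scala
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import zio._

object ShiftOnceSketch {
  type Rec = ProducerRecord[Array[Byte], Array[Byte]]

  // Per-record shifting: every send pays a hop to the blocking executor and back.
  def sendEachShifted(producer: KafkaProducer[Array[Byte], Array[Byte]], records: Chunk[Rec]): Task[Unit] =
    ZIO.foreachDiscard(records)(r => ZIO.attemptBlocking(producer.send(r)))

  // Shift once around the whole batch: a single hop covers all the sends.
  def sendBatchShifted(producer: KafkaProducer[Array[Byte], Array[Byte]], records: Chunk[Rec]): Task[Unit] =
    ZIO.attemptBlocking(records.foreach(r => producer.send(r)))
}
```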
@adamgfraser If I understand you correctly, I think that's what I'm proposing; a send loop/stream running on the blocking pool and a queue to communicate with the ZIO thread pool side. I hope to have some time to implement that later today.
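A rough sketch of what such a design might look like; everything here is hypothetical (names, types, the use of `Runtime.defaultBlockingExecutor`), and it is not the implementation that eventually landed in #555:

```scala
import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
import zio._

object SendLoopSketch {
  type Rec  = ProducerRecord[Array[Byte], Array[Byte]]
  type Done = Promise[Throwable, RecordMetadata]

  // Callers enqueue a record plus a Promise; one loop drains the queue on the
  // blocking executor, so each record no longer pays its own executor shift.
  def sendLoop(
    runtime: Runtime[Any],
    producer: KafkaProducer[Array[Byte], Array[Byte]],
    queue: Queue[(Rec, Done)]
  ): UIO[Nothing] = {
    // Complete the Promise from Kafka's callback thread.
    def complete(done: Done, meta: RecordMetadata, err: Exception): Unit =
      Unsafe.unsafe { implicit u =>
        runtime.unsafe.run(if (err != null) done.fail(err) else done.succeed(meta)).getOrThrow()
        ()
      }

    queue.take
      .flatMap { case (record, done) =>
        ZIO.succeed {
          producer.send(record, new Callback {
            def onCompletion(meta: RecordMetadata, err: Exception): Unit = complete(done, meta, err)
          })
          ()
        }
      }
      .forever
      .onExecutor(Runtime.defaultBlockingExecutor) // shift once for the entire loop
  }
}
```

In this shape, a fiber running `sendLoop` would be forked once when the producer is constructed; `produceAsync` would then only offer to the queue and return the promise's `await`.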
@lukestephenson Would it be possible for you to remove the dependency on …
@guizmaii I'll remove the internal dependencies now
I've just pushed to the kafka-perf repo. Apologies for the delay.
Thanks Luke ❤️
Shall we close this issue? Improvements were implemented in zio/zio#7557, #555 and #581.
Firstly, thanks for all the fixes / effort invested ❤️ I've updated the https://github.com/lukestephenson/kafka-perf repo to use the latest zio / zio-kafka dependencies and performance is much improved. Note, however, that to see good performance in my case I had to bump the value of `sendBufferSize`. Here are the latest timings I collected (please don't compare them to the original timings from when I raised the issue; I'm running on a different laptop): …
With regard to the unchunked comparisons: originally ZIO was 6.6 times slower than my Monix implementation (23 vs 3.5 seconds). With the latest run, ZIO is only 2.4 times slower (using a sendBufferSize of 1mb: 3.4 vs 1.4 seconds). Things have improved heaps, but it still appears to be a lot slower than the comparable Monix implementation. I'm happy for this issue to be closed. It's probably confusing to have the discussion continue any longer on this issue, so if we want further improvements we can always raise another one. I suspect the remaining slowness is because each message still needs to be pushed through a ZIO stream, and that isn't great with small chunk sizes. Thanks again!
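On the chunk-size point, a hedged illustration of why small chunks hurt; the rechunk size of 1024 is arbitrary:

```scala
import zio.stream._

object ChunkingSketch {
  // Single-element chunks pay per-element overhead in every stream operator;
  // rechunking into larger chunks amortises that cost across the chunk.
  val perElement: UStream[Int] = ZStream.fromIterable(1 to 1000000).rechunk(1)
  val amortised: UStream[Int]  = perElement.rechunk(1024)
}
```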
@lukestephenson Thanks again for all of this! 🙏 I know that the ZIO team is re-implementing streams based on a new design to improve performance (see Adam Fraser's talk at FunScala London 2022; I don't know if it's published on YouTube yet). They told me they want to deliver this new implementation in March/April for the next ZIO World event (https://www.eventbrite.com/e/zio-world-2023-lisbon-portugal-tickets-492267383997). Meanwhile, I think the next step for us regarding performance will be to implement a few benchmarks. I don't mind keeping this issue open; it contains a good conversation we can continue to iterate on with future improvements.
Renamed the issue because the previous title did not accurately reflect the latest performance comparison and could give false impressions to people not reading the entire thread.
@lukestephenson What are the latest performance numbers with ZIO 2.1.x? I get the impression that we're much closer to Monix's performance numbers now, but we'll probably never get there entirely since Monix used push-based streams and ZIO streams are pull-based. I am beginning to think we should close this issue.
I was making use of `produceAsync` in combination with `ZIO.foreach` to achieve batching / sending multiple records in parallel. While I was expecting this to be slightly slower than the `produceChunkAsync` API, the order of magnitude by which it is slower was not expected.

For my experiment, I was trying to publish 1 million messages to Kafka (just a single broker / local Docker). The timings I saw were: …

For the Monix and fs2-kafka implementations, I tried to keep them as close to zio-kafka as I could. Note the fs2-kafka batch API is implemented in terms of the single publish API (and a `sequence`), so it will be basically the same.

The Monix implementation (our internal publisher lib, so I can't share it, but I'll try to copy the relevant bits into the demo project soon) makes use of `scala.concurrent.blocking` to flag that calling the underlying `KafkaProducer.send` method may block.

One of my colleagues played with the zio-kafka implementation and found that the majority of the performance hit comes from `ZIO.attemptBlocking`. For a properly configured KafkaProducer, the majority of calls to `send` shouldn't block, but the current implementation appears optimised for the case where every call blocks, always shifting execution onto another thread.

The code I used for the experiment is in https://github.com/lukestephenson/kafka-perf.

Now, you might be thinking: just use the chunking API and the problem goes away. Well, yes, to a certain extent. But regardless of this, I do have serious concerns about just how much overhead using ZIO imposes. Also, the current chunking API doesn't support what we need. Our project has the ability to drop messages if publishing to Kafka fails (e.g. message too large). The zio-kafka bulk publish API doesn't tell you which message failed (just that something in the batch failed), nor does it have the ability to continue with the other messages past that point. So we would first need to improve that. Regardless, I don't think the `produceAsync` method should have overheads as high as it does currently.