Replies: 15 comments
-
I have seen a production deployment where the consumer could never catch up when the bookie's average read latency degraded to 10+ ms due to disk contention and other workloads running on the same machine. The problem can potentially be mitigated by a proper readahead mechanism in the managed ledger, like what dlog is doing.
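To make the idea concrete, here is a minimal sketch of the readahead pattern being suggested (hypothetical names, not the actual managed-ledger or dlog code): while the consumer processes entry N, the next few entries are already being fetched asynchronously, so a slow bookie read overlaps with processing instead of stalling it.

```java
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical sketch: a reader that keeps a window of entries in flight
// ahead of the consumer's position. `fetchFromBookie` stands in for a real
// (slow, ~10 ms) bookie read.
class ReadAheadSketch {
    private final ExecutorService ioPool = Executors.newFixedThreadPool(4);
    private final Map<Long, CompletableFuture<String>> inflight = new ConcurrentHashMap<>();
    private final int window;

    ReadAheadSketch(int window) { this.window = window; }

    // Simulated slow bookie read (stand-in for a real readEntries call).
    private String fetchFromBookie(long entryId) {
        try { Thread.sleep(10); } catch (InterruptedException ignored) {}
        return "entry-" + entryId;
    }

    String read(long entryId) {
        // Kick off prefetches for the next `window` entries before blocking.
        for (long id = entryId; id < entryId + window; id++) {
            final long eid = id;
            inflight.computeIfAbsent(eid,
                k -> CompletableFuture.supplyAsync(() -> fetchFromBookie(eid), ioPool));
        }
        // The requested entry is usually already in flight or complete,
        // so sequential reads rarely pay the full per-entry latency.
        return inflight.remove(entryId).join();
    }

    void shutdown() { ioPool.shutdown(); }

    public static void main(String[] args) {
        ReadAheadSketch reader = new ReadAheadSketch(8);
        for (long id = 0; id < 16; id++) {
            System.out.println(reader.read(id));
        }
        reader.shutdown();
    }
}
```

With a window of 8 and 4 I/O threads, a catch-up read keeps several bookie requests in flight, so the consumer's effective read rate is bounded by aggregate bookie throughput rather than per-entry round-trip latency.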
-
Should the logic be similar to the readAhead logic in BKLogSegmentEntryReader?
-
@MarvinCai yes. I think it is worth pushing this logic to BK to provide a
-
/cc @jiazhai @eolivelli in this thread, so they can provide more thoughts on whether it is worth adding this readahead logic to the BK read handle.
-
@sijie
-
This is also a blocking issue for practical use of tiered storage as historical retention, since replay from tiered storage (at least S3) is too slow. It would be great if the number of read-ahead threads and/or outstanding requests were tunable, as many blob/object stores scale throughput proportionally to the number of connections.
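The tunable-parallelism idea can be sketched roughly as follows (hypothetical code, not the actual offloader; `rangedGet` stands in for a ranged GET against the blob store). A semaphore caps the number of in-flight requests, and that cap is exactly the knob this comment is asking for:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical sketch: read many fixed-size blocks from a blob store with a
// configurable bound on concurrent ranged GETs. Throughput against stores
// like S3 tends to scale with the number of parallel connections.
class ParallelBlobReadSketch {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Semaphore outstanding;

    ParallelBlobReadSketch(int maxOutstanding) {
        this.outstanding = new Semaphore(maxOutstanding);
    }

    // Stand-in for a ranged GET (offset/length) against the blob store.
    private byte[] rangedGet(long offset, int len) {
        try { Thread.sleep(20); } catch (InterruptedException ignored) {}
        return new byte[len];
    }

    // Fetch `count` blocks, never exceeding maxOutstanding in-flight GETs.
    List<byte[]> readBlocks(long baseOffset, int blockSize, int count) {
        List<CompletableFuture<byte[]>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final long off = baseOffset + (long) i * blockSize;
            outstanding.acquireUninterruptibly(); // back-pressure: cap in-flight GETs
            futures.add(CompletableFuture.supplyAsync(() -> {
                try { return rangedGet(off, blockSize); }
                finally { outstanding.release(); }
            }, pool));
        }
        List<byte[]> blocks = new ArrayList<>();
        for (CompletableFuture<byte[]> f : futures) blocks.add(f.join());
        return blocks;
    }

    void shutdown() { pool.shutdown(); }
}
```

With `maxOutstanding` exposed as configuration, operators could raise it for high-latency object stores until replay throughput matches ingest.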
-
@vicaya thank you for your feedback. @MarvinCai are you willing to give it a try?
-
@sijie sorry, I just saw the replies. How about I start with a doc containing the problem statement and a proposed solution? If everything looks good, we can proceed from there.
-
@MarvinCai Are you working on this issue already?
-
@sijie I am new to the BK code base and have been reading some of the LedgerHandle and DL code to figure out what should be changed. I have a simple doc about what I think may need to change.
-
Is there any progress on this issue? Whenever people ask me why we don't use tiered storage, I have to point them to this issue as the reason it is too slow for us: readers cannot read fast enough from tiered storage, and backlogs build up even at moderate throughput (<1000 msgs/s with small messages of <100 bytes, i.e. <100 KB/s).
-
@nicoloboschi you might be interested in working on a fix for this issue if @MarvinCai doesn't have time for this topic.
-
@MarvinCai, @sijie, @eolivelli - It looks like the feature to improve read throughput from blob storage is still outstanding. I'm very interested in helping to contribute it. Do you think it deserves its own issue to start a dedicated discussion on offloading?
-
@michaeljmarshall you can start a discussion with a proposal.
-
@eolivelli - thanks, I'll do that.
-
Problems
Currently the managed ledger reads entries in very large batch requests (100 entries by default). This is an inefficient approach. We should stream the read requests the way dlog is doing.
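One way to picture the streaming alternative (a sketch with illustrative names, not the actual managed-ledger or dlog code): instead of issuing one blocking 100-entry read, issue small pipelined sub-reads up front and deliver entries to the consumer, in order, as each sub-batch lands. The first entries then arrive after one small read's latency rather than after the whole batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.LongConsumer;

// Hypothetical sketch of streaming reads: split one large batch into small
// sub-reads that run concurrently, while delivery to the consumer stays in
// entry order.
class StreamingReadSketch {
    // Simulated bookie read of one small sub-batch of entry ids.
    static long[] readSubBatch(long first, int n) {
        try { Thread.sleep(5); } catch (InterruptedException ignored) {}
        long[] ids = new long[n];
        for (int i = 0; i < n; i++) ids[i] = first + i;
        return ids;
    }

    // Issue all sub-reads up front (pipelined); deliver entries in order as
    // each sub-batch completes, so the consumer starts almost immediately.
    static void streamEntries(long start, int total, int subBatch,
                              ExecutorService pool, LongConsumer deliver) {
        List<CompletableFuture<long[]>> pending = new ArrayList<>();
        for (int b = 0; b * subBatch < total; b++) {
            final long first = start + (long) b * subBatch;
            final int n = Math.min(subBatch, total - b * subBatch);
            pending.add(CompletableFuture.supplyAsync(
                () -> readSubBatch(first, n), pool));
        }
        for (CompletableFuture<long[]> f : pending)
            for (long id : f.join()) deliver.accept(id);
    }

    // Convenience wrapper: collect all delivered entry ids into an array.
    static long[] collect(long start, int total, int subBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        long[] out = new long[total];
        int[] idx = {0};
        streamEntries(start, total, subBatch, pool, id -> out[idx[0]++] = id);
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) {
        long[] ids = collect(0, 100, 10);
        System.out.println("delivered " + ids.length + " entries in order");
    }
}
```

The same 100 entries are fetched either way; the difference is that the ten 10-entry sub-reads overlap with each other and with consumer processing instead of serializing behind one large request.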