Subscribe to topic partition list #233

Open
fxposter opened this issue Dec 10, 2022 · 8 comments
Assignees
Labels
consumer Consumer API related stuff enhancement

Comments

@fxposter

Currently, when you subscribe to a topic, you are forced to use the "default" offsets and partitions, because Consumer#subscribe only accepts topic names. The underlying rd_kafka_subscribe accepts a TopicPartitionList, which allows choosing partitions/offsets if needed. I'd like a way either to subscribe to a TPL directly or to pass more data to Consumer#subscribe, so that things like "explicitly consume from a particular offset" can be done easily.

WDYT? What API would you like to have in the library for this?

@fxposter
Author

To avoid creating a separate subscribe_list method, I'd propose something like this:

def subscribe(*topics_or_tpl)
  if topics_or_tpl.length == 1 && topics_or_tpl.first.is_a?(TopicPartitionList)
    tpl = topics_or_tpl.first.to_native_tpl
  else
    # construct tpl the old way
  end
  # ...
end
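
For illustration, the proposal above could be fleshed out roughly as follows. This is a minimal, self-contained sketch, not the library's implementation: `TopicPartitionList` here is a small stand-in for `Rdkafka::Consumer::TopicPartitionList`, `SketchConsumer` is hypothetical, and the step that converts the list to a native TPL and calls librdkafka is replaced by simply recording the result.

```ruby
# Stand-in for Rdkafka::Consumer::TopicPartitionList (hypothetical, illustration only).
class TopicPartitionList
  attr_reader :entries

  def initialize
    # topic name => { partition => offset }, or nil meaning "no explicit partitions"
    @entries = {}
  end

  def add_topic(topic, partitions_with_offsets = nil)
    @entries[topic] = partitions_with_offsets
    self
  end
end

# Hypothetical consumer that only records what it would pass to the native call.
class SketchConsumer
  attr_reader :subscription

  def subscribe(*topics_or_tpl)
    tpl =
      if topics_or_tpl.length == 1 && topics_or_tpl.first.is_a?(TopicPartitionList)
        # New path: the caller supplied a ready-made TPL.
        topics_or_tpl.first
      else
        # Old path: build a TPL from plain topic names.
        topics_or_tpl.each_with_object(TopicPartitionList.new) do |topic, list|
          list.add_topic(topic)
        end
      end

    # A real implementation would convert tpl to a native list and subscribe here.
    @subscription = tpl.entries
    nil
  end
end
```

With this shape, `consumer.subscribe("a", "b")` keeps working unchanged, while `consumer.subscribe(tpl)` takes the new path.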

@fxposter
Author

@mensfeld WDYT?

@mensfeld
Member

> Subscribe to topic set using balanced consumer groups.

You can subscribe to the topic and use the rebalance_cb to set expected offsets.

I know that librdkafka does not have pluggable assignment strategies (ref: confluentinc/librdkafka#2284), and this could partially mitigate that.

I'm hesitant to recommend a separate method for TPL-based assignments, like subscribe_via_tpl, but I'm not sure.

Aside from that, I would opt to make it fail-safe: make sure that only valid TPLs can be used, and add some sort of notification when one is not. Doing it that way requires fetching the topic metadata from the cluster, preferably in a non-cached way.

Overall I do see it as an advanced use-case that this library should support. Is it a high-priority one? Def. not to me.
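
As a rough illustration of the fail-safe idea above, the check could compare the requested partitions against partition counts obtained from the cluster. This is only a sketch under assumptions: `validate_tpl!` and `InvalidTopicPartitionError` are hypothetical names, the TPL is represented as a plain hash, and `partition_counts` is assumed to come from a fresh (non-cached) metadata fetch that is not shown here.

```ruby
# Hypothetical error type for this sketch.
class InvalidTopicPartitionError < StandardError; end

# requested:        { "topic" => [partition ids] }
# partition_counts: { "topic" => number of partitions }, assumed to come
#                   from a fresh metadata request (not shown)
def validate_tpl!(requested, partition_counts)
  problems = []

  requested.each do |topic, partitions|
    count = partition_counts[topic]
    if count.nil?
      problems << "unknown topic #{topic}"
      next
    end

    partitions.each do |partition|
      next if partition.is_a?(Integer) && partition >= 0 && partition < count

      problems << "#{topic}: partition #{partition} out of range (0...#{count})"
    end
  end

  # Report every problem at once rather than failing on the first one.
  raise InvalidTopicPartitionError, problems.join("; ") unless problems.empty?

  true
end
```

Collecting all problems before raising is one possible form of the "notification" mentioned above; logging a warning instead would be another.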

@fxposter
Author

But this is what rdkafka actually allows you to set in its example. The main use case for me is to set the offset to the beginning of the partition explicitly upon subscribe. I don't necessarily see it as being advanced :)

@mensfeld
Member

mensfeld commented Dec 11, 2022

> The main usecase for me is to set offset to the beginning of the partition explicitly upon subscribe

I was referring to building a custom per-partition assignment flow to load-balance processes' subscriptions.

The use case of setting it explicitly on first use, differently per topic, def. should be supported.

It is still not the top priority for me, but if we all agree on the API (cc @thijsc) I don't see any reason not to work on it together :)

@fxposter
Author

I'm totally fine writing it myself, I just want to agree on the API beforehand.
From a usability standpoint, def subscribe(*topics_or_tpl), where there may be multiple topics or a single TPL, is the best one. It's backwards compatible and still has a good API when using a TPL.

@mensfeld
Member

I agree about usability, but I don't like the complexity of handling two sets/types of incoming arguments in the same method.
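
To make the trade-off concrete, here is a sketch of the alternative with one argument type per method. It is an illustration only: `TwoMethodConsumer` is hypothetical, `subscribe_via_tpl` borrows the name floated earlier in this thread, and the TPL is represented as a plain hash of `{ topic => { partition => offset } }` rather than the library's own class.

```ruby
# Hypothetical consumer with one public method per argument type.
class TwoMethodConsumer
  attr_reader :subscription

  # Existing behaviour: topic names only.
  def subscribe(*topics)
    unless topics.all? { |topic| topic.is_a?(String) }
      raise ArgumentError, "expected topic names"
    end

    apply_subscription(topics.to_h { |topic| [topic, nil] })
  end

  # TPL-based variant; a Hash stands in for a TopicPartitionList here.
  def subscribe_via_tpl(tpl)
    raise ArgumentError, "expected a Hash" unless tpl.is_a?(Hash)

    apply_subscription(tpl)
  end

  private

  # Shared tail; a real implementation would call into librdkafka here.
  def apply_subscription(entries)
    @subscription = entries
    nil
  end
end
```

Each method validates a single kind of input, at the cost of a second public entry point.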

@mensfeld
Member

@fxposter you want to revisit this with me? :) happy to make it move forward

@mensfeld self-assigned this Jul 4, 2024
@mensfeld added the enhancement and consumer (Consumer API related stuff) labels Jul 4, 2024
Development

No branches or pull requests

2 participants