Support string indices for operators with by option #3108
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Would love this feature! I started working on a plugin just for adding these kinds of operators but if nextflow can do it natively, that'd be so much better. I've seen plenty of issues in nf-core where such channel operations are done using the entire map as the key, hopefully string keys would quickly resolve them. |
I think the […] Just wanted to say, I found a "fun" example where grouping happens on two maps at different positions 😬 In the below example from the taxprofiler pipeline, positions 0 and 2 are […]
.map {
    meta, reads, db_meta, db ->
    [[id: db_meta.db_name, single_end: meta.single_end], reads, db_meta, db]
}
.groupTuple(by: [0,2,3]) |
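For readers unfamiliar with positional grouping, the following is a minimal plain-Groovy sketch of what grouping by indices 0, 2 and 3 amounts to. It is not Nextflow's implementation, only an approximation of the semantics using standard Groovy collection methods, and the sample tuples, file names and paths are invented for illustration.

```groovy
// Plain-Groovy approximation of grouping tuples by several positions,
// in the spirit of groupTuple(by: [0, 2, 3]). All sample data is made up.
def tuples = [
    [[id: 'db1', single_end: true], 'reads_A.fq', [db_name: 'db1'], '/path/db1'],
    [[id: 'db1', single_end: true], 'reads_B.fq', [db_name: 'db1'], '/path/db1'],
    [[id: 'db2', single_end: true], 'reads_A.fq', [db_name: 'db2'], '/path/db2']
]
def by = [0, 2, 3]

// The grouping key is the list of elements found at the `by` positions.
// Key positions keep a single value; all other positions become lists.
def grouped = tuples.groupBy { t -> by.collect { t[it] } }.collect { key, members ->
    (0..<members[0].size()).collect { i ->
        i in by ? members[0][i] : members.collect { it[i] }
    }
}
```

Note that two whole maps (positions 0 and 2) are part of the key here, so any extra or modified entry in either map would silently split a group.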
@Midnighter, I went ahead and pushed a draft PR; currently only groupMap is implemented. I don't think […] Also, your issue #3175 is about using a map as a join key, whereas this issue is about using string keys. So your use case should already work, as long as you aren't modifying your maps in place. In other words, tuples can already be joined on a map element, but maps cannot be joined because that requires string indices. |
I see, I think I misunderstood the intent of this issue then. You are talking about joining channels that have single map elements only? Yes, I'm mostly interested in joining channels where a map is an element of a tuple. I thought you were working on a solution that would allow setting string keys also in that case. Indeed I think that joining on the entire map is bad practice and joining on chosen keys from that map is much more explicit and will prevent many accidental errors. |
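As an illustration of joining on chosen keys rather than on an entire map, here is a minimal plain-Groovy sketch. The helper name joinBy, its signature, and the sample records are all hypothetical; this is not a Nextflow operator or the API proposed in the draft PR, only one way the idea could be expressed with standard Groovy collections.

```groovy
// Hypothetical helper: inner-join two lists of maps on the given string keys.
// Not a Nextflow operator; a sketch of the idea only.
def joinBy(List<Map> left, List<Map> right, List<String> keys) {
    // Index the right-hand records by the values of the chosen keys.
    def lookup = right.groupBy { r -> keys.collect { r[it] } }
    left.collectMany { l ->
        def matches = lookup.get(keys.collect { l[it] }) ?: []
        // Merge each matching pair; right-hand values win on conflicts.
        matches.collect { r -> l + r }
    }
}

// Invented sample data.
def reads = [
    [id: 'sample1', single_end: true, reads: 'sample1.fq'],
    [id: 'sample2', single_end: false, reads: 'sample2.fq']
]
def indices = [
    [id: 'sample1', index: 'sample1.idx'],
    [id: 'sample3', index: 'sample3.idx']
]

def joined = joinBy(reads, indices, ['id'])
```

Because only id takes part in the match, differences in other fields such as single_end cannot cause an accidental mismatch.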
I vaguely remember you talking about allowing closures for the |
That was for joining on an entire map, which I got mixed up with this issue. We fixed that one by being careful about modifying maps. Also, I realized that joining on a closure doesn't really make sense 😅 This issue is only about joining on string keys instead of integers. I have a draft PR for it, but the three operators in question are complicated and it's not a priority right now. Feel free to check out the PR and experiment if you want to help, but I think your original issue was solved by other means. |
@bentsherman Why do you say
I came by here because I was searching for issues, as I want to do exactly this. I've seen numerous examples both in our code and others' with the pattern:
or similar where I suppose for |
for for |
Yup! Anything not in the key can (and should) be assumed to be distinct. Filtering the list of "column" down to deduplicate should be separate (think SQL or dataframe joins in either R, Python, or Rust). The alternative here is to carry around additional keys in other parts of the tuple, though that gets you back into the situation of why the idiom of having a |
Or record types, which is where we are headed with #4553 . In any case, joining on a closure is a nice generic solution whether you are using tuples, maps, or record types |
Spawned from the language improvements mega-discussion (#3107). Operators that combine or group channels based on an index (i.e. the by option) currently only support integer indices. They should also support string indices so that they can be used with maps. To my knowledge, these operators include: