Channel topics [EXPERIMENTAL] #2842
-
Awesome!! How about
If one or more of the channels in a "topic" are empty, would the process still be triggered?
-
This proposed feature is very welcome. I don't know how many times I have had to mimic this behavior by mixing channels. As much as I understand that I, for one, would vote for Anyway, great job!!
-
Happy to see your comments! Let's try to focus on the feature itself more than the naming. Do you think this could really be used to replace the mass of input declarations required for multiqc with a single topic channel? Re the I've uploaded a snapshot version so that you can try it yourself
-
I very much like the idea of topic channels and I think they will be very valuable. However, I don't think the currently proposed implementation is necessarily the best one, because it forces me to do pipeline design at the module level. You gave the example of nf-core. nf-core has hundreds of modules that are supposed to be plug and play in any pipeline (not only nf-core ones). Of course, one can design by convention and say certain topics are expected, but it makes more sense to me to make the topic configurable. Since the topic is a string, maybe you already intend for this to be possible. So it could be something like:

    output:
    path '*.txt', topic: "${params.foo_topic}"

Another idea could be to set this at the workflow level, maybe something like:

    A.out.topic('foo')
-
Clashing input file paths

One potential issue with topics may be that it'll be difficult to keep input file paths unique. This is one of the reasons that nf-core/rnaseq uses a local copy of the MultiQC module: it gives directory prefixes to each input channel to avoid clashes. An idea to generalise this would be to use wildcards in the staging-in directory path. So instead of:

    path ('fastqc/*')
    path ('trimgalore/fastqc/*')
    path ('trimgalore/*')
    path ('sortmerna/*')
    //...

Could use (straight from the docs):

    path ('dir??/*')

This would give staged file names:

@drpatelh I wonder if we could investigate using this right away, as it would be a general improvement for us aside from the whole idea of topics.
-
I agree that channel topics seem to break the separation of concerns between module design and workflow design. Given that the nf-core multiqc module seems to be the primary motivating case, I'm looking at this module and wondering: why not just have one input channel that is a mix of all the multiqc inputs? Isn't multiqc just creating a report out of whatever inputs are present? I presume that it is useful to explicitly enumerate all of the directory names that multiqc recognizes, but I can't tell if that's actually important to what multiqc is doing. Even if it is, you could also have one input channel and filter it based on an allowlist of directory names. Overall, it seems to me that a channel topic is a sort of implicit and dynamic
-
Revisiting this discussion... I was initially focused on topics as a solution to multiqc's many optional arguments. But now I see it was more about multiqc collecting the tool versions from all upstream processes. After thinking through the implementation details of a
While (1) works and already removes a lot of nf-core boilerplate from processes, it doesn't remove all the channel logic required to collect the tool versions for MULTIQC. On the other hand, I'm not sure how a process would emit this trace metadata aside from a custom output property... I guess you could declare an output channel like so:

    output:
    path '.command.trace', emit: trace, topic: 'trace'
-
Nextflow channels allow connecting one or more processes producing some output data with one or more processes consuming such data. For example:
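A minimal sketch of this kind of wiring, in which one producer feeds two consumers. The process bodies, file names and commands are assumptions made up for illustration; only the process names `A`, `B` and `C` come from the text.

```nextflow
// Hypothetical producer: writes a single text file.
process A {
    output:
    path 'a.txt'

    script:
    """
    echo 'hello from A' > a.txt
    """
}

// Hypothetical consumer: prints the file content.
process B {
    input:
    path x

    output:
    stdout

    script:
    """
    cat $x
    """
}

// Hypothetical consumer: counts the bytes in the file.
process C {
    input:
    path x

    output:
    stdout

    script:
    """
    wc -c < $x
    """
}

workflow {
    A()
    // In DSL2 the same output channel can be passed to more than one process.
    B(A.out)
    C(A.out)
}
```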
In the snippet above, the processes `B` and `C` consume the data produced by process `A`. Instead:

In the example above, the process `C` consumes the output of processes `A` and `B`.

This approach covers the most common use cases; however, it requires declaring beforehand the expected input/output of each process. For example, if process `C` needs to collect the inputs from three processes, the corresponding input declaration would have to be changed.

An alternative could be to declare just one input and then mix the input channels together into a single one, provided they contain homogeneous data. For example:
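A minimal sketch of the single-input approach, assuming hypothetical processes `A` and `B` that each emit one text file and a process `C` that reads whatever files it is given. The file names and commands are illustrative only.

```nextflow
process A {
    output:
    path 'a.txt'

    script:
    """
    echo 'from A' > a.txt
    """
}

process B {
    output:
    path 'b.txt'

    script:
    """
    echo 'from B' > b.txt
    """
}

// Single input declaration: all collected files are staged together.
process C {
    input:
    path reports

    output:
    stdout

    script:
    """
    cat $reports
    """
}

workflow {
    A()
    B()
    // Each additional upstream process has to be appended to this expression by hand.
    C(A.out.mix(B.out).collect())
}
```

Here `mix` merges the two output channels and `collect` gathers all items into one list, so `C` runs once with both files. The file names must differ, otherwise staging them into the same task directory would clash.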
In this case, the `C` process can receive an arbitrary number of input channels; however, the relationship still needs to be known beforehand, and the resulting channel expression can be verbose and error-prone to write.

This is even clearer in this recurrent pattern in nf-core pipelines.
Topics to the rescue
A solution to this problem could be the introduction of topic channels. A topic channel is a shared channel, identified by a name, to which multiple processes can write their output.
For example:
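A minimal sketch of two processes writing to the same topic. The `topic: 'foo'` output option follows the notation used elsewhere in this discussion; the file names and commands are assumptions, and the exact syntax may differ between the experimental snapshot and released Nextflow versions, some of which also require the feature to be enabled explicitly.

```nextflow
// Hypothetical producer: its output is also sent to the 'foo' topic.
process A {
    output:
    path 'a.txt', topic: 'foo'

    script:
    """
    echo 'from A' > a.txt
    """
}

// A second, unrelated producer writing to the same topic name.
process B {
    output:
    path 'b.txt', topic: 'foo'

    script:
    """
    echo 'from B' > b.txt
    """
}

workflow {
    A()
    B()
}
```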
The above example shows two processes, `A` and `B`, declaring two outputs with the same `foo` topic.

The topic channel holds the output of all processes having the same topic name, and it can be accessed via the new `channel.topic(NAME)` factory method.

For example:
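A minimal sketch of a consumer reading from the topic, assuming the `topic: 'foo'` output option and the `channel.topic('foo')` factory described in this proposal. The processes, file names and commands are hypothetical, and the behaviour shown (the topic channel completing once its producers are done) is an assumption about the implementation.

```nextflow
process A {
    output:
    path 'a.txt', topic: 'foo'

    script:
    """
    echo 'from A' > a.txt
    """
}

process B {
    output:
    path 'b.txt', topic: 'foo'

    script:
    """
    echo 'from B' > b.txt
    """
}

// Hypothetical consumer: receives everything published to the topic.
process C {
    input:
    path reports

    output:
    stdout

    script:
    """
    cat $reports
    """
}

workflow {
    A()
    B()
    // C is not wired to A or B directly; it only knows the topic name.
    C(channel.topic('foo').collect())
}
```

The point of the design is visible in the `workflow` block: adding a third producer with `topic: 'foo'` would not require touching the call to `C`.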
If necessary, the output of the processes can also be accessed independently using the usual notation, e.g. `A.out` and `B.out`.

Conclusion
Topic channels add to Nextflow the ability to implement a pub/sub messaging model in which multiple publishers send messages over a shared channel that is received by multiple subscribers.
This model further decouples workflow composition, adding the ability to connect tasks that are only known at runtime.