Integrate filtering steps into map-reduce #135

tkchafin · 2024-10-01T10:01:33Z

Description of feature

Currently we generate byte ranges referencing the CRAM file and stream these through the various steps involved in alignment (FASTQ conversion etc.), which avoids unnecessary I/O of intermediate files. However, this excludes filtering steps that are either already in the pipeline (e.g., BLASTN for PacBio) or planned for it.

To add in further steps that would also benefit from chunked processing, we need one of the following:

1. A special version of the cram_filter_<aligner>_*.nf modules for each specific step
2. Write the chunks to file and pass these through all required modules
3. A separate map-reduce for the filtering steps

Option 1 could mean a lot of different versions of the align modules, but saves on intermediate files. Option 2 would allow us to use a lot more 'stock' modules, which would be easier to maintain, but creates more intermediates. Option 3 is probably not a great option.
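
To make the trade-off concrete, here is a rough Python sketch of the Option 1 idea: each chunk is streamed through a chain of stages (filter, then align) with nothing written to disk in between. It is only an illustration and not the pipeline's code. The record type, `drop_contaminants`, and `fake_align` are hypothetical stand-ins, and index ranges over an in-memory list take the place of byte ranges into a CRAM file.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Record = str  # stand-in for a read; the real pipeline streams CRAM/FASTQ data
Stage = Callable[[Iterator[Record]], Iterator[Record]]


def chunk_ranges(n_records: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, n_records) into half-open (start, end) ranges."""
    return [
        (start, min(start + chunk_size, n_records))
        for start in range(0, n_records, chunk_size)
    ]


def stream_chunk(
    records: Sequence[Record], rng: Tuple[int, int], stages: Sequence[Stage]
) -> List[Record]:
    """Map step: pipe one chunk through every stage lazily, with no intermediates."""
    start, end = rng
    stream: Iterator[Record] = iter(records[start:end])
    for stage in stages:
        stream = stage(stream)
    return list(stream)


def drop_contaminants(stream: Iterator[Record]) -> Iterator[Record]:
    # Hypothetical filter, standing in for something like a BLASTN-based step.
    return (r for r in stream if not r.startswith("adapter"))


def fake_align(stream: Iterator[Record]) -> Iterator[Record]:
    # Hypothetical aligner stand-in; it only tags each surviving record.
    return (f"aligned:{r}" for r in stream)


def map_reduce(
    records: Sequence[Record], chunk_size: int, stages: Sequence[Stage]
) -> List[Record]:
    """Run every chunk through the stages, then merge results in chunk order."""
    merged: List[Record] = []
    for rng in chunk_ranges(len(records), chunk_size):
        merged.extend(stream_chunk(records, rng, stages))
    return merged


if __name__ == "__main__":
    reads = ["read1", "adapter_x", "read2", "read3", "adapter_y"]
    print(map_reduce(reads, 2, [drop_contaminants, fake_align]))
```

In this picture, Option 2 would correspond to writing each stage's output for a chunk to a file before the next stage reads it, which is what lets unmodified 'stock' modules be reused at the cost of extra intermediates.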


tkchafin added the enhancement (Improvement of the existing features) label Oct 1, 2024

reichan1998 mentioned this issue Oct 22, 2024

Integrate filtering steps into map-reduce #136

Merged

9 tasks
