Implement cram chunking and minimap2-based Hi-C alignments #113

reichan1998 · 2024-09-10T11:51:05Z

Adapted hic_bwamem2 and hic_minimap2 from sanger-tol/treeval to chunk the Hi-C CRAM file by containers and align based on read groups.
Closes #97 (Implement minimap2-based Hi-C alignments).
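
As a rough illustration of the chunking idea (not the pipeline's actual code), the container ranges handed to each alignment job could be computed along these lines. The function name `chunk_container_ranges` is hypothetical, and the sketch assumes containers are indexed from 0 and that each range is inclusive at both ends, which is exactly where an off-by-one error is easy to make.

```python
def chunk_container_ranges(n_containers: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split container indices 0..n_containers-1 into inclusive (start, end) ranges.

    Hypothetical helper for illustration only; the 0-based, inclusive
    indexing is an assumption, not taken from the pipeline.
    """
    if n_containers < 0:
        raise ValueError("n_containers must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    ranges = []
    for start in range(0, n_containers, chunk_size):
        # The last chunk may be shorter; subtract 1 because the end is inclusive.
        end = min(start + chunk_size, n_containers) - 1
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # 25 containers in chunks of 10 gives two full chunks and one partial chunk.
    for start, end in chunk_container_ranges(25, 10):
        print(f"containers {start}-{end}")
```

Each range can then be aligned as an independent job and the per-chunk outputs merged afterwards, so the chunk size trades the number of jobs against the runtime of each one.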

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

muffato · 2024-09-10T20:18:19Z

filter_five_end.pl was actually written by Arima Genomics. This wasn't done in TreeVal, but the copyright of this file needs to be stated correctly.
Could you please change the copyright statement, which currently attributes the file to priyanka (which is incorrect), to:

Copyright (c) 2022-2024 Genome Research Ltd.
except `bin/filter_five_end.pl`:
Copyright (c) 2017 Arima Genomics, Inc.

@tkchafin: if you want to test the feature, try Myxine_glutinosa from TOLSD-2062. We just finished running it with version 1.2.1 of the pipeline. The resulting CRAM file is 178 GB 🙀 and contains 6.5 billion reads. bwa-mem2 took more than 7 days to run, as a single process on 64 cores and 174 GB RAM.

tkchafin · 2024-09-16T11:06:49Z

Sent PR reichan1998#2 covering the Illumina case. I think interleaved fastq should be handled correctly there but would be worth a double-check! :)

Cram handling illumina

tkchafin

We also need to update the copyright and make sure the documentation is clear on interleaved fastq inputs, but I will send that as a separate PR.
We also need to update CI to run both the bwamem and minimap2 tests.

reichan1998 and others added 7 commits August 30, 2024 20:46

Adapting hic_mapping.nf from TreeVal to implement Minimap2 for HiC data and chunking

9734e66

Merge pull request sanger-tol#5 from reichan1998/cram_handling

504d8b1

Cram handling

generate csv seems to be working now?

168566f

Merge pull request #1 from tkchafin/cram_handling

e691ad4

Cram handling

adapt generate_cram_csv.sh to handle crai file

d7e2761

ensure the sam cram output for HiC reads

7c35dad

include hic_aligner in nextflow_schema.json

842fb5d

tkchafin self-requested a review September 10, 2024 12:37

fix linting

3fd90a6

tkchafin added 10 commits September 11, 2024 08:29

Merge pull request sanger-tol#7 from reichan1998/cram_handling

477ba0d

Cram handling

schema update; hic_ to short_aligner and added chunksize

6969303

made chunk_size an argument and fix off-by-1 error

6ee5f51

change hic_ to _mapreduce subworkflows

513229d

change hic_ to _mapreduce subworkflows

11c3c31

now passes params.chunk_size

fe7dd7a

tag with chunk_id so process tracking is accurate

b26d340

samtools/addreplacerg module

fe4d660

minor changes... currently working

17bbbc7

generalise mapreduce to illumina

a7db737

This was linked to issues Sep 16, 2024

Bwa-mem2 mem parallelisation #35

Closed

Implement chunking to speed up alignment #74

Closed

Implement minimap2-based Hi-C alignments #97

Closed

bwa-mem configuration for HiC #118

Closed

Merge pull request #2 from tkchafin/cram_handling_illumina

c93c6bc

Cram handling illumina

tkchafin approved these changes Sep 17, 2024

tkchafin merged commit 51a1d9f into sanger-tol:dev Sep 17, 2024
8 checks passed
