Trimming R1 fast files in all aligners #333

ChristopherMancuso · 2024-05-24T14:50:42Z

Description of feature

I'm following up on a slack post that I put out 2 months ago at https://nfcore.slack.com/archives/CHN5BV5DW/p1712178056321859

I had a question about the fastq format needed for the different aligners. For everything I’m using nextflow v23.10.1 and scrnaseq v2.5.1. From the core at my work place both the R1 and R2 fastq files each have a length of 151 for the reads, instead of R1 being “trimmed” to just be only the barcode and umi (so like 28-ish bps depending on the protocol). When using --aligner cellranger this seems to be handled fine. However, when only switching --aligner to either alevin or star it doesn’t seem to handle that R1 read format well. For alevin the pipeline completes but the number of barcodes in barcodes.tsv is ~200k, which is roughly the number of reads, whereas the expected number of cells is ~5k. For star the pipeline fails at NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN with the error EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28. My questions are, is this known behavior of the pipeline? I would like to use alevin or star in the future, do I need preprocess R1 and if so, any help in doing that? Thanks!

the run command I use looks like this, just only changing the aligner argument

nextflow run nf-core/scrnaseq --input samplesheet.csv --fasta /biostats_share/mancchri/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf /biostats_share/mancchri/genomes/Homo_sapiens.GRCh38.111.gtf --aligner cellranger --protocol 10XV3 --outdir . -work-dir ./work -c config.conf -profile singularity -r 2.5.1

The text was updated successfully, but these errors were encountered:

grst · 2024-05-27T08:08:40Z

I would have totally expected both alevin and star to only use the number of nucleotides from R1 that is specified in the chemistry definition. In this case you run with --protocol 10XV3, which should lead to the following arguments passed to the tools:

STARsolo

scrnaseq/assets/protocols.json

Lines 49 to 52 in c5a6445

    
           "10XV3": { 
        
               "protocol": "CB_UMI_Simple", 
        
               "extra_args": "--soloUMIlen 12", 
        
               "whitelist": "assets/whitelist/10x_V3_barcode_whitelist.txt.gz"

simpleaf

scrnaseq/assets/protocols.json

Lines 11 to 13 in c5a6445

    
           "10XV3": { 
        
               "protocol": "10xv3", 
        
               "whitelist": "assets/whitelist/10x_V3_barcode_whitelist.txt.gz"

@rob-p, is there additional trimming needed before running simpleaf?

ChristopherMancuso · 2024-05-28T14:21:01Z

@grst Thanks for taking the time to comment! I'm a nextflow novice but I tried to follow how adding --protocol 10XV3 to the main run command changed the aligner options in the assets folder too and it looked like it should work, but didn't for the reasons above. I did enough googling and I think if I downloaded alevin or star as standalone software I could get it to work, but to be honest I like nextflow so much I'll just take the speed hit and use nextflow with cell ranger if this problem persists.

One extra thing is that I did try this pipeline on the test data for every aligner and it works just fine. I manually checked the fastq files used in the test run and R1 there was already "pre-trimmed" to just the umis and barcodes.

grst · 2024-06-04T11:56:45Z

This should totally be supported. For me the question is if we can make alevin/starsolo directly work with the untrimmed reads (and what would be the appropriate command line options for that) or if we'd require an additional step that hard-trims the reads to the required length.

ChristopherMancuso · 2024-06-06T16:38:34Z

I will try to look into this and post what I find out here.

ChristopherMancuso added the enhancement New feature or request label May 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trimming R1 fast files in all aligners #333

Trimming R1 fast files in all aligners #333

ChristopherMancuso commented May 24, 2024

grst commented May 27, 2024

ChristopherMancuso commented May 28, 2024

grst commented Jun 4, 2024

ChristopherMancuso commented Jun 6, 2024

Trimming R1 fast files in all aligners #333

Trimming R1 fast files in all aligners #333

Comments

ChristopherMancuso commented May 24, 2024

Description of feature

grst commented May 27, 2024

ChristopherMancuso commented May 28, 2024

grst commented Jun 4, 2024

ChristopherMancuso commented Jun 6, 2024