Trimming R1 fastq files in all aligners #333
Comments
I would have totally expected both alevin and star to only use the number of nucleotides from R1 that is specified in the chemistry definition. In this case you run with
STARsolo: scrnaseq/assets/protocols.json, lines 49 to 52 in c5a6445
simpleaf: scrnaseq/assets/protocols.json, lines 11 to 13 in c5a6445
@rob-p, is there additional trimming needed before running simpleaf?
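As a rough illustration of what "only use the number of nucleotides from R1 that is specified in the chemistry definition" could look like, here is a minimal Python sketch. It assumes a 16 nt cell barcode followed by a 12 nt UMI, which matches the expected length of 28 reported in the issue but is not read from `protocols.json`. The names `BarcodeRead` and `split_r1` are hypothetical and not part of the pipeline or of either aligner.

```python
from typing import NamedTuple


class BarcodeRead(NamedTuple):
    barcode: str
    umi: str


def split_r1(seq: str, cb_len: int = 16, umi_len: int = 12) -> BarcodeRead:
    """Take barcode and UMI from the start of R1 and ignore any trailing bases.

    The default lengths are assumptions for illustration only.
    """
    needed = cb_len + umi_len
    if len(seq) < needed:
        raise ValueError(f"R1 has {len(seq)} nt, need at least {needed}")
    return BarcodeRead(seq[:cb_len], seq[cb_len:needed])
```

With this behaviour a 151 nt R1 and a pre-trimmed 28 nt R1 would yield the same barcode and UMI.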
@grst Thanks for taking the time to comment! I'm a nextflow novice but I tried to follow how adding One extra thing is that I did try this pipeline on the test data for every aligner and it works just fine. I manually checked the fastq files used in the test run and R1 there was already "pre-trimmed" to just the umis and barcodes.
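The manual check described above (whether R1 is already "pre-trimmed") could be scripted roughly as follows. This is only a sketch: it assumes plain four-line FASTQ records in a gzipped file, and the function names `read_length_counts` and `fastq_read_lengths` are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice
from typing import Iterable


def read_length_counts(lines: Iterable[str], max_reads: int = 10000) -> Counter:
    """Count sequence lengths over the first `max_reads` FASTQ records."""
    # In four-line FASTQ, the sequence is the second line of every record.
    seqs = islice(lines, 1, max_reads * 4, 4)
    return Counter(len(s.rstrip("\n")) for s in seqs)


def fastq_read_lengths(path: str, max_reads: int = 10000) -> Counter:
    """Same as read_length_counts, but reading from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return read_length_counts(handle, max_reads)
```

A single length near 28 would suggest a pre-trimmed R1, while 151 would indicate full-length reads like the ones described in the issue.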
This should totally be supported. For me the question is if we can make alevin/starsolo directly work with the untrimmed reads (and what would be the appropriate command line options for that) or if we'd require an additional step that hard-trims the reads to the required length.
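For the second option, a hard-trimming step could be sketched in Python like this. It is not what the pipeline does, only an illustration under the assumption of gzipped four-line FASTQ input and a target length of 28 (the value from the STAR error message). `hard_trim_records` and `hard_trim_fastq` are hypothetical helper names.

```python
import gzip
from typing import Iterable, Iterator


def hard_trim_records(lines: Iterable[str], keep: int) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to the first `keep` bases."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Positions 1 and 3 within each four-line record are sequence and quality.
        if i % 4 in (1, 3):
            line = line[:keep]
        yield line + "\n"


def hard_trim_fastq(src: str, dst: str, keep: int = 28) -> None:
    """Write a hard-trimmed copy of a gzipped FASTQ file (keep=28 is an assumption)."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(hard_trim_records(fin, keep))
```

Only R1 would be trimmed this way; R2 carries the cDNA and would be left untouched.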
I will try to look into this and post what I find out here.
Description of feature
I'm following up on a slack post that I put out 2 months ago at https://nfcore.slack.com/archives/CHN5BV5DW/p1712178056321859
I had a question about the fastq format needed for the different aligners. For everything I'm using nextflow `v23.10.1` and scrnaseq `v2.5.1`. From the core at my work place both the `R1` and `R2` fastq files each have a length of 151 for the reads, instead of `R1` being "trimmed" to just be only the barcode and umi (so like 28-ish bps depending on the protocol). When using `--aligner cellranger` this seems to be handled fine. However, when only switching `--aligner` to either `alevin` or `star` it doesn't seem to handle that `R1` read format well. For `alevin` the pipeline completes but the number of barcodes in `barcodes.tsv` is ~200k, which is roughly the number of reads, whereas the expected number of cells is ~5k. For `star` the pipeline fails at `NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN` with the error `EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28`. My questions are, is this known behavior of the pipeline? I would like to use `alevin` or `star` in the future, do I need to preprocess `R1` and if so, any help in doing that? Thanks!

The run command I use looks like this, just only changing the aligner argument