Improve Treeshop's ability to recognize R1/R2 naming conventions #18
Comments
I suggest we keep the default Makefile regex and override it from the fabfile. We've been down this rabbit hole many times and there is no one-size-fits-all answer, so the Makefile should work well for the common case (likely R1/R2), with fab trying to disambiguate.
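As a rough illustration of that split, the fabfile could leave the Makefile default alone and pass an override only when it has detected a different convention. GNU make gives variables assigned on the command line precedence over ordinary assignments inside the Makefile, so no Makefile edit is needed. The variable names `R1_PATTERN` and `R2_PATTERN` and the helper `make_command` are hypothetical; they are not taken from Treeshop.

```python
def make_command(target, r1_pattern=None, r2_pattern=None):
    """Build an argv list for make, overriding the read patterns only if given.

    R1_PATTERN / R2_PATTERN are placeholder names (an assumption); substitute
    whatever variables the Makefile actually uses for its R1/R2 regexes.
    """
    cmd = ["make", target]
    if r1_pattern is not None:
        cmd.append("R1_PATTERN=" + r1_pattern)
    if r2_pattern is not None:
        cmd.append("R2_PATTERN=" + r2_pattern)
    return cmd


# Common case: nothing detected, so the Makefile default regex applies.
default_cmd = make_command("expression")

# Unusual naming: the fabfile supplies its own patterns.
override_cmd = make_command("expression", r"_1\.fq\.gz$", r"_2\.fq\.gz$")
```

Returning a list rather than a single string keeps the patterns free of shell quoting problems if the command is later run without a shell.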
e-t-k added a commit that referenced this issue Oct 17, 2022
e-t-k added a commit that referenced this issue Nov 15, 2022
* add ercc thops#466 - reference section
* add troubleshooting for 'Needed to prompt...
* update treeshop.md with new pipelines
* fix makefile expression message
* improve R1/R2 detection in Makefile (#18)
* ERCC - expression step (untested)
  add erccexpression step to Makefile (tested, works) and fabfile (untested). currently output files that are not ideal are:
  - kallisto file
  - rsem_genes.hugo.results (see issue)
* ERCC - qc step (untested)
  Added qc step to Makefile (currently running) and fabfile (fully untested)
* ercc fabfile bugfix
  its stringly typed in the process( signature! Convert it to an actual bool. hilarious.
* ERCC - remaining steps (UNTESTED)
  added ERCC option for pizzly, fusion, jfkm, variants mostly just changes the output dir, a few of them that drop files in primary / derived need to change the bam names too totally untested, not even executed.
* single whitespace typo
* removed grep -v -- not working.
  so the previous version is broken because i forgot the pipe character but i tried putting it in - so the last line is | grep -v "DEBUG toil" and it's not sucessfully filtering the lines. I'm not sure if the pipe is running inside or outside the docker and im not sure whether docker is sending things to stdout or stderr or what. So for now I just totally remove it. (so no, there is not a committed version with the pipe in -- I tried running it without committing and it did run but didn't filter the lines. )
* bugfixes in fabfile.py for ERCC
  still in progress, not tested.
  > can't hardlink some bams because they are owned by root. but can move them because ubuntu owns the parent dir. so just move them to a name with ERCC in them, download, and move back instead.
  > fixed longstanding typo "Unable find any fastqs or bams...
* makefile bugfixes for ERCCC
  untested
  - add the --logInfo flag to expression_ercc docker to hopefully get rid of debug output for real
  - fix qc_ercc - wasn't properly giving it the path to the reference file
* ercc - bugfix
  removed a wayward do_ercc (should be ercc) that caused pizzly to crash and 1 more thing. mostly works.
* Fix fusion potential hang
  (this change applies to both standard and ERCC-transcript runs) fix situation where fusion would hang indefinitely if it didn't generate proper output and instead left behind a _STARtmp folder with a named pipe inside it -- fab would try to download the pipe and it would never say it was done. With this version -- if it doesn't find any fusion output files at all, it will accept that and continue on with variants and jfkm before moving to the next sample. this is the version of the fabfile i am testing right now
* tested on ERCC path but not non-ERCC path
  - ERCC - run expression and QC only - skip pizzly, fusions, jfkm, variants. (However the ERCC toggles are still within those steps if we change and want to run them.)
  Non-ERCC Change: If fusion fails, the pipeline will continue onward and make a note at the end
* Create ercc.md
  ERCC-aware pipeline: add documentation (thops#466)
* Update treeshop.md
  add notes about acceptable fastq names
* Update README.md
  separated out make from git clone to hopefully clarify that its not mandatory
* suppress toil debug output from expression
Improve Treeshop's ability to recognize R1/R2 naming conventions
Background:
The Makefile currently contains two regex lines to recognize which primary files are R1 and which are R2:
However, these regexes no longer match the naming convention of the files we have been receiving lately, so we have to change them by hand in the Makefile to the following:
Solution suggested by Ellen:
The fabfile should use a more sophisticated detection mechanism than a regex and then send THAT result to the Makefile.
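One possible shape for such a mechanism, offered as a sketch rather than as what Treeshop does: instead of matching a fixed pattern, pair up files whose names are identical except for a single standalone `1` versus `2`. This covers `_R1_`/`_R2_`, `_1`/`_2`, `READ1`/`READ2` and similar conventions without listing them. The function names `pair_reads` and `_is_standalone_digit` are hypothetical. A known limitation is that two unrelated files named, say, `s1.fq` and `s2.fq` would also be paired, so the fabfile would still need a sanity check or a manual override.

```python
def _is_standalone_digit(name, i):
    """True if name[i] has no digit directly before or after it."""
    before = name[i - 1] if i > 0 else ""
    after = name[i + 1] if i + 1 < len(name) else ""
    return not before.isdigit() and not after.isdigit()


def pair_reads(filenames):
    """Pair names that differ only by one standalone '1' vs '2'.

    Returns (pairs, unpaired): pairs is a list of (r1, r2) tuples and
    unpaired is a sorted list of names with no detected mate.
    """
    names = sorted(set(filenames))
    remaining = set(names)
    pairs = []
    for name in names:
        if name not in remaining:
            continue  # already used as the mate of an earlier file
        # Scan right to left: read markers usually sit near the end of the
        # name, while sample numbers tend to come first.
        for i in range(len(name) - 1, -1, -1):
            if name[i] != "1" or not _is_standalone_digit(name, i):
                continue
            mate = name[:i] + "2" + name[i + 1:]
            if mate in remaining:
                pairs.append((name, mate))
                remaining.discard(name)
                remaining.discard(mate)
                break
    return pairs, sorted(remaining)


pairs, unpaired = pair_reads([
    "A_R1_001.fastq.gz", "A_R2_001.fastq.gz",
    "B_1.fq.gz", "B_2.fq.gz",
    "C.READ1.fastq", "C.READ2.fastq",
    "lonely.fastq",
])
```

Skipping digits that sit inside a longer number keeps a lane suffix such as `001` from being mistaken for the read marker. The detected pairs could then be handed to the Makefile explicitly, rather than asking the Makefile to rediscover them with a regex.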