Prepare EMASE Pipeline ReadMe
MikeWLloyd edited this page Apr 11, 2024 · 7 revisions
• Step 1: Prepare EMASE reference
• Step 2: Prepare final transcript lists used in EMASE and also GBRS
flowchart TD
p0[EMASE_PREPARE_EMASE]
p1[BOWTIE_BUILD]
p2[CLEAN_TRANSCRIPT_LISTS]
o1([EMASE Input Files]):::output
o2([Multiway Bowtie Reference]):::output
o3([Transcript List for EMASE Input]):::output
p0 --> p1
p0 --> p2
p0 --> o1
p1 --> o2
p2 --> o3
classDef output fill:#90aaff,stroke:#6c8eff,stroke-width:2px,color:#000000
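Read as a dependency graph, the flowchart above fixes the order in which the steps can run. The sketch below is only an illustration in Python, not part of the pipeline: the node names are copied from the flowchart, while the `DEPENDS_ON` mapping and the use of `graphlib` (Python 3.9+) are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (edges copied from the flowchart).
DEPENDS_ON = {
    "BOWTIE_BUILD": {"EMASE_PREPARE_EMASE"},
    "CLEAN_TRANSCRIPT_LISTS": {"EMASE_PREPARE_EMASE"},
    "EMASE Input Files": {"EMASE_PREPARE_EMASE"},
    "Multiway Bowtie Reference": {"BOWTIE_BUILD"},
    "Transcript List for EMASE Input": {"CLEAN_TRANSCRIPT_LISTS"},
}

# One valid execution order; EMASE_PREPARE_EMASE always comes first.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

BOWTIE_BUILD and CLEAN_TRANSCRIPT_LISTS depend only on EMASE_PREPARE_EMASE, so nothing in the graph forces one to run before the other.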
- --pubdir
  - Default: /<PATH>
  - Comment: The directory where the saved outputs will be stored.
- -w
  - Default: /<PATH>
  - Comment: The directory used for all intermediate files and Nextflow processes. This directory can become quite large, so it should be a location on /fastscratch or another directory with ample storage.
- --genome_file_list
  - Default: /<PATH> OR /<PATH>,/<PATH>,...
  - Comment: A comma-separated list of FASTA genome file(s) for use in hybrid genome construction (e.g., genome1.fa OR genome1.fa,genome2.fa,...). NOTE: FASTA AND GTF MUST BE IN THE SAME ORDER.
- --gtf_file_list
  - Default: /<PATH> OR /<PATH>,/<PATH>,...
  - Comment: A comma-separated list of GTF files corresponding to the genomes, for use in hybrid transcriptome construction (e.g., genome1.gtf OR genome1.gtf,genome2.gtf,...). NOTE: GTF AND FASTA MUST BE IN THE SAME ORDER.
- --haplotype_list
  - Default: <comma,delim,string>
  - Comment: A comma-separated list of haplotype names corresponding to the genomes used in hybrid genome construction (e.g., 'A,B,C,D,E,F,G,H'). These names are appended to transcript IDs (e.g., ENMST00000042_A). NOTE: HAPLOTYPE LIST MUST BE IN THE SAME ORDER AS FASTA AND GTF FILES.
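The ordering requirement on the three list parameters can be checked before launching a run. The following is a minimal sketch in Python, not code from the pipeline: `pair_inputs` and `tag_transcript` are hypothetical helpers, and the underscore-joined suffix is inferred from the ENMST00000042_A example above.

```python
def pair_inputs(genome_file_list, gtf_file_list, haplotype_list):
    """Split the three comma-separated values and pair them by position."""
    genomes = [s.strip() for s in genome_file_list.split(",")]
    gtfs = [s.strip() for s in gtf_file_list.split(",")]
    haplotypes = [s.strip() for s in haplotype_list.split(",")]
    # The lists are matched by position, so they must have equal length.
    if not (len(genomes) == len(gtfs) == len(haplotypes)):
        raise ValueError(
            f"list lengths differ: {len(genomes)} FASTA, "
            f"{len(gtfs)} GTF, {len(haplotypes)} haplotype names"
        )
    return list(zip(haplotypes, genomes, gtfs))


def tag_transcript(transcript_id, haplotype):
    """Append the haplotype name to a transcript ID (assumed '_' separator)."""
    return f"{transcript_id}_{haplotype}"


# Example values only; real runs would pass full paths.
for hap, fasta, gtf in pair_inputs(
    "genome1.fa,genome2.fa", "genome1.gtf,genome2.gtf", "A,B"
):
    print(hap, fasta, gtf)
```

A length check catches a missing entry, but it cannot detect two entries swapped within one list, so the order still needs to be confirmed by eye.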
NOTE: * represents a wildcard: a placeholder for values that are filled in from input file names and/or parameters when the pipeline is run.
Naming Convention | Description |
---|---|
prepare_emase_report.html | Nextflow autogenerated report |
trace.txt | Nextflow trace of processes |
*/emase/emase.pooled.transcripts.fa | Pooled transcripts in FASTA format for all transcripts and haplotypes |
*/emase/emase.gene2transcripts.tsv | Gene to transcript ID mapping in tab-delimited format |
*/emase/bowtie/*.ebwt | Bowtie index files required for mapping |
*/emase/emase.fullTranscripts.info | The complete list of transcripts included in the Bowtie index, and other files |
*/emase/*.pooled.fullTranscripts.info | The complete list of transcripts, and transcript lengths for all haplotypes |
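After a run, the naming conventions in the table can be used as glob patterns to confirm that every expected output is present. This is a minimal sketch, not part of the pipeline: `missing_outputs` is a hypothetical helper, and it assumes the patterns are relative to the directory given to `--pubdir`.

```python
from pathlib import Path

# Patterns copied from the output table; '*' is the wildcard described above.
EXPECTED_PATTERNS = [
    "prepare_emase_report.html",
    "trace.txt",
    "*/emase/emase.pooled.transcripts.fa",
    "*/emase/emase.gene2transcripts.tsv",
    "*/emase/bowtie/*.ebwt",
    "*/emase/emase.fullTranscripts.info",
    "*/emase/*.pooled.fullTranscripts.info",
]


def missing_outputs(pubdir):
    """Return the patterns that match no file under pubdir (assumed root)."""
    root = Path(pubdir)
    return [p for p in EXPECTED_PATTERNS if not any(root.glob(p))]
```

For example, `missing_outputs("/<PATH>")` returns an empty list when every pattern matches at least one file.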
There are no optional outputs for this workflow.