Prepare EMASE Pipeline ReadMe

MikeWLloyd edited this page Apr 11, 2024 · 7 revisions

Prepare-EMASE Input Files Documentation

Prepare-EMASE Pipeline (--workflow prepare_emase)

•   Step 1: Prepare EMASE reference   
•   Step 2: Prepare the final transcript lists used by EMASE and GBRS

Prepare-EMASE Flowchart

flowchart TD
    p0[EMASE_PREPARE_EMASE]
    p1[BOWTIE_BUILD]
    p2[CLEAN_TRANSCRIPT_LISTS]

    o1([EMASE Input Files]):::output
    o2([Multiway Bowtie Reference]):::output
    o3([Transcript List for EMASE Input]):::output

    p0 --> p1
    p0 --> p2
    p0 --> o1
    p1 --> o2
    p2 --> o3


classDef output fill:#90aaff,stroke:#6c8eff,stroke-width:2px,color:#000000

Parameters for Prepare-EMASE Pipeline

  • --pubdir

    • Default: /<PATH>
    • Comment: The directory in which saved outputs will be stored.
  • -w

    • Default: /<PATH>
    • Comment: The directory used for all intermediate files and Nextflow processes. This directory can become quite large, so it should be located on /fastscratch or another directory with ample storage.
  • --genome_file_list

    • Default: /<PATH> OR /<PATH>,/<PATH>,...
    • Comment: A comma-separated list of FASTA genome file(s) for use in hybrid genome construction (e.g., genome1.fa OR genome1.fa,genome2.fa,...). NOTE: FASTA AND GTF MUST BE IN THE SAME ORDER.
  • --gtf_file_list

    • Default: /<PATH> OR /<PATH>,/<PATH>,...
    • Comment: A comma-separated list of GTF files corresponding to the genomes, for use in hybrid transcriptome construction (e.g., genome1.gtf OR genome1.gtf,genome2.gtf,...). NOTE: GTF AND FASTA MUST BE IN THE SAME ORDER.
  • --haplotype_list

    • Default: <comma,delim,string>
    • Comment: A comma-separated list of haplotype names corresponding to the genomes used in hybrid genome construction (e.g., 'A,B,C,D,E,F,G,H'). These names are appended to transcript IDs (e.g., ENMST00000042_A). NOTE: HAPLOTYPE LIST MUST BE IN THE SAME ORDER AS FASTA AND GTF FILES.
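
Because the FASTA, GTF, and haplotype lists must line up position by position, it can help to check them before launching the workflow. The following is a minimal Python sketch, not part of the pipeline itself: the function name `build_prepare_emase_command` and the entry script name `main.nf` are assumptions made for illustration, and the paths are placeholders. Only the flags documented above are used.

```python
import shlex


def build_prepare_emase_command(genomes, gtfs, haplotypes, pubdir, workdir,
                                script="main.nf"):
    """Assemble a `nextflow run` argument list for the prepare_emase workflow.

    `script` is an assumed entry point; replace it with the pipeline's
    actual script. Raises ValueError if the three lists differ in length,
    since they are paired by position.
    """
    if not (len(genomes) == len(gtfs) == len(haplotypes)):
        raise ValueError(
            "genome, GTF, and haplotype lists must have the same length: "
            f"{len(genomes)} / {len(gtfs)} / {len(haplotypes)}"
        )
    return [
        "nextflow", "run", script,
        "--workflow", "prepare_emase",
        "--pubdir", pubdir,
        "-w", workdir,
        # Order is preserved, so entry i of each list refers to the same genome.
        "--genome_file_list", ",".join(genomes),
        "--gtf_file_list", ",".join(gtfs),
        "--haplotype_list", ",".join(haplotypes),
    ]


# Placeholder inputs for a two-haplotype reference.
cmd = build_prepare_emase_command(
    genomes=["genome1.fa", "genome2.fa"],
    gtfs=["genome1.gtf", "genome2.gtf"],
    haplotypes=["A", "B"],
    pubdir="/path/to/pubdir",
    workdir="/path/to/workdir",
)
print(shlex.join(cmd))
```

The sketch only checks list lengths; it cannot confirm that each GTF actually belongs to the FASTA in the same position, so the ordering still needs to be verified by hand.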

Pipeline Default Outputs

NOTE: * represents a wildcard: a placeholder for values that are filled in from input file names and/or parameters when the pipeline is run.

| Naming Convention | Description |
| --- | --- |
| prepare_emase_report.html | Nextflow autogenerated report |
| trace.txt | Nextflow trace of processes |
| */emase/emase.pooled.transcripts.fa | Pooled transcripts in fasta format for all transcripts and haplotypes |
| */emase/emase.gene2transcripts.tsv | Gene to transcript ID mapping in tab delimited format |
| */emase/bowtie/*.ebwt | Bowtie index files required for mapping |
| */emase/emase.fullTranscripts.info | The complete list of transcripts included in the bowtie index, and other files |
| */emase/*.pooled.fullTranscripts.info | The complete list of transcripts, and transcript lengths for all haplotypes. |
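
As an illustration of how the gene-to-transcript mapping might be consumed downstream, here is a small Python sketch. It assumes, without confirmation from this page, that each row of emase.gene2transcripts.tsv holds a gene ID followed by one or more tab-separated transcript IDs; check the actual file before relying on this layout. The function name `read_gene2transcripts` and the example IDs are hypothetical.

```python
import csv
import io


def read_gene2transcripts(handle):
    """Return {gene_id: [transcript_id, ...]} from a tab-delimited handle.

    Assumed row layout: gene ID in the first column, transcript IDs in the
    remaining columns. Blank lines and empty cells are skipped.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        gene_id, *transcript_ids = row
        mapping[gene_id] = [t for t in transcript_ids if t]
    return mapping


# Made-up rows standing in for the real file contents.
example = io.StringIO("GENE1\tTX1\tTX2\nGENE2\tTX3\n")
gene2tx = read_gene2transcripts(example)
print(gene2tx)
```

To read the real output, pass an open file handle in place of the `io.StringIO` object, for example `open(path, newline="")` with `path` pointing at the file under the `--pubdir` location.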

Pipeline Optional Outputs

There are no optional outputs for this workflow.