Metafusion Output
Results from clustering, ranking and filtering step of metafusion.
"gene5_renamed_symbol" = 5' gene_info assigned name(s)
"gene3_renamed_symbol" = 3' gene_info assigned name(s)
"gene5_chr" = 5' chromosome
"gene5_breakpoint" = 5' breakpoint(s), pipe separated if more than 1
"gene3_chr" = 3' chromosome
"gene3_breakpoint" = 3' breakpoint(s), pipe separated if more than 1
"max_split_cnt" = Maximum split read count across all fusion calls for cluster
"max_span_cnt" = Maximum span read count across all fusion calls for cluster
"T_N" = tumor or normal
"disease" = disease
"tool" = tool(s) calling fusions in cluster
"FusionType" = Reannotated Fusion Category
"sample" = Sample name
"cancer_db_hits" = OPTIONAL, currently outputting NA
"FID" = All FIDs belonging to this cluster.
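The pipe-separated breakpoint columns above can be expanded into lists when the file is loaded. The following is a minimal sketch only: it assumes the cluster file is tab-delimited with a header row naming the columns listed above, and the helper names `split_breakpoints` and `read_clusters` are hypothetical, not part of Metafusion. The example row is made up for illustration.

```python
import csv
import io


def split_breakpoints(value):
    """Split a pipe-separated breakpoint field into a list of ints."""
    return [int(part) for part in value.split("|") if part.strip()]


def read_clusters(handle):
    """Yield one dict per cluster row, with breakpoint columns expanded to lists.

    Assumes a tab-delimited file with a header row (an assumption, not
    confirmed by the Metafusion documentation).
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        row["gene5_breakpoint"] = split_breakpoints(row["gene5_breakpoint"])
        row["gene3_breakpoint"] = split_breakpoints(row["gene3_breakpoint"])
        yield row


# Made-up example row, for illustration only.
example = (
    "gene5_renamed_symbol\tgene3_renamed_symbol\tgene5_breakpoint\tgene3_breakpoint\n"
    "GENE_A\tGENE_B\t1000|1005\t2000\n"
)
rows = list(read_clusters(io.StringIO(example)))
```

This sketch does not handle non-numeric values such as NA in a breakpoint column; a real loader would need to decide how to treat them.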
Column Description:
"gene5_chr" = 5' gene chromosome
"gene5_breakpoint" = 5' gene breakpoint
"gene5_strand" = 5' gene strand
"gene3_chr" = 3' gene chromosome
"gene3_breakpoint" = 3' gene breakpoint
"gene3_strand" = 3' gene strand
"library" = RNA, DNA or NA
"sample" = Sample name
"T_N" = Tumor or normal
"disease" = disease information. Currently being set to NA
"tool" = tool name
"max_split_cnt" = # split reads
"max_span_cnt" = # spanning reads
"gene5_renamed_symbol" = 5' gene info renamed symbol
"gene5_tool_annotation" = tool's original annotation for 5' gene
"gene3_renamed_symbol" = 3' gene info renamed symbol
"gene3_tool_annotation" = tool's original annotation for 3' gene
"FusionType" = category according to metafusion
"reann_gene5_symbol" = 5' reannotated gene name according to gene_bed file (cds/intron/UTR)
"reann_gene5_region" = 5' reannotated gene region according to gene_bed file
"reann_gene3_symbol" = 3' reannotated gene name according to gene_bed file (cds/intron/UTR)
"reann_gene3_region" = 3' reannotated gene region according to gene_bed file
"reann_gene5_on_bndry" = 5' reannotated gene on boundary according to gene_bed file
"reann_gene5_close_to_bndry" = 5' reannotated gene close to boundary according to gene_bed file
"reann_gene3_on_bndry" = 3' reannotated gene on boundary according to gene_bed file
"reann_gene3_close_to_bndry" = 3' reannotated gene close to boundary according to gene_bed file
"score" = Fusion score according to reannotation information
"coding_id_distance" = # of coding genes between 5' and 3' genes if genes are on same chromosome, or within 1 chromosome; -1 if not
"gene_interval_distance" = # bp between breakpoints if genes are on same chromosome.
"dnasupp" = Unsure meaning. Always -9
"FID" = Fusion ID Number
"gene5_seq" = Sequence according to FASTA +100 bp in front of 5' gene
"gene3_seq" = Sequence according to FASTA -100 bp behind 3' gene
"is_inframe" = Always NA
"closest_exon5" = gene_bed exon closest to 5' reann gene
"closest_exon3" = gene_bed exon closest to 3' reann gene
"captured_reads" = -1 NOT SURE MEANING
"gene5_transcript_id" = selected 5' reann_transcript
"gene3_transcript_id" = selected 3' reann_transcript
"is_clinical5" = Refseq id matching gene5_transcript_id according to MANE v1.2, NA if gene5_transcript_id is not in MANE
"is_clinical3" = Refseq id matching gene3_transcript_id according to MANE v1.2, NA if gene3_transcript_id is not in MANE
"Metafusion_flag" = flag if there may have been a problem reannotating a fusion
"cluster" = cluster number assigned by Metafusion
Any adjacent noncoding, samegene and readthrough events that may have biological significance. Same formatting as Final.n#.cluster, except for an additional first column called "TEST".
CodingFusion: Two genes which have a coding region in the gene_bed file. DOES NOT MEAN IT WILL PRODUCE PROTEIN
ReadThrough: Two genes next to each other on a strand
TruncatedCoding: A 5' coding region and a 3' noncoding/intergenic region according to reannotation gene_bed information
TruncatedNoncoding: A 5' noncoding region and a 3' intergenic/coding/noncoding region
NoHeadGene: A 5' intergenic region and a 3' coding/noncoding/intergenic region
SameGene: 5' and 3' gene names are the same
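The category definitions above can be read as a small decision procedure. The sketch below is illustrative only and is not Metafusion's actual implementation: the region labels ("coding", "noncoding", "intergenic"), the `adjacent_on_strand` flag, the function name `classify_fusion`, and the order in which the rules are checked are all assumptions made for the example.

```python
VALID_CLASSES = {"coding", "noncoding", "intergenic"}


def classify_fusion(gene5_symbol, gene5_class, gene3_symbol, gene3_class,
                    adjacent_on_strand=False):
    """Return a FusionType label from simplified per-gene region classes.

    The precedence of the checks below is an assumption; the source
    definitions do not state how overlapping categories are resolved.
    """
    for cls in (gene5_class, gene3_class):
        if cls not in VALID_CLASSES:
            raise ValueError(f"unknown region class: {cls!r}")

    if gene5_class == "intergenic":
        # 5' intergenic, any 3' region
        return "NoHeadGene"
    if gene5_symbol == gene3_symbol:
        return "SameGene"
    if adjacent_on_strand:
        # Two genes next to each other on a strand
        return "ReadThrough"
    if gene5_class == "noncoding":
        # 5' noncoding, any 3' region
        return "TruncatedNoncoding"
    # Remaining case: the 5' region is coding
    if gene3_class == "coding":
        return "CodingFusion"
    return "TruncatedCoding"
```

For example, `classify_fusion("GENE_A", "coding", "GENE_B", "intergenic")` falls through to the last branch and returns "TruncatedCoding".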
This will happen when a different Ensembl GTF file has a gene id which does not have a matching gene name in the primary GTF version. During the renaming step, the gene will be renamed to the caller's original gene name, even if that gene name doesn't exist in our primary GTF. The reann_gene will be a different gene name that maps to the same chr:breakpoint in the primary GTF. Results generated from the calculate_score step will therefore relate to the reann_gene, not the original caller's gene information. Additionally, if a breakpoint is not within the primary GTF file's chromosome, reann_gene will automatically be assigned as NA and will have the lowest score possible.
FusionCatcher calls IGH_locus_(b):chr14:106329468. However IGH_locus_(b) does not exist in v75 GTF. When Metafusion reannotates this breakpoint, it finds IGHJ6. Therefore, all annotations and scores are generated off of the IGHJ6:chr14:106329468 location in v75.
Starfusion uses gencode, which manually curates some regions of the chromosomes. It calls MALT1:chr18:56337141. However, within v75 GTF this region DOES NOT EXIST. All annotations are set to NA. Lowest score assigned.
If the reformatting step did not remove all chromosomes which do not exist in the gene bed, the reannotation step will break and subsequent fusions will not be included in further analysis. Only fusions which had been processed prior to encountering the fusion with the unusual chromosome will be analyzed.
Fusioncatcher has a patch chrUn_gl000228 which does not exist within v75. If a chromosome does not exist in the primary GTF, metafusion will STOP parsing any more lines in the CFF and break. These need to be removed prior to entering metafusion, which is the purpose of the reformatting step.
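The kind of filtering the reformatting step needs to perform can be sketched as follows. This is a hedged illustration, not the actual reformatting code: it assumes the CFF is tab-delimited with the 5' and 3' chromosomes in the first and fourth columns, and the function name `filter_known_chromosomes` and the example rows are hypothetical.

```python
def filter_known_chromosomes(cff_lines, known_chroms, chr_columns=(0, 3)):
    """Split CFF lines into those whose chromosomes are all known and the rest.

    chr_columns gives the 0-based positions of the 5' and 3' chromosome
    fields; (0, 3) is an assumption about the CFF layout.
    """
    kept, dropped = [], []
    for line in cff_lines:
        fields = line.rstrip("\n").split("\t")
        if all(i < len(fields) and fields[i] in known_chroms for i in chr_columns):
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


# Made-up CFF rows; only the chromosome names matter for this example.
lines = [
    "chr14\t100\t+\tchr18\t200\t-\n",
    "chrUn_gl000228\t300\t+\tchr14\t400\t+\n",
]
# Chromosome names that would be taken from the gene bed / primary GTF.
known = {"chr14", "chr18"}
kept, dropped = filter_known_chromosomes(lines, known)
```

Returning the dropped lines separately makes it possible to log what was removed before the CFF enters metafusion, rather than losing those calls silently.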
Possible explanations for this flag:
- The renamed gene name does NOT exist within the gene_bed file, so reann_gene is different for one or both of the genes.
- The breakpoint specified by the tool does not have the same gene locations. A different gene name exists at the breakpoint in the gene bed file.
- When scoring the annotated gene fusion, the fusion order was flipped because a higher scoring gene pair was identified. Theoretically this should not happen with FORTE callers, but be aware that it is possible.
Starfusion calls DUX4:chr4:190996354. At the chr4:190996354 location in v75 is DUX4L6 and metafusion assigned the reann_gene_name to DUX4L6, setting all annotations off of this call. HOWEVER, clustering will be off of DUX4.
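A quick way to spot rows affected by the first two explanations is to compare the renamed and reannotated symbols on each side. The sketch below is an assumption-laden illustration, not the logic Metafusion uses to set Metafusion_flag: it relies only on the column names documented above, the function name `reannotation_mismatches` is hypothetical, and it does not detect a flipped fusion order. In the example row, placing DUX4 on the 5' side and the 3' gene name are made up.

```python
def reannotation_mismatches(row):
    """Return the sides ("gene5", "gene3") where renamed and reannotated symbols differ."""
    mismatches = []
    for side in ("gene5", "gene3"):
        renamed = row.get(f"{side}_renamed_symbol")
        reann = row.get(f"reann_{side}_symbol")
        if renamed != reann:
            mismatches.append(side)
    return mismatches


# Example modeled on the DUX4 / DUX4L6 case; the 3' gene is a placeholder.
row = {
    "gene5_renamed_symbol": "DUX4",
    "reann_gene5_symbol": "DUX4L6",
    "gene3_renamed_symbol": "GENE_B",
    "reann_gene3_symbol": "GENE_B",
}
sides = reannotation_mismatches(row)
```

Rows with a non-empty result are the ones where clustering uses the renamed symbol while annotations and scores come from the reannotated gene.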