This directory contains various analysis modules in the OpenPBTA project. See the README of an individual analysis module for more information about that module.
The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules.
This is in service of documenting interdependent analyses.
Note that nearly all modules use the harmonized clinical data file (pbta-histologies.tsv) even when it is not explicitly included in the table below.

Module | Input Files | Brief Description | Output Files Consumed by Other Analyses |
---|---|---|---|
chromosomal-instability | pbta-histologies.tsv<br>pbta-sv-manta.tsv.gz<br>pbta-cnv-cnvkit.seg.gz | Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals | N/A |
cnv-chrom-plot | pbta-cnv-consensus-gistic.zip<br>analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg | Plots genome-wide visualizations relating to copy number results | N/A |
cnv-comparison | Earlier version of SEG files | Deprecated; compared earlier version of the CNV methods. | N/A |
collapse-rnaseq | pbta-gene-expression-rsem-fpkm.polya.rds<br>pbta-gene-expression-rsem-fpkm.stranded.rds<br>gencode.v27.primary_assembly.annotation.gtf.gz | Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. | results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds (included in data download; too large for tracking via GitHub)<br>results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds (included in data download; too large for tracking via GitHub) |
comparative-RNASeq-analysis | pbta-gene-expression-rsem-tpm.polya.rds<br>pbta-gene-expression-rsem-tpm.stranded.rds<br>pbta-histologies.tsv<br>pbta-mend-qc-manifest.tsv<br>pbta-mend-qc-results.tar.gz | In progress; will produce expression outlier profiles per #229 | N/A |
compare-gistic | analyses/run-gistic/results/pbta-cnv-consensus-gistic.zip<br>analyses/run-gistic/results/pbta-cnv-consensus-hgat-gistic.zip<br>analyses/run-gistic/results/pbta-cnv-consensus-lgat-gistic.zip<br>analyses/run-gistic/results/pbta-cnv-consensus-medulloblastoma-gistic.zip | Compares the GISTIC results for the entire cohort with the GISTIC results for three individual histologies, namely LGAT, HGAT, and medulloblastoma (#547) | N/A |
copy_number_consensus_call | pbta-cnv-cnvkit.seg.gz<br>pbta-cnv-controlfreec.tsv.gz<br>pbta-sv-manta.tsv.gz | Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made | results/cnv_consensus.tsv<br>results/pbta-cnv-consensus.seg.gz (included in data download)<br>ref/cnv_excluded_regions.bed<br>ref/cnv_callable.bed |
create-subset-files | All files | This module contains the code to create the subset files used in continuous integration | All subset files for continuous integration |
focal-cn-file-preparation | pbta-cnv-cnvkit.seg.gz<br>pbta-cnv-controlfreec.tsv.gz<br>pbta-gene-expression-rsem-fpkm-collapsed.polya.rds<br>pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz | Maps from copy number variant caller segments to gene identifiers; will be updated to take into account changes that affect entire cytobands, chromosome arms (#186) | results/cnvkit_annotated_cn_autosomes.tsv.gz<br>results/cnvkit_annotated_cn_x_and_y.tsv.gz<br>results/controlfreec_annotated_cn_autosomes.tsv.gz<br>results/controlfreec_annotated_cn_x_and_y.tsv.gz<br>results/consensus_seg_annotated_cn_autosomes.tsv.gz (included in data download)<br>results/consensus_seg_annotated_cn_x_and_y.tsv.gz (included in data download) |
fusion_filtering | pbta-fusion-arriba.tsv.gz<br>pbta-fusion-starfusion.tsv.gz | Standardizes, filters, and prioritizes fusion calls | results/pbta-fusion-putative-oncogenic.tsv (included in data download)<br>results/pbta-fusion-recurrent-fusion-byhistology.tsv (included in data download)<br>results/pbta-fusion-recurrent-fusion-bysample.tsv (included in data download) |
fusion-summary | pbta-histologies.tsv<br>pbta-fusion-putative-oncogenic.tsv<br>pbta-fusion-arriba.tsv.gz<br>pbta-fusion-starfusion.tsv.gz | Generates summary tables from fusion files (#398; #623) | results/fusion_summary_embryonal_foi.tsv (included in data download)<br>results/fusion_summary_ependymoma_foi.tsv (included in data download)<br>results/fusion_summary_ewings_foi.tsv |
gene-set-enrichment-analysis | pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | In progress. Updated gene set enrichment analysis with appropriate RNA-seq expression data | results/gsva_scores_stranded.tsv<br>results/gsva_scores_polya.tsv<br>for stranded and polya expression data, respectively |
immune-deconv | pbta-gene-expression-rsem-fpkm-collapsed.polya.rds<br>pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | Immune/Stroma characterization across PBTA (part of #15) | results/deconv-output.RData |
independent-samples | pbta-histologies.tsv | Generates independent specimen lists for WGS/WXS samples | results/independent-specimens.wgs.primary.tsv (included in data download)<br>results/independent-specimens.wgs.primary-plus.tsv (included in data download)<br>results/independent-specimens.wgswxs.primary.tsv (included in data download)<br>results/independent-specimens.wgswxs.primary-plus.tsv (included in data download) |
interaction-plots | independent-specimens.wgs.primary-plus.tsv<br>pbta-snv-consensus-mutation.maf.tsv.gz | Creates interaction plots for mutation mutual exclusivity/co-occurrence (#13); may be updated to include other data types (e.g., fusions) | N/A |
molecular-subtyping-ATRT | analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv<br>pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz<br>pbta-snv-consensus-mutation-tmb-all.tsv<br>pbta-cnv-consensus-gistic.zip | Summarizes data into tabular format in order to molecularly subtype ATRT samples (#244); this analysis did not work | N/A |
molecular-subtyping-EPN | pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>pbta-cnv-consensus-gistic.zip<br>analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv<br>analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv<br>analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv | In progress; molecular subtyping of ependymoma tumors | N/A |
molecular-subtyping-EWS | analyses/fusion-summary/results/fusion_summary_ewings_foi.tsv | Reclassification of tumors based on the presence of defining fusions for Ewing Sarcoma per #623 | results/EWS_samples.tsv |
molecular-subtyping-HGG | pbta-snv-consensus-mutation.maf.tsv.gz<br>analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz<br>pbta-fusion-putative-oncogenic.tsv<br>pbta-cnv-consensus-gistic.zip<br>pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Molecular subtyping of high-grade glioma samples (#249) | results/HGG_molecular_subtype.tsv |
molecular-subtyping-LGAT | pbta-snv-consensus-mutation.maf.tsv.gz<br>pbta-fusion-putative-oncogenic.tsv<br>pbta-fusion-recurrently-fused-genes-bysample.tsv | Molecular subtyping of low-grade astrocytic tumor samples (#631) | results/lgat_subtyping.tsv |
molecular-subtyping-SHH-tp53 | pbta-histologies.tsv<br>pbta-snv-consensus-mutation.maf.tsv.gz | Deprecated; identifies the SHH-classified medulloblastoma samples that have TP53 mutations (#247) | N/A |
molecular-subtyping-chordoma | analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz<br>pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | In progress; identifying poorly-differentiated chordoma samples per #250 | N/A |
molecular-subtyping-embryonal | analyses/fusion-summary/fusion_summary_embryonal_foi.tsv<br>pbta-histologies.tsv<br>pbta-sv-manta.tsv.gz<br>analyses/focal-cn-file-preparation/consensus_seg_annotated_cn_x_and_y.tsv.gz<br>analyses/focal-cn-file-preparation/cnvkit_annotated_cn_x_and_y.tsv.gz<br>analyses/focal-cn-file-preparation/controlfreec_annotated_cn_x_and_y.tsv.gz<br>pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors (#251) | results/embryonal_tumor_molecular_subtypes.tsv |
molecular-subtyping-pathology | analyses/molecular-subtyping-EWS/results/EWS_samples.tsv<br>analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv<br>analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv<br>analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv<br>pbta-fusion-putative-oncogenic.tsv | Compiles output from other molecular subtyping modules and incorporates pathology feedback (#645) | results/compiled_molecular_subtyping_with_pathology_feedback.tsv |
mutational-signatures | pbta-snv-consensus-mutation.maf.tsv.gz | Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | N/A |
mutect2-vs-strelka2 | pbta-snv-mutect2.vep.maf.gz<br>pbta-snv-strelka2.vep.maf.gz | Deprecated; comparison of only two SNV callers, subsumed by snv-callers | N/A |
oncoprint-landscape | pbta-snv-consensus-mutation.maf.tsv.gz<br>pbta-fusion-putative-oncogenic.tsv<br>analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz<br>independent-specimens.* | Combines mutation, copy number, and fusion data into an OncoPrint plot (#6); will need to be updated as all data types are refined | N/A |
rna-seq-composition | pbta-gene-expression-rsem-tpm.stranded.rds<br>pbta-histologies.tsv<br>pbta-mend-qc-results.tar.gz<br>pbta-mend-qc-manifest.tsv<br>pbta-star-log-manifest.tsv<br>pbta-star-log-final.tar.gz | Analyzes the fraction of read types that comprise each RNA-Seq sample; flags samples with unusual composition | N/A |
run-gistic | pbta-histologies.tsv<br>pbta-cnv-consensus.seg.gz | Runs GISTIC 2.0 on SEG files | pbta-cnv-consensus-gistic.zip (included in data download) |
sample-distribution-analysis | pbta-histologies.tsv | Produces plots and tables that illustrate the distribution of different histologies in the PBTA data | N/A |
selection-strategy-comparison | pbta-gene-expression-rsem-fpkm.polya.rds<br>pbta-gene-expression-rsem-fpkm.stranded.rds | Deprecated; comparison of RNA-seq data from different selection strategies | N/A |
sex-prediction-from-RNASeq | pbta-gene-expression-kallisto.stranded.rds<br>pbta-histologies.tsv | In progress; predicts genetic sex using RNA-seq data (#84) | N/A |
snv-callers | pbta-snv-lancet.vep.maf.gz<br>pbta-snv-mutect2.vep.maf.gz<br>pbta-snv-strelka2.vep.maf.gz<br>pbta-snv-vardict.vep.maf.gz<br>tcga-snv-lancet.vep.maf.gz<br>tcga-snv-mutect2.vep.maf.gz<br>tcga-snv-strelka2.vep.maf.gz | Generates consensus SNV and indel calls for PBTA and TCGA data; calculates tumor mutation burden using the consensus calls | results/consensus/pbta-snv-consensus-mutation.maf.tsv.gz (included in data download; too large for tracking via GitHub)<br>results/consensus/pbta-snv-consensus-mutation-tmb-all.tsv<br>results/consensus/pbta-snv-consensus-mutation-tmb-coding.tsv (included in data download; too large for tracking via GitHub)<br>results/consensus/tcga-snv-consensus-mutation.maf.tsv.gz<br>results/consensus/tcga-snv-mutation-tmb.tsv<br>results/consensus/tcga-snv-mutation-tmb-coding.tsv |
ssgsea-hallmark | pbta-gene-counts-rsem-expected_count.stranded.rds | Deprecated; performs GSVA using Hallmark gene sets | N/A |
survival-analysis | TBD | In progress; will eventually contain functions for various types of survival analysis (#18) | N/A |
sv-analysis | pbta-sv-manta.tsv.gz<br>independent-specimens.wgs.primary-plus.tsv | In progress; chromothripsis analysis per #27 | N/A |
telomerase-activity-prediction | pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>pbta-gene-expression-rsem-fpkm-collapsed.polya.rds<br>pbta-gene-counts-rsem-expected_count.stranded.rds<br>pbta-gene-counts-rsem-expected_count.polya.rds | Quantifies telomerase activity across pediatric brain tumors (part of #148) | results/TelomeraseScores_PTBAPolya_counts<br>results/TelomeraseScores_PTBAPolya_FPKM.txt<br>results/TelomeraseScores_PTBAStranded_counts.txt<br>results/TelomeraseScores_PTBAStranded_FPKM.txt |
tmb-compare-tcga | pbta-snv-consensus-mutation-tmb-coding.tsv | Compares PBTA tumor mutation burden to adult TCGA data; will be updated per #257 and #556 | N/A |
tp53_nf1_score | pbta-snv-consensus-mutation.maf.tsv.gz<br>pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds<br>pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data (#165) | N/A |
transcriptomic-dimension-reduction | pbta-gene-expression-rsem-fpkm.polya.rds<br>pbta-gene-expression-rsem-fpkm.stranded.rds<br>pbta-gene-expression-kallisto.polya.rds<br>pbta-gene-expression-kallisto.stranded.rds | Dimension reduction and visualization of RNA-seq data (part of #9) | N/A |
tcga-capture-kit-investigation | pbta-snv-lancet.vep.maf.gz<br>pbta-snv-mutect2.vep.maf.gz<br>pbta-snv-strelka2.vep.maf.gz<br>tcga-snv-lancet.vep.maf.gz<br>tcga-snv-mutect2.vep.maf.gz<br>tcga-snv-strelka2.vep.maf.gz<br>pbta-histologies.tsv<br>pbta-tcga-manifest.tsv<br>WGS.hg38.lancet.unpadded.bed<br>WGS.hg38.strelka2.unpadded.bed<br>WGS.hg38.mutect2.vardict.unpadded.bed | Investigation of the TMB discrepancy between PBTA and TCGA data | results/*.bed |
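
Because the table records which output files feed other modules, it can be read as a dependency graph. The following is a minimal sketch, not part of the project's tooling, of how a run order could be derived from it. The `MODULES` dictionary is a hand-copied subset of four rows with paths reduced to bare file names, and the `upstream_modules` helper is a hypothetical name introduced only for this illustration. It assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hand-copied subset of the table above: module -> (input files, output files).
# Paths are reduced to bare file names so inputs and outputs can be matched.
MODULES = {
    "molecular-subtyping-pathology": (
        ["EWS_samples.tsv", "pbta-fusion-putative-oncogenic.tsv"],
        ["compiled_molecular_subtyping_with_pathology_feedback.tsv"],
    ),
    "molecular-subtyping-EWS": (
        ["fusion_summary_ewings_foi.tsv"],
        ["EWS_samples.tsv"],
    ),
    "fusion-summary": (
        ["pbta-histologies.tsv", "pbta-fusion-putative-oncogenic.tsv",
         "pbta-fusion-arriba.tsv.gz", "pbta-fusion-starfusion.tsv.gz"],
        ["fusion_summary_embryonal_foi.tsv", "fusion_summary_ependymoma_foi.tsv",
         "fusion_summary_ewings_foi.tsv"],
    ),
    "fusion_filtering": (
        ["pbta-fusion-arriba.tsv.gz", "pbta-fusion-starfusion.tsv.gz"],
        ["pbta-fusion-putative-oncogenic.tsv",
         "pbta-fusion-recurrent-fusion-byhistology.tsv",
         "pbta-fusion-recurrent-fusion-bysample.tsv"],
    ),
}


def upstream_modules(modules):
    """Map each module to the set of modules that produce one of its inputs."""
    producer = {
        out: name for name, (_, outputs) in modules.items() for out in outputs
    }
    return {
        name: {producer[f] for f in inputs if f in producer}
        for name, (inputs, _) in modules.items()
    }


dependencies = upstream_modules(MODULES)
# static_order() yields every module after all of the modules it depends on.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Input files that no listed module produces (for example, files that come straight from the data download) are simply ignored when building the graph, so only module-to-module dependencies affect the ordering.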