
Custom (checkbox selected) data

Mikhail Dozmorov edited this page Jul 26, 2016 · 4 revisions

Besides the large volumes of regulatory/epigenomic data, many other datasets may be useful for testing specific biological questions. Such datasets may not fit within the systematic schema of GenomeRunner and are handled separately (see Makefile).

Several of these databases are accessible through the checkboxes on the front page. Their descriptions and potential uses are shown in the table.

| Genome annotation category | Description | Experimental question: Are the SNPs of interest... |
| --- | --- | --- |
| coriellVariants | Coriell Cell Line Copy Number Variants, split by cell types | ... enriched in CNVs, and in which cell type? |
| CpG-xxx | Model-based CpG islands | ... enriched in CpG sites? |
| dgvVariants | Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del), split by variant type | ... enriched in CNVs, or other types of structural variations? |
| GERP-xxx | Genomic Evolutionary Rate Profiling elements | ... enriched in evolutionarily constrained regions? |
| gwasCatalog | NHGRI Catalog of Published Genome-Wide Association Studies, split by disease/trait types | ... enriched in known disease-specific SNPs? |
| knownAlt | Alternative Splicing, Alternative Promoter and Similar Events in UCSC Genes, split by splicing type | ... potentially disrupt a specific type of alternatively spliced regions? |
| ncRNAs | C/D and H/ACA Box snoRNAs, scaRNAs, and microRNAs from snoRNABase and miRBase, split by ncRNA type | ... associated with a class of non-coding elements? |
| nestedRepeats | Repeating Elements by RepeatMasker, split by repeat class | ... enriched in regions of low complexity, and in which class? Classes include: 'Short interspersed nuclear elements (SINE), which include ALUs', 'Long interspersed nuclear elements (LINE)', 'Long terminal repeat elements (LTR), which include retroposons', 'DNA repeat elements (DNA)', 'Simple repeats (micro-satellites)', 'Low complexity repeats', 'Satellite repeats', 'RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)', 'Other repeats, which includes class RC (Rolling Circle)', 'Unknown' |
| super_enhancers | Super-enhancers in the control of cell identity and disease. Article | ... enriched in cell type-specific super-enhancer regions? |
| tfbsEncode | Transcription Factor ChIP-seq Clusters V3 (161 targets, 189 antibodies) from ENCODE, split by TFBS name | ... potentially disrupt a specific experimentally defined transcription factor binding site? |
| tfbsConserved | HMR Conserved Transcription Factor Binding Sites, split by TFBS name | ... potentially disrupt a specific computationally defined transcription factor binding site? |
| UCNEs | UltraConserved Noncoding Elements from the UCNE base | ... potentially disrupt ultraconserved elements? |
| VMRs | Variably Methylated Regions (VMRs) and CpG sites (CpGs). Article | ... potentially alter regions/CpGs variably methylated across normal tissues, hence affecting cell/tissue identity? |
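
Several of the categories above are "split by" a subtype (cell type, repeat class, TFBS name), so that a question can be asked per subtype. As a rough illustration of that idea only, the sketch below groups BED-like annotation records by subcategory and counts how many SNPs fall into each group. All names and coordinates here are hypothetical, and this is plain overlap counting, not a statistical enrichment test and not GenomeRunner's own implementation, which this page does not describe.

```python
from collections import defaultdict

# Hypothetical SNPs of interest: (chromosome, 0-based position).
snps = [("chr1", 150), ("chr1", 900), ("chr2", 50), ("chr2", 400)]

# Hypothetical annotation records in BED-like form:
# (chromosome, start, end, subcategory), with half-open intervals [start, end).
annotations = [
    ("chr1", 100, 200, "SINE"),
    ("chr1", 850, 950, "LINE"),
    ("chr2", 0, 100, "SINE"),
    ("chr2", 1000, 1100, "LTR"),
]


def count_overlaps_by_subcategory(snps, annotations):
    """Count SNPs that fall inside at least one region of each subcategory."""
    # Split the annotation into one region list per subcategory.
    by_group = defaultdict(list)
    for chrom, start, end, group in annotations:
        by_group[group].append((chrom, start, end))

    # For each subcategory, count SNPs covered by any of its regions.
    counts = {}
    for group, regions in by_group.items():
        counts[group] = sum(
            any(c == chrom and start <= pos < end for c, start, end in regions)
            for chrom, pos in snps
        )
    return counts


counts = count_overlaps_by_subcategory(snps, annotations)
print(counts)
```

Judging enrichment would additionally require comparing these counts against a background expectation, which this sketch deliberately leaves out.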

Please request any other genome annotation data you'd like to have.