This is a collection of bioinformatics tools I have sourced from recent literature, organized by topic. I have not used most of these tools.
Table of Contents
- Discovery
- Data Sets
- Genomics
- General Programming Resources
- Statistics/Machine Learning
- Visualization
- Publication/Archiving
- Promising methods without software implementation
- BlueButton related tools https://github.com/amida-tech
When looking for a bioinformatics tool for a specific application:
- http://omictools.com/
- http://www.gitxiv.com/?cat%5B0%5D=bioinformatics
- https://bio.tools/
- https://biosharing.org
- 1000 Genomes (RNA-Seq, ChIP-Seq): http://archive.gersteinlab.org/docs/2015/06.04/1kg_fun_studies.htm
- Reference panel of ~250 Dutch families http://biorxiv.org/content/early/2016/01/18/036897
- FANTOM consortium has CAGE (5' single molecule RNA counting) data from ~1000 human cell/tissue/cell-line samples from ~300 different cell/tissue types
- Full text of all PMC papers from 2008-present: ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/manuscript/
- Phased genome sequences
- 100 fully-phased: http://gigascience.biomedcentral.com/articles/10.1186/s13742-016-0148-z
- Statistically phased: http://www.haplotype-reference-consortium.org/
- Phased variants: http://genome.cshlp.org/content/early/2016/11/25/gr.210500.116.full.pdf+html
- Clinical trials: http://vivli.org/
- Targets for drug discovery: https://www.targetvalidation.org/
- Large medical datasets for ML: https://github.com/beamandrew/medical-data
- Species images: http://phylopic.org/image/browse/
- Public RNA-Seq data:
- https://jhubiostatistics.shinyapps.io/recount/
- with phenotype predictions https://bioconductor.org/packages/release/bioc/html/recount.html
- human and mouse: http://amp.pharm.mssm.edu/archs4
- Whole genomes of 150 Danish individuals: http://www.nature.com/nature/journal/v548/n7665/full/nature23264.html
- Migrating to GRCh38: https://software.broadinstitute.org/gatk/blog?id=8180
- Mappings between contig names in different assemblies: https://github.com/dpryan79/ChromosomeMappings
- Suffix arrays: http://almob.biomedcentral.com/articles/10.1186/s13015-016-0068-6
- gapped k-mer SVM: classifiers for DNA and protein sequences http://www.beerlab.org/gkmsvm/
- Version for large-scale data: https://github.com/Dongwon-Lee/lsgkm/
- Fast BWT creation: https://github.com/hitbc/deBWT
- Choosing assays based on complementarity to existing data: https://github.com/melodi-lab/Submodular-Selection-of-Assays
- MPRAnator: design of MPRA libraries https://www.genomegeek.com/
- http://bioconductor.org/packages/GenRank/
- Bayesian prioritization of rare functional variants using RNA-seq data: https://github.com/ipw012/RIVER
- http://snp-nexus.org/IW-Scoring/
- NAR catalog of databases, by subject: http://www.oxfordjournals.org/our_journals/nar/database/c/
- Super-Enhancer Archive: http://www.bio-bigdata.com/SEA/
- GWAS database: http://jjwanglab.org/gwasdb
- rVarBase: regulatory features of human genetic variants http://rv.psych.ac.cn/
- TransVar http://bioinformatics.mdanderson.org/transvarweb/
- dbMAE: mono-allelic expression https://mae.hms.harvard.edu/
- Disease-gene associations http://www.disgenet.org/web/DisGeNET/menu/rdf
- BISQUE: convert between database identifiers http://bisque.yulab.org/
- Human tissue-specific enhancers: http://www.enhanceratlas.org/
- Search HLI's genome data: hli-opensearch.com
- Chromatin-state annotations + per-base functionality scores for 164 cell types: http://noble.gs.washington.edu/proj/encyclopedia/
- Feature-based classification of human transcription factors: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1349-2
- Database of epifactors (epigenetic factors): http://epifactors.autosome.ru/
- Database of disease-associated methylation: http://202.97.205.78/diseasemeth/
- SRA metadata: http://deweylab.biostat.wisc.edu/metasra/
- Human histone modifications: http://www.tongjidmb.com/human/index.html
- iRegNet3D: SNP-focused catalog of TF-TF, TF-DNA, and DNA-DNA interactions http://iregnet3d.yulab.org/index/
- CTD: drug interactions and toxicity http://ctdbase.org/
- FANTOM lncRNA catalog: http://fantom.gsc.riken.jp/cat/
- Integrated database of public ChIP-Seq datasets: http://chip-atlas.org/
- Alternative splicing: http://vastdb.crg.eu/wiki/Main_Page
- Interactive multi-omics tissue assay database: https://ccb-web.cs.uni-saarland.de/imota/
- Database of cis-regulatory elements (enhancers): http://www.kostkalab.net/software.html
- Database of genetic variant effects on gene expression: https://xhaubem01.u.hpc.mssm.edu/gwas2genes/
- FASTA/FASTQ
- Read filtering and extraction: https://github.com/ad3002/Cookiecutter
- fqtools: for working with FASTQ files https://github.com/alastair-droop/fqtools
- Integrating results from multiple genome binning programs: https://github.com/songweizhi/Binning_refiner
- SAM/BAM/CRAM
- Read filtering and profiling: https://github.com/jwalabroad/VariantBam
- VCF
- Annotation: https://github.com/brentp/vcfanno
- https://github.com/lh3/bgt
- GQT
- BCFtools: includes new tool to identify RoH http://samtools.github.io/bcftools/
- Work with VCF in R: https://github.com/knausb/vcfR
- More VCF tools: http://vcf-kit.readthedocs.io/
- BED/GFF
- https://github.com/ihh/gfftools
- https://github.com/billzt/gff3sort
- Combine p-values https://github.com/brentp/combined-pvalues
- Assessing interval overlap of multiple genomic features: https://github.com/andrew-leith/GINOM
- http://rajanil.github.io/msCentipede/
- NucID: nucleosome positioning from DNase-seq https://jianlingzhong.github.io/NucID/
- SeqGL: predict TF binding from DNase/ATAC-seq https://bitbucket.org/leslielab/seqgl/wiki/Home
- DeFCoM: https://bitbucket.org/bryancquach/defcom
- General:
- R interface to DAVID: http://www.bioconductor.org/packages/release/bioc/html/RDAVIDWebService.html
- GSEA
- Cautions about the GSEA null model: http://bioinformatics.oxfordjournals.org/content/early/2017/01/02/bioinformatics.btw803.short
- GiANT: uncertainty in GSEA https://cran.r-project.org/web/packages/GiANT/index.html
- Ensemble gene set enrichment analysis: http://bioconductor.org/packages/release/bioc/html/EGSEA.html
- Fast GSA: https://github.com/billyhw/GSALightning
- Fast GSEA: https://github.com/ctlab/fgsea
- Multi-dimensional GSEA: http://bioconductor.org/packages/release/bioc/html/mdgsa.html
- GSEA with external information https://cran.r-project.org/web/packages/netgsa/index.html
- QTest: http://statgen.snu.ac.kr/software/QTest/
- Gene set analysis with specific alternative hypothesis: https://bioconductor.org/packages/release/bioc/html/GSAR.html
- clusterProfiler: https://guangchuangyu.github.io/clusterProfiler/
- DOSE: disease ontology https://guangchuangyu.github.io/dose/
- ReactomePA: https://guangchuangyu.github.io/reactomepa/
- https://cran.r-project.org/web/packages/SetRank/index.html
- Identify and rank significance of overlaps: https://github.com/ryanlayer/giggle
- Gene sets
- GO term:
- http://cbl-gorilla.cs.technion.ac.il/
- http://lrpath.ncibi.org/
- Reduce GO term lists:
- GO Express: https://www.bioconductor.org/packages/release/bioc/html/GOexpress.html
- GO Extender: https://www.msu.edu/~jinchen/GOExtender/
- http://iwera.ir/~ahmad/dal/
- Negative GO enrichment: https://sites.google.com/site/guoxian85/neggoa
- Variant Set
- https://cran.r-project.org/web/packages/VSE/vignettes/my-vignette.html
- Functional enrichment with LD correction
- MESH
- MeSH over-representation: http://www.bioconductor.org/packages/release/bioc/vignettes/meshr/inst/doc/MeSH.pdf
- meshes: https://guangchuangyu.github.io/meshes/
- Regional
- LOLA: http://lola.computational-epigenetics.org
- AnnotatR: https://github.com/rcavalcante/annotatr/
- Compare epigenetic features in multiple samples: http://epigenome.wustl.edu/EpiCompare1/
- Trait
- traseR: Trait-associated SNP enrichment https://www.bioconductor.org/packages/release/bioc/html/traseR.html
- Identifying genetic heterogeneity within phenotypically defined subgroups: https://github.com/jamesliley/subtest
- Multi-omics
- Single-sample GSA across data sets: https://www.bioconductor.org/packages/3.3/bioc/html/mogsa.html
- Network-based
- Association testing
- QTLtools: Pipeline for molecular QTL analysis https://qtltools.github.io/qtltools/
- FastQTL: http://fastqtl.sourceforge.net/
- RASQUAL: allele-specific QTL using phased SNPs - https://github.com/dg13/rasqual
- Relatedness, PCA: http://zhengxwen.github.io/SNPRelate/
- Multi-SNP, multi-trait regression https://github.com/ashlee1031/BERRRI
- regioneR: permutation testing for association between genomic region and phenotype http://bioconductor.org/packages/release/bioc/html/regioneR.html
- https://sites.google.com/site/honglee0707/mtg2
- Use local gene networks to improve trans-eQTL detection: https://github.com/PMBio/GNetLMM
- Fast correlation testing: https://github.com/gabraham/flashpca/tree/master/flashpcaR
- Gene and pathway association testing from summary statistics: https://cran.r-project.org/web/packages/aSPU/
- Account for LD and functional information: https://github.com/yjingj/SFBA
- Correcting for prediction uncertainty in TWAS: http://biorxiv.org/content/early/2017/02/14/108316
- Using random forests https://github.com/0asa/TTree-source
- Using k-mers: https://github.com/atifrahman/HAWK
- Variance eQTL
- Multiple test correction
- eigenMT: Efficient multiple-test correction http://montgomerylab.stanford.edu/resources/eigenMT/eigenMT.html
- Fast multiple-test correction for LMMs: http://genetics.cs.ucla.edu/multiTrans/
- Hierarchical eQTL MTC: http://bioinformatics.org/treeqtl/
- Controlling bias in EWAS/TWAS using null distribution: http://bioconductor.org/packages/bacon/
- Prioritization
- GenoWAP: Prioritization of GWAS signals using functional information http://genocanyon.med.yale.edu/GenoWAP
- HitWalker2: https://github.com/biodev/HitWalker2
- https://nijingchao.github.io/CRstar/
- http://genetics.bwh.harvard.edu/pines/
- Fine-mapping
- Using summary statistics http://www.christianbenner.com
- http://bioinformatics.oxfordjournals.org/content/32/3/330.full
- PAINTOR: fine mapping, prioritization - https://github.com/gkichaev/PAINTOR_FineMapping/
- Genotype synthesis: https://sourceforge.net/projects/getsynth/
- Prediction of causal variants from epigenomic annotations: https://github.mit.edu/liyue/rivieraBeta
- DAP: Bayesian framework for QTL analysis and fine-mapping https://github.com/xqwen/dap
- BayesFM: https://sourceforge.net/projects/bayesfm-mcmc-v1-0/
- http://apps.biocompute.org.uk/haprap/
- Determining causal genes using TADs http://biorxiv.org/content/early/2016/11/15/087718
- Network analysis: https://github.com/YuanlongLiu/SigMod
- Haplotype-based: http://apps.biocompute.org.uk/haprap/
- LD score calculation and regression https://github.com/bulik/ldsc
- Imputation of missing phenotype information http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3513.html
- Epistasis
- General linear-mixed model library; also includes mixed-RF method for detecting epistasis with population structure correction: https://github.com/PMBio/limix
- GPU-accelerated detection of epistasis using Bayesian neural networks: https://github.com/beamandrew/BNN
- MEPID: marginal epistasis test http://www.xzlab.org/software.html
- Other
- Browser for geographical distribution of genetic variants: http://popgen.uchicago.edu/ggv/
- Integration of GWAS with molecular QTL: https://github.com/xqwen/integrative
- GxE interactions: https://github.com/davidaknowles/eagle
- Correct for batch effects between training data and external datasets: https://cran.r-project.org/web/packages/bapred/index.html
- Impute from Affy expression arrays: http://simtk.org/home/affyimpute
- SNP
- Call haplotypes https://cran.r-project.org/web/packages/GHap/index.html
- Methylation
- Minfi: R package for working with 450k methylation arrays
- D3M: two-sample test of differential methylation from distribution-valued data https://cran.r-project.org/web/packages/D3M/D3M.pdf
- Network-based approach to discovering epigenetic "modules" that can be associated with gene expression: http://bioinformatics.oxfordjournals.org/content/30/16/2360.long
- Filtering probes using technical replicates https://cran.r-project.org/web/packages/CpGFilter/index.html
- Imputation of genome-wide methylation: http://wanglab.ucsd.edu/star/LR450K/
- Tutorial for analysis using bioconductor packages: http://biorxiv.org/content/biorxiv/early/2016/05/25/055087.full.pdf
- Normalization
- Probe design bias correction: https://www.bioconductor.org/packages/release/bioc/html/ENmix.html
- DMR calling
- http://aminmahpour.github.io/PyMAP/
- Interactive exploration: http://bioconductor.org/packages/release/bioc/html/shinyMethyl.html
- eFORGE: identify cell type-specific signals in differentially methylated positions (mostly important for blood-based EWAS) http://eforge.cs.ucl.ac.uk/
- Model for EWAS using probe signal intensities: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1347-4
- Reference-based tissue deconvolution (three different algorithms): https://github.com/sjczheng/EpiDISH
- Glint pipeline (qc, EWAS, population structure): http://glint-epigenetics.readthedocs.io/en/latest/
- Bayesian extension of Refactor cell type heterogeneity correction that incorporates experimentally determined cell counts: https://github.com/cozygene/BayesCCE
- https://github.com/perishky/meffil/
- Fast, SNP-aware PWM matching https://www.cs.helsinki.fi/group/pssmfind/
- Cell type-specific TFBS analysis (focuses analysis on TFs expressed in cell type of interest): https://github.com/Danko-Lab/rtfbs_db
- Bayesian motif discovery: https://github.com/soedinglab/BaMMmotif
- R package for TFBS analysis: http://bioconductor.org/packages/release/bioc/html/TFBSTools.html
- FIMO and MCAST performed best among the TFBS predictors compared: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1298-9
- Intra-motif dependencies
- Identifying http://www.jstacs.de/index.php/InMoDe
- Visualizing http://bioinformaticstools.mayo.edu/circularlogo/index.html
- Dinucleotide weight tensors encode dependencies between positions in motifs: http://dwt.unibas.ch/
- Circular logos: http://bioinformaticstools.mayo.edu/circularlogo/index.html
- Convert kernels learned by CNN to PWMs: ftp://ftp.cbi.pku.edu.cn/pub/software/CBI/k2p
- https://github.com/schulter/crbm
- Identification of modules with biclustering: http://web.ist.utl.pt/rmch/bicnet/temporary/index.jsp
- Functional analysis https://bioconductor.org/packages/release/bioc/html/EGAD.html
- GAGE: http://bioconductor.org/packages/release/bioc/html/gage.html
- PAXToolsR: http://bioconductor.org/packages/release/bioc/html/paxtoolsr.html
- ancGWAS: post GWAS association with protein-protein interaction networks http://www.cbio.uct.ac.za/~emile/software.html
- Gene network reconstruction
- For a set of TFs: https://sourceforge.net/projects/aracne-ap/
- From PPI or motif sharing: https://github.com/davidvi/pypanda (is also an integrative method that can incorporate multiple sources of information)
- BANFF: https://cran.r-project.org/web/packages/BANFF/index.html
- https://bitbucket.org/abarysh/safe
- https://bitbucket.org/roygroup/merlin-p
- OSS alternative to Ingenuity Pathway Analysis: https://www.bioconductor.org/packages/release/bioc/html/QuaternaryProd.html
- SAFE: spatial analysis of functional enrichment https://bitbucket.org/abarysh/safe
- Similarity search: https://github.com/zhangjiaobxy/nssrfPackage
- Tissue-specific: https://cran.r-project.org/web/packages/GRAPE/index.html
- CLR with B-Spline for mutual information-based inference of transcriptional regulatory networks: https://bitbucket.org/Jonathan-Ish-Horowicz/fastgenemi/
- Population history from unphased whole-genomes: https://github.com/popgenmethods/smcpp
- QTL
- Imputation of summary statistics in multi-ethnic cohorts: http://dleelab.github.io/jepegmix/
- Causal variant identification: http://genetics.cs.ucla.edu/caviar/
- eQTL
- Imputation of gene expression from genotype data: https://github.com/hriordan/PrediXcan
- Genetic risk
- Causal variant
- Ensemble method: https://github.com/gifford-lab/EnsembleExpr/
- eCAVIAR: probability that a variant is causal in both GWAS and eQTL studies http://genetics.cs.ucla.edu/caviar/
- https://github.com/dleelab/qcat
- Disease-associated risk variants: https://sites.google.com/site/emorydivan/
- Predicting gene targets from GWAS summary statistics https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4979185/
- https://github.com/igm-team/orion-public
- Chromatin States
- GenoSTAN: http://bioconductor.org/packages/release/bioc/html/STAN.html
- R package for predicting chromatin states from histone marks across conditions https://github.com/ataudt/chromstaR
- Rule-based: http://www.statehub.org/
- Hierarchical HMM: https://github.com/gcyuan/diHMM
- https://github.com/calico/basenji
- https://bitbucket.org/roygroup/cmint
- Enhancers
- Prediction of enhancer strength from sequence http://bioinformatics.hitsz.edu.cn/iEnhancer-2L
- Prediction of core cell type-specific TFs from super enhancers https://bitbucket.org/young_computation/crcmapper
- Prediction of superenhancers https://github.com/asntech/improse
- Deep learning-based: https://github.com/wenjiegroup/BiRen
- Coding mutations
- Predict mutation effects from sequence covariation: https://marks.hms.harvard.edu/evmutation/
- Regulatory variants/TF binding
- LedPred: prediction of regulatory sequences from ChIP-seq https://github.com/aitgon/LedPred
- GERV: prediction of regulatory variants that affect TF binding http://gerv.csail.mit.edu/
- Score variant deleteriousness: http://cadd.gs.washington.edu/
- BASSET: prediction of sequence activity https://github.com/davek44/Basset
- DanQ: hybrid convolutional and recurrent neural network model for predicting the function of DNA de novo from sequence http://github.com/uci-cbcl/DanQ
- LINSIGHT
- Protein binding affinity: https://bitbucket.org/wenxiu/sequence-shape.git
- Change in local frustration index: https://github.com/gersteinlab/frustration
- TFImpute: multi-task learning from ChIP-seq data across factors and tissues to impute TF binding for an unassayed tissue/factor combination: https://bitbucket.org/feeldead/tfimpute
- PPI https://www.ncbi.nlm.nih.gov/research/mutabind/index.fcgi/
- Multiple instance learning of TF binding: http://www.cs.utsa.edu/~jruan/MIL/
- Cell type-specific: https://github.com/uci-cbcl/FactorNet
- Predict TF binding from ATAC-Seq using deep neural network: https://github.com/hiranumn/deepatac
- Methylation
- CpGenie: predicts methylation from sequence, predicts impact of variants on nearby methylation https://github.com/gifford-lab/CpGenie
- Chromatin accessibility
- TFBS
- Predict TF binding affinities using open chromatin + PWMs: https://github.com/schulzlab/TEPIC
- LR-DNAse: TFBS prediction using features derived from DNase-seq: http://biorxiv.org/content/early/2016/10/24/082594
- Classification of cis-regulatory modules: https://github.com/weiyangedward/IMMBoost
- Imputation: https://github.com/tdurham86/PREDICTD
- Single cell
- Simultaneous RNA and methylation (and inference of CNV): http://www.nature.com/cr/journal/vaop/ncurrent/full/cr201623a.html
- Simultaneous RNA and methylation (scM&T-seq): http://www.nature.com/nmeth/journal/v13/n3/full/nmeth.3728.html
- Simultaneous RNA and protein measurements: http://www.sciencedirect.com/science/article/pii/S2211124715014345
- InterSIM: simulate correlated multi-omics data https://cran.r-project.org/web/packages/InterSIM/index.html
- WGSA pipeline https://sites.google.com/site/jpopgen/wgsa/
- http://snpeff.sourceforge.net/
- Normalization of SNP IDs from literature: https://github.com/rockt/SETH
- https://hail.is/
- Prediction of functional impact
- HaploReg: http://www.broadinstitute.org/mammals/haploreg/haploreg.php
- Several tools/score sets: CADD, DANN, etc
- Disease-specific functional prediction https://sites.google.com/site/emorydivan/
- Consensus approaches:
- Impact of coding SNPs: http://pantherdb.org/tools/csnpScoreForm.jsp
- Predict disease risk from GWAS summary statistics: https://github.com/yiminghu/AnnoPred
- http://queryor.cribi.unipd.it/cgi-bin/queryor/mainpage.pl
- Using epigenomic data https://github.com/mulin0424/cepip
- Tissue-specific https://github.com/kevinVervier/TiSAn
- VCF visualization with Circos plot: http://legolas.ariel.ac.il/~tools/CircosVCF/
- Google Genomics R API: https://followthedata.wordpress.com/2015/02/05/notes-on-genomics-apis-2-google-genomics-api/
- k-mer counting
- Density-based clustering: https://bitbucket.org/jerry00/densitycut_dev
- chopBAI: segment BAM indexes by region for faster access https://github.com/DecodeGenetics/chopBAI
- GFFutils: http://daler.github.io/gffutils/
- R package for aligned chromatin-oriented sequencing data: https://cran.r-project.org/web/packages/Pasha/
- MMR: resolve multi-mapping reads https://github.com/ratschlab/mmr
- BAMQL: query language for extracting reads from BAM files https://github.com/BoutrosLaboratory/bamql
- SAMBAMBA: samtools alternative
- BAMtools: another samtools alternative, plus some additional tools https://github.com/pezmaster31/bamtools/wiki
- DeepTools: more useful SAM/BAM operations http://deeptools.readthedocs.io/en/latest/content/list_of_tools.html
- bedtools http://bedtools.readthedocs.io/en/latest/
- bedops: alternative/additional BED operations http://bedops.readthedocs.io/en/latest/
- Normalization:
- Demultiplexing/deduping barcoded reads w/ UMIs: http://gbcs.embl.de/portal/tiki-index.php?page=Je
- Hardware acceleration of alignment (requires $5k FPGA module): https://github.com/BilkentCompGen/GateKeeper
- Detection and removal of barcode swapping (an issue on Illumina sequencers that use patterned flow cells): https://github.com/MarioniLab/BarcodeSwapping2017
- Data processing pipelines for many types of omics data, built using NextFlow and Singularity: https://github.com/c-guzman/cipher-workflow-platform
- Qualimap2: http://qualimap.bioinfo.cipf.es/
- Determine whether two BAM files are from the same source: https://bitbucket.org/sacgf/bam-matcher
- DOGMA: Measure completeness of a transcriptome or proteome assembly https://ebbgit.uni-muenster.de/domainWorld/DOGMA/
- Identify and remove UMI sequences from reads: https://github.com/CGATOxford/UMI-tools
- Integrated report from multiple tools: http://multiqc.info/
- Batch effects:
- https://github.com/mani2012/BatchQC
- Correct batch effects using residual neural net: https://github.com/ushaham/BatchEffectRemoval
- AlmostSignificant: https://github.com/bartongroup/AlmostSignificant
- Genetic relatedness from raw reads:
- Fast coverage estimate from BAM index: https://github.com/brentp/goleft/tree/master/indexcov
- Detecting sample swaps: https://github.com/PapenfussLab/HaveYouSwappedYourSamples
- QC Fail articles:
- Patterned flow cells (HiSeq 3000+) have high rates of optical duplicates: https://sequencing.qcfail.com/articles/illumina-patterned-flow-cells-generate-duplicated-sequences/
- http://samstat.sourceforge.net/
- Fingerprints: http://db.systemsbiology.net/gestalt/genome_fingerprints/
- http://fastq.bio/
- DNase footprinting: https://github.com/ajank/Romulus
- HINT: http://costalab.org/publications-2/dh-hmm/ (was best out of 10 compared tools in recent NatMeth paper)
- ALTRE: https://mathelab.github.io/ALTRE/vignette.html
- Predict TF binding affinities using open chromatin + PWMs: https://github.com/schulzlab/TEPIC
- LR-DNAse: TFBS prediction using features derived from DNase-seq: http://biorxiv.org/content/early/2016/10/24/082594
- Identify accessible chromatin from NOMe-seq https://sourceforge.net/projects/came/
- Nucleotide-specific bias adjustment: https://github.com/txje/sequence-bias-adjustment
- Pre-processing
- Quality assessment: https://github.com/rnakato/SSP
- Allocation of multi-mapping reads: https://github.com/keleslab/permseq
- Peak calling
- GC-aware peak caller: https://bioconductor.org/packages/devel/bioc/html/gcapc.html
- GenoGAM peak caller: https://master.bioconductor.org/packages/3.3/bioc/html/GenoGAM.html
- De-noising: https://github.com/kundajelab/TF_chipseq_pipeline
- Peak discretizer (merging of replicates): https://github.com/nanakiksc/zerone
- Compute error: https://github.com/tdhock/PeakError
- Specifically for histone modifications: https://github.com/Bohdan-Khomtchouk/SUPERmerge
- https://github.com/vidarmehr/ChIPWig
- hiddenDomains: https://sourceforge.net/projects/hiddendomains/
- Network-based identification of relationships among ChIP-seq data sets: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0925-0
- Motif assessment: http://www.bioinf.ict.ru.ac.za/
- https://github.com/cfce/chilin
- Web-based tool to compute enrichment at a variety of genomic features: http://liulab.dfci.harvard.edu/CEAS/
- Database of labeled ChIP-seq peaks: http://cbio.ensmp.fr/thocking/chip-seq-chunk-db/ (error of peak calls computed using https://github.com/tdhock/PeakError)
- EM algorithm for cooperatively bound TFs: https://github.com/vishakad/cpi-em
- Find different modes of binding: https://narlikarlab.github.io/DIVERSITY/
- Functional analysis
- Pipelines
- Shape motifs: https://github.com/h-samee/shape-motif
- K-mer-based alternative to PWMs for predicting TF binding sites: http://groups.csail.mit.edu/cgs/gem/kmac/
- Model 3D chromosome structure from Hi-C contact maps + optional FISH constraints: https://github.com/yjzhang/FISH_MDS.jl, https://github.com/yjzhang/3DC-Browser
- Predict enhancer targets: https://github.com/shwhalen/targetfinder
- Predicting TADs from histone modifications: https://cb.utdallas.edu/CITD/index.htm#ajax=home
- Filtering
- Removing redundant reads from deep sequencing data: https://git.informatik.uni-kiel.de/axw/Bignorm
- Error correction
- https://github.com/lh3/bfc
- RECKONER http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=reckoner&subpage=about
- MultiRes - says it's for viral populations; not sure if applicable to humans: https://github.com/raunaq-m/MultiRes
- Duplicate removal
- Alignment
- Align simultaneously against multiple reference genomes http://1001genomes.org/software/genomemapper.html
- Compressed reference-based alignment: http://groups.csail.mit.edu/cb/cora/
- Compression and querying of aligned haplotype data: https://github.com/richarddurbin/pbwt
- Graph-based (mainly for local realignment): https://github.com/ekg/glia
- Long read
- https://github.com/lh3/minimap2
- Python binding: https://pypi.python.org/pypi/mappy
- https://github.com/ocxtal/minialign
- Graph-based: https://github.com/isovic/graphmap
- https://github.com/xiaochuanle/MECAT
- Assembly
- Build de Bruijn Graph from multiple genomes: https://github.com/medvedevgroup/TwoPaCo
- Align to a de Bruijn graph: https://github.com/Malfoy/BGREAT
- Cosmo/VARI: Assembler using succinct colored de Bruijn graphs to encode population information https://github.com/cosmo-team/cosmo/tree/VARI
- Streaming: https://github.com/Shamir-Lab/Faucet
- Variant calling
- Matching variant sets: https://github.com/medvedevgroup/varmatch
- Post-processing variant calls to determine whether variants at regions with alternative loci have allele(s) from an alternate locus: https://github.com/charite/asdpex
- Filtering low-frequency variants that likely result from DNA damage: https://github.com/eilslabs/DKFZBiasFilter
- Reference-free: https://github.com/dib-lab/kevlar
- Genotyping
- https://cran.r-project.org/web/packages/ebGenotyping/ebGenotyping.pdf
- http://bioinfo.ut.ee/FastGT/
- STR genotyping from NGS: http://melissagymrek.com/lobstr-code/
- Compression of genotype data: http://sun.aei.polsl.pl/REFRESH/gtc
- Base quality recalibration: https://github.com/swainechen/lacer
- SVs
- Score SVs based on predicted functional impact https://github.com/lganel/SVScore
- CNV calling
- https://github.com/cui-lab/multigems
- WHAM: CNV caller https://github.com/zeeev/wham
- Identification of mosaic events: https://github.com/asifrim/mrmosaic
- Repeat calling
- REPdenovo: https://github.com/Reedwarbler/REPdenovo
- Pipelines
- SpeedSeq: alignment/annotation pipeline - https://github.com/hall-lab/speedseq
- Ancestry and kinship analysis
- Phasing
- Eagle2: https://data.broadinstitute.org/alkesgroup/Eagle/
- Using HiC+partial haplotypes: https://github.com/YakhiniGroup/SpectraPh
- PhaseME http://beehive.cs.princeton.edu/wiki/phaseme/
- HapCut2 (unclear if this works with standard Illumina WGS) https://github.com/vibansal/HapCUT2
- Other
- VCF compression and data extraction: https://github.com/kedartatwawadi/GTRAC
- Run length encoded multi-sample BWT + server: https://github.com/wtsi-svi/ReadServer
- Nanopore
- BS-SNPer: fast SNP calling from bisulfite-converted sequencing reads https://github.com/hellbelly/BS-Snper
- MACAU: Mixed-model association http://www.xzlab.org/software.html
- ME-plot: Error detection and correction in bisulfite-converted sequencing reads https://github.com/joshuabhk/methylsuite
- Metilene: Calling differential methylation http://www.bioinf.uni-leipzig.de/Software/metilene/
- Reference-free bisulfite sequence comparison: https://github.com/thomasvangurp/epiGBS
- Correction for cell-type composition: http://www.cs.tau.ac.il/~heran/cozygene/software/refactor.html
- Predicting gene expression from methylation: http://arxiv.org/abs/1603.08386
- GEM: R package for meQTL and EWAS https://bioconductor.org/packages/devel/bioc/html/GEM.html
- Call CNVs from methylation array data: https://github.com/mknoll/cnAnalysis450k
- Alternative/differential nucleosome positioning: https://github.com/airoldilab/cplate
- Annotations
- Reassembly and annotation of public data for multi-tissue transcriptome map: http://big.hanyang.ac.kr/CAFE
- Splice junctions
- Set of novel (i.e. missing from annotation databases) splice junctions identified from SRA datasets: https://github.com/nellore/intropolis/blob/master/README.md
- Complete sets of splice junctions in public RNA-seq datasets: http://snaptron.cs.jhu.edu/data/
- QC
- NOISeq - exploratory analysis of read mappings https://www.bioconductor.org/packages/release/bioc/html/NOISeq.html
- AuPairWise: determine replicability without replicates https://github.com/sarbal/AuPairWise
- dupRadar: duplications https://www.bioconductor.org/packages/release/bioc/html/dupRadar.html
- Correcting for RNA quality
- Align/quantify:
- Need to re-evaluate HiSat2 + StringTie pipeline https://github.com/gpertea/stringtie
- RapMap: stand-alone lightweight alignment library: https://github.com/COMBINE-lab/RapMap/tree/SAQuasiAlignment
- Kallisto is a fast and accurate method for transcript quantification https://pachterlab.github.io/kallisto/
- Sleuth is a companion R package for differential expression analysis http://pachterlab.github.io/sleuth/
- Different models can be used in Sleuth to, for example, perform time-course experiments http://nxn.se/post/134227694720/timecourse-analysis-with-sleuth
- tximport: R package for aggregating transcript-level quantifications for gene-level analysis: http://f1000research.com/articles/4-1521/v1
- Wasabi: prepare Salmon/Sailfish output for Sleuth https://github.com/COMBINE-lab/wasabi
- featureCounts: read summarization http://bioinf.wehi.edu.au/featureCounts/
- D-GEX: Quantification of whole-transcriptome gene expression from landmark genes https://github.com/uci-cbcl/D-GEX
- Faster version of HTSeq/featureCounts: https://github.com/qinzhu/VERSE
- Quantification using both structure and abundance information: https://pypi.python.org/pypi/rsq
- Improve transcript quantification by integrating PolII ChIP-seq data: https://github.com/pliu55/RSEM/tree/pRSEM
- Hera: simultaneous alignment, quantification, and fusion detection https://github.com/bioturing/hera
- Aligner calibration: https://bitbucket.org/irenerodriguez/fbb
- Correction/Normalization
- Choosing normalization methods: https://arxiv.org/abs/1609.00959
- TDM: cross-platform normalization https://github.com/greenelab/TDMresults
- alpine: corrects for fragment sequence bias https://github.com/mikelove/alpine/blob/master/vignettes/alpine.Rmd
- Filtering out lowly expressed transcripts prior to quantification decreases the false-positive rate: http://www.genomebiology.com/2016/17/1/12
- Partition variance between biological and technical sources: https://www.bioconductor.org/packages/3.3/bioc/vignettes/variancePartition
- Quantify and correct for uncertainty in abundance estimates: https://github.com/PSI-Lab/BENTO-Seq
- Simultaneous isoform discovery and quantification across multiple samples: http://cbio.ensmp.fr/flipflop
- R package to compare normalization methods: https://github.com/Edert/NVT
- Filtering and tissue-aware normalization: http://bioconductor.org/packages/release/bioc/html/yarn.html
- Bias correction for transcript abundance estimation: https://www.lexogen.com/mix-square-scientific-license/
- Replacement for htseq-count/featureCounts that handles multi-mapping reads: https://bitbucket.org/mzytnicki/multi-mapping-counter
- Workflows:
- Artemis (RNA-Seq workflow designed around Kallisto): https://github.com/RamsinghLab/artemis
- https://github.com/ririzarr/rafalib
- Isolator: https://github.com/dcjones/isolator
- https://bioinform.github.io/rnacocktail/
- eQTL
- Multi-tissue:
- HT-eQTL https://github.com/reagan0323/MT-eQTL
- MT-HESS: eQTL analysis across tissues http://www.mrc-bsu.cam.ac.uk/software/
- eQTLBMA: cross-tissue eQTL https://github.com/timflutre/eqtlbma
- Multi-tissue eQTL: https://cran.r-project.org/web/packages/JAGUAR/index.html
- Multi-tissue, polygenic TWAS: https://github.com/ypark/fqtl
- Quantile regression approach: https://xiaoyusong.shinyapps.io/QRBT/
- Using prior knowledge: https://github.com/redsofa/LassoMP
- CONDOR: simultaneous cis- and trans-eQTL analysis https://github.com/jplatig/condor
- log allelic fold change (aFC) to quantify effect sizes of eQTL: http://biorxiv.org/content/biorxiv/early/2016/09/30/078717.full.pdf
- Identify cis mediators of trans-eQTL: http://biorxiv.org/content/early/2016/09/30/078683
- https://cran.r-project.org/web/packages/QRank/
- Multi-tissue:
- Differential expression:
- cjBitSeq: https://github.com/mqbssppe/cjBitSeq/wiki
- Differential junction usage: https://github.com/hartleys/JunctionSeq
- TROM: comparison of transcriptomes between species (and maybe between cell/tissue types?) https://cran.r-project.org/web/packages/TROM/index.html
- Tissue specificity of genes, based on GTEx data: http://genetics.wustl.edu/jdlab/tsea/
- Informative priors for Bayesian differential expression analysis using historical data: https://github.com/benliemory/IPBT
- RNA-enrich: http://lrpath.ncibi.org/
- Differentially expressed region finder (also works for ChIP-seq peaks): www.bioconductor.org/packages/derfinder
- Differentially expressed pathways using kernel MMD: https://eib.stat.ub.edu/tiki-index.php?page_ref_id=73
- Method using dimension-reduced ANOVA: http://homepage.fudan.edu.cn/zhangh/softwares/multiDE/
- Correct for hidden sources of variation: https://github.com/sutigit21/SVAPLSseq
- Alternative method for estimating variances: https://github.com/mengyin/vashr
- With few replicates: https://figshare.com/s/963e895f812d6f06468a
- Network sub-pathways: http://bioconductor.org/packages/release/bioc/html/DEsubs.html
- https://github.com/ewyang089/SDEAP/wiki
- Local subnetworks enriched for DE genes: https://cran.rstudio.com/web/packages/LEANR/index.html
- https://github.com/beiyuanzhe/DiscriminantCut
- Stage-wise DE/DT https://github.com/statOmics/stageR
- Computing heritability of gene expression: https://cran.r-project.org/web/packages/HeritSeq/index.html
- https://github.com/bee-hive/BIISQ
- Power analysis: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1648-2
- Using a beta binomial model with dynamic correction for overdispersion: https://github.com/GuoshuaiCai/BBDG
- Kmer-based: https://github.com/Transipedia/dekupl
- Differential transcript usage
- https://github.com/bartongroup/Rats
- Bayesian extension to BitSeq for differential transcript usage: https://github.com/mqbssppe/cjBitSeq
- Co-expression/networks
- http://biorxiv.org/content/early/2016/10/02/078741
- BicMix: differential co-expression networks http://beehive.cs.princeton.edu/software/
- ASE
- GeniASE: ASE without haplotypes http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4758070/pdf/srep21134.pdf
- Without phasing: http://lifecenter.sgst.cn/cisASE/
- Splicing
- https://github.com/hartleys/JunctionSeq
- https://github.com/lkmklsmn/SplicER
- Annotation of splicing types https://r-forge.r-project.org/projects/splicingtypes/
- MAJIQ: detection of local splice variation http://majiq.biociphers.org/
- https://github.com/davidaknowles/leafcutter
- http://www.mhs.biol.ethz.ch/research/krek/jsplice.html
- Splice site prediction: http://cabgrid.res.in:8080/HSplice/
- DRIM-seq: http://bioconductor.org/packages/release/bioc/html/DRIMSeq.html
- Fast quantification of differential splicing: https://github.com/comprna/SUPPA
- Identify variant associated with splicing: https://sourceforge.net/projects/isvase/
- Prediction of intronic splice branchpoints: https://github.com/betsig/branchpointer/
- Proportion spliced index: https://github.com/comprna/Junckey
- https://github.com/timbitz/Whippet.jl
- Assembly:
- CIDANE: http://ccb.jhu.edu/software/cidane/
- transrate: evaluation of de novo assemblies http://hibberdlab.com/transrate/
- dammit: annotator for de novo assemblies http://dammit.readthedocs.org/en/latest/
- kma: detection of differential intron retention https://github.com/pachterlab/kma
- RapClust: lightweight clustering of de novo transcriptomes https://pypi.python.org/pypi/rapclust/0.1
- Shannon http://sreeramkannan.github.io/Shannon/
- Strawberry https://github.com/ruolin/Strawberry
- CLASS2: http://ccb.jhu.edu/people/florea/research/CLASS2/
- https://sourceforge.net/projects/transcriptomeassembly/files/
- Multi-sample transcriptome assembly: http://tacorna.github.io/
- Clustering to decontaminate de novo assemblies: https://github.com/Lafond-LapalmeJ/MCSC_Decontamination
- Assembly from unstranded data: http://big.hanyang.ac.kr/CAFE
- https://github.com/Kingsford-Group/scallop
- Identification of transcript boundaries: https://github.com/realbigws/DeepBound
- Corset: gene counts from a transcriptome assembly https://github.com/Oshlack/Corset/wiki
- Consensus method: https://github.com/macmanes-lab/Oyster_River_Protocol
- Time series
- Deconvolution:
- VoCAL: https://cran.r-project.org/web/packages/ComICS/index.html
- DeconRNASeq: deconvolute expression profiles in mixed tissues http://www.bioconductor.org/packages/2.12/bioc/vignettes/DeconRNASeq/inst/doc/DeconRNASeq.pdf
- Search
- Structural variation
- Squid: https://github.com/Kingsford-Group/squid
- Identify gene fusions: https://github.com/ndaniel/fusioncatcher
- Fusion genes: http://star-fusion.github.io
- Identify gene expression driven by copy number alteration in samples with matched RNA-seq and CNA data: https://www.bioconductor.org/packages/release/bioc/html/iGC.html
- Fusion identification using kallisto: https://github.com/pmelsted/pizzly
- Other
- Biclustering for gene co-expression analysis: http://bioconductor.org/packages/devel/bioc/html/QUBIC.html
- Sample size calculation for experimental design: https://cran.r-project.org/web/packages/ssizeRNA/index.html
- Variance estimation: http://github.com/mengyin/vashr
- Sample expression "admixture" (can also be used for deconvolution): https://www.bioconductor.org/packages/release/bioc/html/CountClust.html
- Essentially, this assigns samples to clusters based on similarity in expression of sets of genes determined to be most discriminating. Each sample can belong to multiple clusters (similar to an admixture analysis).
- Identification of regulatory networks: https://sites.google.com/a/fleming.gr/rnea/
- Predict ribosome footprint from transcripts https://sourceforge.net/projects/riboshape/
- Phasing: https://github.com/secastel/phaser
- Identifying the source of (almost) all RNA-seq reads: https://github.com/smangul1/rop/wiki
- Subsampling to determine effect of read depth on downstream analyses: http://www.bio-complexity.com/samExploreR_1.0.0.tar.gz
- Phenotype prediction: https://github.com/clabuzze/Phenotype-Prediction-Pipeline
- Predict RNA-RNA interaction: https://github.com/satoken/ractip
- Mitigate cell-cycle effects: http://www.nature.com/articles/srep33892
- Fast computation of probabilities of pairwise regulation https://github.com/lingfeiwang/findr
- Align against synthetic transcript-based reference: https://github.com/Oshlack/Lace
- Interactive visualization http://bioconductor.org/packages/devel/bioc/html/Glimma.html
- New factorization method for dimensionality reduction: https://github.com/brian-cleary/CS-SMAF
- Database of public RNA-seq data sets processed using the same pipeline: https://github.com/mskcc/RNAseqDB
- Evaluation of aligners on long reads (GMap performs best): http://biorxiv.org/content/early/2017/04/11/126656
- Test different ML algorithms for classifying expression profiles: https://github.com/gboris/blkbox
- Find regions of correlated expression: https://cran.r-project.org/web/packages/SegCorr/index.html
- Comparative analysis of methods:
- Review of experimental design and analysis: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0927-y
- Sean Davis' list: https://github.com/seandavi/awesome-single-cell
- Datasets:
- scRNAseqDB: https://bioinfo.uth.edu/scrnaseqdb/
- Analysis-ready datasets: http://imlspenticton.uzh.ch:3838/conquer/
- Platforms
- Microwells: http://www.nature.com/articles/srep33883
- QC:
- http://www.morgridge.net/SinQC.html
- https://github.com/YosefLab/scone
- Parallel transcriptome/epigenome data: https://github.com/ChengchenZhao/DrSeq2
- Normalization
- http://michelebusby.tumblr.com/post/130202229486/the-ks-test-looks-pretty-good-for-single-cell
- Accounting for technical variation: http://www.nature.com/ncomms/2015/151022/ncomms9687/full/ncomms9687.html#supplementary-information
- ZIFA: Zero-inflated factor analysis https://github.com/epierson9/ZIFA
- http://biorxiv.org/content/early/2016/04/22/049734.full.pdf+html
- scran: http://bioconductor.org/packages/devel/bioc/html/scran.html
- Correct for expression heterogeneity: https://github.com/PMBio/scLVM
- Comparison of normalization methods: http://biorxiv.org/content/biorxiv/early/2016/07/17/064329.full.pdf
- https://github.com/rhondabacher/SCnorm/tree/master/R
- Factor analysis: https://github.com/UcarLab/IA-SVA/
- qSVA corrects for RNA quality (in the SVA package)
- ERCC https://bitbucket.org/bsblabludwig/bearscc
- Gene/Transcript counting
- Cell type-specific expression
- Clustering
- Comparative analysis:
- SC3: consensus clustering https://github.com/hemberg-lab/sc3
- destiny: diffusion maps for single-cell data http://bioconductor.org/packages/release/bioc/html/destiny.html
- https://github.com/govinda-kamath/clustering_on_transcript_compatibility_counts
- GiniClust https://github.com/lanjiangboston/GiniClust
- pcaReduce: https://github.com/JustinaZ/pcaReduce
- https://github.com/BatzoglouLabSU/SIMLR
- CIDR: https://github.com/VCCRI/CIDR
- Vortex: http://web.stanford.edu/~samusik/vortex/
- Identify rare cell types: RaceID http://www.nature.com/nature/journal/v525/n7568/full/nature14966.html
- https://github.com/drisso/zinbwave
- http://www.pitt.edu/~wec47/singlecell.html
- Neural networks for dimensionality reduction and clustering http://sb.cs.cmu.edu/scnn/
- https://github.com/srmcc/dcss_single_cell
- Cell similarity measure: https://github.com/maggiecrow/MetaNeighbor
- Differential Expression
- Monocle cole-trapnell-lab.github.io/monocle-release/ (2.0 has Census algorithm for differential transcript analysis)
- scDD: https://github.com/kdkorthauer/scDD
- ISOP: comparison of isoform pairs in single cells https://github.com/nghiavtr/ISOP
- D3E: http://hemberg-lab.github.io/D3E/
- BASiCS: https://github.com/catavallejos/BASiCS
- Beta Poisson: https://github.com/nghiavtr/BPSC
- Zero-inflation correction enables use of DESeq2, etc. with single-cell data: https://github.com/statOmics/zingeR
- DEsingle: https://github.com/miaozhun/DEsingle
- https://github.com/willtownes/vamf-paper
- Allele-specific expression
- SCALE accounts for "burstiness" of transcription: https://github.com/yuchaojiang/SCALE
- Splicing
- Time-series/ordering/lineage prediction
- Monocle
- Analysis of pseudotime uncertainty: http://biorxiv.org/content/biorxiv/early/2016/04/05/047365.full.pdf
- ECLAIR: cell lineage prediction https://github.com/GGiecold/ECLAIR
- Identification of ordering effects: https://github.com/lengning/OEFinder
- Slicer: non-linear trajectories https://github.com/jw156605/SLICER
- Wishbone: identification of bifurcations in developmental trajectories http://www.c2b2.columbia.edu/danapeerlab/html/cyt-download.html
- SCOUP: https://github.com/hmatsu1226/SCOUP
- Ouija: https://github.com/kieranrcampbell/ouija
- http://bioconductor.org/packages/release/bioc/html/sincell.html
- https://github.com/kstreet13/slingshot
- Cytoscape plugin: http://cytospade.org/
- https://github.com/zji90/TSCAN
- https://github.com/dimenwarper/scimitar
- https://cran.r-project.org/web/packages/timeSeq/index.html
- http://bioconductor.org/packages/release/bioc/html/cellTree.html
- Diffusion pseudotime: http://www.helmholtz-muenchen.de/icb/research/groups/machine-learning/projects/dpt/index.html
- https://github.com/theislab/kbranches
- Construction of co-expression networks: https://cran.r-project.org/web/packages/LEAP/index.html
- Differential expression between trajectories https://github.com/kieranrcampbell/switchde
- FORKS: https://github.com/macsharma/FORKS
- https://github.com/kieranrcampbell/phenopath
- Pipelines
- Seurat http://www.satijalab.org/seurat.html
- SINCERA https://research.cchmc.org/pbge/sincera.html
- MAST: https://github.com/RGLab/MAST
- scde (differential expression + gene set over-dispersion): https://github.com/hms-dbmi/scde
- BASiCS: Bayesian analysis of single cell data: https://github.com/catavallejos/BASiCS
- FastProject: https://github.com/YosefLab/FastProject/wiki
- Citrus: http://chenmengjie.github.io/Citrus/
- Tools from Teichmann lab (cellity, celloline, scrnatb): https://github.com/Teichlab/
- SCell: https://github.com/diazlab/SCell
- http://bioconductor.org/packages/scater
- https://github.com/joeburns06/hocuspocus
- https://gitlab.com/uhcclxgg/granatum
- https://github.com/LuyiTian/scPipe
- For epigenetic data: https://zhiji.shinyapps.io/scrat/
- scanpy: https://github.com/theislab/scanpy
- SNVs/CNVs
- DNA SNV calling: https://bitbucket.org/hamimzafar/monovar
- Ginkgo: analysis of CNVs in single-cell data: http://qb.cshl.edu/ginkgo/?q=/XWxZEerqqY477b9i4V8F
- CNV calling: http://genome.cshlp.org/content/early/2016/01/15/gr.198937.115.full.pdf
- Genotyping: https://bitbucket.org/aroth85/scg/wiki/Home
- Regulatory networks:
- Other scRNA-seq
- Analysis of 3' tagging data: https://github.com/garber-lab/ESAT
- StemID: Prediction of stem cells and lineage information https://github.com/dgrun/StemID
- Phasing: https://github.com/edsgard/scphaser
- Classification using sets of known cell type-specific genes: https://github.com/YosefLab/FastProject/wiki
- UMI counting: https://github.com/vals/umis
- Imputation of missing values
- https://github.com/Vivianstats/scImpute
- https://cran.r-project.org/web/packages/DrImpute/index.html
- Power analysis: https://github.com/vals/umis/
- Pooled perturbation experiments: https://github.com/asncd/MIMOSCA
- Simulation: http://bioconductor.org/packages/splatter/
- Comparison across experiments: https://github.com/hemberg-lab/scmap
- Demuxlet: using natural genetic variation to demultiplex 10x data https://github.com/hyunminkang/apigenome
- Alignment of multiple single-cell data sets: https://github.com/jw156605/MATCHER
- Topological analysis: https://github.com/RabadanLab/scTDA
- Methylation
- Prediction of missing information: https://github.com/cangermueller/deepcpg
- ATAC-seq
- Infer TF variation: https://github.com/GreenleafLab/chromVAR
- Clustering: https://github.com/timydaley/scABC
- Reviews:
- http://www.biomedcentral.com/1752-0509/8/S2/I1
- Comparison of methods http://www.biomedcentral.com/1752-0509/8/S2/S4
- Focusing on integration of RNA-seq and ChIP-seq http://journal.frontiersin.org/article/10.3389/fcell.2014.00051/full
- Network-based methods: http://rsif.royalsocietypublishing.org/content/12/112/20150571
- General-purpose multi-omics integration:
- mixOmics: R package that implements several multivariate methods, including DIABLO http://mixomics.org/
- Non-negative matrix factorization for integration of data sets https://github.com/yangzi4/iNMF
- omicade4: Integration of multi-omics data using co-inertia analysis https://bioconductor.org/packages/release/bioc/html/omicade4.html
- Integration of multi-omics data using random forests: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1043-4
- Kernel-PCA: http://www.biomedcentral.com/1752-0509/8/S2/S6
- Integrative analysis of multiple diverse omics datasets by sparse group multitask regression http://journal.frontiersin.org/article/10.3389/fcell.2014.00062/abstract
- Joint bi-clustering of multiple data types: http://research.cs.aalto.fi/pml/software/GFAsparse/
- Identifying covariance between sequencing data sets: http://github.com/pmb59/fCCAC/
- https://github.com/davidvi/pypanda
- SDA: integrate gene expression across multiple tissues, or multi-omics in a single tissue, for identification of trans-QTL networks: https://jmarchini.org/sda/
- HMM for binary classification based on multivariate data: https://github.com/PetarV-/muxstep
- https://github.com/fraenkel-lab/OmicsIntegrator
- Correlation between enriched regions from different data sets: http://malone.bioquant.uni-heidelberg.de/software/mcore
- https://cran.r-project.org/web/packages/r.jive/
- SIFORM: http://bioinformatics.oxfordjournals.org/content/early/2016/07/03/bioinformatics.btw295.full
- https://sourceforge.net/projects/epimine/
- Deep learning-based framework for integrating multiple data types to predict another data type: https://github.com/ueser/FIDDLE
- Significance-based https://www.bioconductor.org/packages/release/bioc/html/SMITE.html
- Two-stage CCA identifies non-linear associations: https://github.com/kosyoshida/TSKCCA
- HMM: https://link.springer.com/protocol/10.1007/978-1-4939-6753-7_10
- https://github.com/KnowEnG/pgenmi
- Multi-tissue:
- Specific data types:
- Predict gene fusions from WGS and RNA-seq: http://sourceforge.net/p/integrate-fusion/wiki/Home/
- NuChart: layer additional omics data on Hi-C ftp://fileserver.itb.cnr.it/nuchart/
- Predict expression from H3K27ac, and identify cis-regulatory elements: http://cistrome.org/MARGE/
- GenoSkyline: predict tissue-specific functional regions from epigenomic data http://genocanyon.med.yale.edu/GenoSkyline
- Network-based:
- Merging networks: https://github.com/maxconway/SNFtool
- https://sourceforge.net/projects/xmwas/
- Causality
- Hybrid BN/CMI approach to constructing GRNs: http://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1005024
- Noise/bias:
- Bias correction across different assays on the same samples: https://cran.r-project.org/web/packages/MANCIE/
- Cell cycle heterogeneity appears to be a minor contributor to noise; instead, library size is the largest PC by far http://www.nature.com/nbt/journal/v34/n6/full/nbt.3498.html
- Patient/disease subtyping
- Imputation of missing data
- http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1273-5
- TensorPute: multi-dimensional imputation https://sites.google.com/site/tensortest2/
- Other
- Identify class-discriminative motifs enriched in subclasses of overlapping annotations: https://github.com/seqcode/sequnwinder
- R class for integration algorithms: https://bioconductor.org/packages/release/bioc/html/MultiDataSet.html
- Generate data type-specific compression formats: http://algorithms.cnag.cat/cargo/
- Protocol buffers
- Protobuf: fast cross-language/platform serialization of fixed-format messages https://developers.google.com/protocol-buffers/
- https://capnproto.org/
- IDEs
- Visual Studio (now free): https://www.visualstudio.com/vs/visual-studio-mac/
- Parameter optimization
- Diff/patch/merge for data tables
- Pipe output of a shell command to a website (unfortunately can't be used in NIH HPC since nodes do not allow network connections): https://seashells.io/
- Debugging
- Sandsifter: Fuzzer https://github.com/xoreaxeaxeax/sandsifter
- JSON Diff: http://www.jsondiff.com/
- kmers
- Streaming kmer counting https://github.com/bcgsc/ntCard
- kmer bloom filters: https://github.com/Kingsford-Group/kbf
- High-performance concurrent hash table (C++11): https://github.com/efficient/libcuckoo
- BWT that incorporates genetic variants: https://github.com/iqbal-lab/gramtools
- Fast bitwise operations on nucleotide sequences: https://github.com/kloetzl/biotwiddle
- C++ interface to htslib, BWA-MEM, and Fermi (local assembly) (would be useful to build python bindings for this): https://github.com/walaj/SeqLib
- Minimal perfect hash function: fast, large data sets https://github.com/rizkg/BBHash
- Counting quotient filter: https://github.com/splatlab/cqf
- Succinct de Bruijn Graphs: http://alexbowe.com/succinct-debruijn-graphs/
- Blazing signature filter: fast pairwise comparison of e.g. gene expression matrices https://github.com/PNNL-Comp-Mass-Spec/bsf-py
- Tidy data cheatsheet: http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
- Multiple variable assignment: https://github.com/nteetor/zeallot
- https://github.com/qinwf/awesome-R#integrated-development-environment
- http://www.computerworld.com/article/2497464/business-intelligence/business-intelligence-60-r-resources-to-improve-your-data-skills.html
- http://dirk.eddelbuettel.com/cranberries/
- METACRAN: identify R packages http://www.r-pkg.org/
- MonetDB - embeddable column-store DB with R integration (MonetDB.R)
- daff: diff/merge for data frames - https://github.com/edwindj/daff
- dplyr
- chunked processing of large files: https://github.com/edwindj/chunked
- magrittr/pipes
- debugging: https://github.com/gaborcsardi/tamper
- Join tables on inexact matching: https://github.com/dgrtwo/fuzzyjoin
- Tidy text: http://juliasilge.com/blog/Life-Changing-Magic/
- MarkDeep for R documentation: http://casual-effects.com/markdeep/
- https://confluence.broadinstitute.org/display/GDAC/Nozzle
- Access to Google spreadsheets from R: https://github.com/jennybc/googlesheets
- Advanced table formatting in knitr: https://github.com/renkun-ken/formattable
- Access data frames using SQL: sqldf package
- Developing R packages: https://github.com/jtleek/rpackages
- Work with PDF files: https://cran.r-project.org/web/packages/pdftools/index.html
- Language-agnostic data frame format: https://github.com/wesm/feather
- Make for R: https://github.com/richfitz/maker
- Find root of current package: https://krlmlr.github.io/here/
- Data structures/formats
- Chunked, compressed, disk-based arrays: https://github.com/alimanfoo/zarr
- Tabular data
- Working with tabular data: http://docs.python-tablib.org/en/latest/
- Watch for Apache Arrow
- Pandas
- GFA: https://github.com/ggonnella/gfapy/tree/master/gfapy
- Pipelines
- Invoke: http://docs.pyinvoke.org/en/latest/
- Toil: http://toil.readthedocs.io/en/latest/installation.html
- Snakemake
- A regular expression scanner: https://github.com/mitsuhiko/python-regex-scanner
- API for interacting with databases: https://github.com/kennethreitz/records
- RStudio for python: https://www.yhat.com/products/rodeo
- boltons.debugutils: The entire boltons package has lots of useful stuff, but debugutils is particularly cool - you can add one line of code to enable you to drop into a debugger on signal (e.g. Ctrl-C): https://boltons.readthedocs.io/en/latest/debugutils.html
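The "drop into a debugger on signal" idea can be sketched with the standard library alone. This is a minimal illustration, not the boltons implementation: `install_debugger_on_signal` is a hypothetical helper name, and it assumes it is called from the main thread (a requirement of `signal.signal`).

```python
import pdb
import signal


def install_debugger_on_signal(signalnum=signal.SIGINT):
    """Install a handler that opens pdb when `signalnum` arrives.

    Hypothetical helper for illustration only. Returns the previously
    installed handler so the caller can restore it later.
    """
    def _handler(signum, frame):
        # Start the debugger in the frame that was interrupted.
        pdb.Pdb().set_trace(frame)

    return signal.signal(signalnum, _handler)
```

Calling `install_debugger_on_signal()` once near the top of a long-running script means that Ctrl-C opens a pdb prompt at the interrupted line instead of raising `KeyboardInterrupt`.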
- Stats
- Non-negative matrix factorization: https://github.com/ccshao/nimfa
- http://www.statsmodels.org/stable/index.html
- R formulas in python: https://github.com/pydata/patsy
- pyrasite: code injection into running applications
- Dexy: documentation
- Event loops for asynchronous programming
- curio
- gevent
- Fast microservices: https://github.com/squeaky-pl/japronto
- dill: alternative serialization
- arrow: alternative to datetime
- Template for scientific projects: https://github.com/uwescience/shablona
- FFI
- Go transpiler: https://github.com/google/grumpy
- Calling Rust libraries from python: https://medium.com/@caulagi/complementing-python-with-rust-657a8cb3d066#.6in8v0bte
- pyjamas: javascript bridge
- Disabling python garbage collection speeds up programs: https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172#.ri55nyjdu (only safe when the lifecycle is straightforward for all objects, and thus reference counting is sufficient for memory management)
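The caveat about reference counting can be illustrated with the standard `gc` module. The sketch below is not taken from the linked post; the `Node` class and `make_cycle` function are hypothetical. It shows that, with the cyclic collector disabled, objects that reference each other are not freed until a collection is requested explicitly.

```python
import gc


class Node:
    """Toy object used to build a reference cycle (hypothetical example)."""

    def __init__(self):
        self.partner = None


def make_cycle():
    # Two objects that reference each other: reference counting alone
    # can never free them once the local names go away.
    a, b = Node(), Node()
    a.partner, b.partner = b, a


gc.collect()   # start from a clean slate
gc.disable()   # turn off the automatic cyclic collector
try:
    for _ in range(1000):
        make_cycle()
    # The cycles are still alive at this point; an explicit collection
    # finds them and reports how many unreachable objects it saw.
    unreachable = gc.collect()
finally:
    gc.enable()  # restore the default behaviour

print("collector enabled again:", gc.isenabled())
print("unreachable objects found by the explicit collection:", unreachable)
```

If a program never creates cycles, nothing accumulates while the collector is off, which is the situation the note above describes as safe.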
- Easily implementing function proxies/wrappers: http://wrapt.readthedocs.io/en/latest/
- Cache system: https://bitbucket.org/zzzeek/dogpile.cache
- Parse TOML (an enhanced config-file spec): https://github.com/uiri/toml
- Web scraping
- Visualize python code execution time as a heatmap in a Jupyter notebook: https://github.com/csurfer/pyheatmagic
- Spark:
- Ibis: http://blog.cloudera.com/blog/2015/07/ibis-on-impala-python-at-scale-for-data-science/
- Petuum: http://petuum.github.io/
- Flink: https://flink.apache.org/
- Dask: http://dask.pydata.org/en/latest/
- Efficient tabular storage: http://matthewrocklin.com/blog/work/2015/08/28/Storage/
- Common runtime for various libraries (e.g. Pandas, TensorFlow) that speeds up execution when interfacing the libraries with each other: https://weld-project.github.io/
- Diff tables: https://github.com/paulfitz/daff
- Miller - work with tables http://johnkerl.org/miller/doc/build.html
- CockroachDB: based on Google's distributed database https://github.com/cockroachdb/cockroach
- In-memory key-value db in python: https://github.com/paxos-bankchain/subconscious
- http://rstudio.github.io/packrat/walkthrough.html
- Docker:
- http://arxiv.org/pdf/1410.0846v1.pdf
- http://bioboxes.org/available-bioboxes/
- http://ivory.idyll.org/blog//2015-docker-and-replicating-papers.html
- GUI for running Docker images locally: https://kitematic.com/
- Nextflow: http://www.nextflow.io/
- Jupyter notebooks:
- http://jupyter.org/
- http://mybinder.org/
- http://nwhitehead.github.io/pineapple/
- RISE: presentations from Jupyter notebooks https://github.com/damianavila/RISE
- Stencila: interesting alternative to Jupyter notebooks and R markdown https://stenci.la/
- Bioinformatics software containers
- Continuous analysis: https://github.com/greenelab/continuous_analysis
- Jupyter notebooks on Azure: https://notebooks.azure.com/
- Creating reproducible workflows with R Markdown documents: https://jdblischak.github.io/workflowr/
- Luigi https://github.com/spotify/luigi
- Flo http://flo.readthedocs.org/en/latest/index.html
- Qsubsec: template language for defining SGE workflows https://github.com/alastair-droop/qsubsec
- Nextflow and Nextflow Workbench: http://campagnelab.org/software/nextflow-workbench/
- SUSHI: https://github.com/uzh/sushi
- https://github.com/jdblischak/workflowr
- Data manager for R: https://cran.r-project.org/web/packages/repo/index.html
- SnakeChunks: components for Snakemake https://github.com/SnakeChunks/SnakeChunks
- dgsh: bash variant that has primitives for parallelizing tasks https://www2.dmst.aueb.gr/dds/sw/dgsh
- Search for papers: https://www.semanticscholar.org/
- Common probability distributions http://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet/
- How to share data with a statistician: https://github.com/jtleek/datasharing
- Precision-recall curves:
- R package for subsetting data into training/testing/validation sets: https://cran.r-project.org/web/packages/STPGA/index.html
- Lists
- Decision tree methods
- R randomForest package
- FuzzyForests are an extension of random forests for classification in which subsets of variables/features are highly correlated https://github.com/OHDSI/FuzzyForest
- https://github.com/catboost
- https://github.com/kundajelab/boosting2D/
- New performance metric: https://cran.r-project.org/web/packages/IPMRF/IPMRF.pdf
- Mondrian forests https://scikit-garden.github.io/examples/MondrianTreeRegressor/
- Clustering
- Help selecting biclustering algorithm: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1487-1
- DBSCAN https://cran.r-project.org/web/packages/dbscan/dbscan.pdf
- https://cran.r-project.org/web/packages/KODAMA/index.html
- t-SNE: alternative to PCA and MDS: http://lvdmaaten.github.io/tsne/
- http://distill.pub/2016/misread-tsne/
- Compressive k-means: https://arxiv.org/pdf/1610.08738.pdf
- Sparse convex: https://arxiv.org/pdf/1601.04586.pdf
- Multivariate analysis http://cran.r-project.org/web/views/Multivariate.html
- Multivariate analysis of covariance (MANCOVA): http://en.wikipedia.org/wiki/MANCOVA
- Nonnegative Matrix Factorization: https://cran.r-project.org/web/packages/NMF/vignettes/NMF-vignette.pdf
- Tensor factorization: https://cran.r-project.org/package=tensorBF
- Identification of correlated features within or between datasets: https://github.com/siskac/discordant
- Bayesian alternatives to standard R functions: https://github.com/rasmusab/bayesian_first_aid
- Bayesian regression modeling:
- brms and rstanarm are R packages based on stan
- JAGS http://jeromyanglim.blogspot.com/2012/04/getting-started-with-jags-rjags-and.html
- MCMC http://www.stat.umn.edu/geyer/mcmc/library/mcmc/doc/demo.pdf
- Multiple test correction
- FDR for multi-dimensional pairwise comparisons (e.g. RNA-seq): http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0937-5
- Local false sign rate: an alternative to FDR that operates on effect estimates and their standard errors rather than p-values: https://github.com/stephens999/ashr
- MTC weighted by variant effect: http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3507.html
- Parallelizable FDR correction: http://bioinformatics.oxfordjournals.org/content/early/2016/02/25/bioinformatics.btw029.short
- IHW: http://bioconductor.org/packages/release/bioc/html/IHW.html
- BonEV: Bonferroni FDR correction https://cran.r-project.org/web/packages/BonEV/index.html
- Analysis of mutual information: https://github.com/jkleinj/arMI
- Fast Bayesian alternative to lasso and ElasticNet for feature selection and effect estimation: https://cran.r-project.org/web/packages/EBglmnet/index.html
- Genome-wide generalized additive models: https://master.bioconductor.org/packages/3.3/bioc/html/GenoGAM.html
- Causal inference test: https://cran.r-project.org/web/packages/cit/index.html
- Iterative denoising tree: https://github.com/youngser/behaviotypes/blob/master/doidt.r
- Reed-Solomon error correction
- Fast exact calculation of p-values in Friedman rank sum test: http://www.ru.nl/publish/pages/726696/friedmanrsd.zip
- The Friedman test is for testing whether any columns are consistently different from other columns in a matrix
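To make the description of the Friedman test concrete, here is a minimal pure-Python sketch of the classic chi-square test statistic. It is not the fast exact p-value method from the linked package: `friedman_statistic` is a hypothetical helper, it applies no tie correction to the statistic, and it does not compute a p-value.

```python
def friedman_statistic(matrix):
    """Friedman chi-square statistic for a rows-by-columns matrix.

    Rows are blocks (e.g. samples) and columns are treatments. Values are
    ranked within each row; tied values share the average of their ranks.
    """
    n = len(matrix)
    k = len(matrix[0])
    rank_sums = [0.0] * k
    for row in matrix:
        order = sorted(range(k), key=lambda col: row[col])
        i = 0
        while i < k:
            # Find the run of tied values starting at sorted position i.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # ranks are 1-based
            for pos in range(i, j + 1):
                rank_sums[order[pos]] += avg_rank
            i = j + 1
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Column 0 is always smallest and column 2 always largest, so the
# statistic reaches its maximum of n * (k - 1) for this matrix.
consistent = [[1, 2, 3], [2, 3, 4], [0, 5, 9], [1, 4, 6]]
print(friedman_statistic(consistent))  # 8.0
```

A large statistic indicates that some columns are ranked consistently higher or lower than others across rows; a value near zero indicates no consistent ordering.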
- Automated generation of ML pipelines: https://github.com/rhiever/tpot/tree/tpot-mdr
- Forecasting from time-series data: https://facebookincubator.github.io/prophet/ (this is a retail-centric model from Facebook, but could be adapted to biological data)
- Visualizing ML features: https://github.com/pair-code/facets
- Topological analysis:
- Feature selection workflows: https://github.com/enriquea/feseR
- http://www.clips.ua.ac.be/pages/pattern
- Linear mixed-model solver https://github.com/nickFurlotte/pylmm
- Launching a subprocess in a pseudo-terminal (e.g. for accepting passwords) https://github.com/pexpect/ptyprocess
- Launch an editor from python: https://github.com/fmoo/python-editor
- Faster alternative to pyvcf: https://github.com/brentp/cyvcf2
- Reading
- Platforms
- Libraries
- http://www.teglor.com/b/deep-learning-libraries-language-cm569/
- https://github.com/fchollet/keras
- Chainer: https://www.oreilly.com/learning/complex-neural-networks-made-easy-by-chainer
- biologically-focused neural networks https://github.com/kundajelab/dragonn/tree/master/dragonn
- analysis of features in deep neural networks https://github.com/kundajelab/deeplift
- API to add fuzzy logic: https://fuzzy.ai/docs
- Edward: probabilistic modeling, inference, and criticism; built on TensorFlow https://github.com/blei-lab/edward
- VectorFlow: specifically designed for sparse data https://github.com/Netflix/vectorflow
- Architectures:
- http://www.asimovinstitute.org/neural-network-zoo/
- Deep residual:
- Wide residual: https://arxiv.org/abs/1605.07146
- Time-delay http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=809100&tag=1
- Semi-supervised classification of nodes in a graph: https://github.com/tkipf/gcn
- Adversarial: https://arxiv.org/abs/1406.2661
- Recurrent scalable deep kernels: https://arxiv.org/abs/1610.08936
- Multiplicative LSTM: https://arxiv.org/abs/1609.07959
- Graph CNN: https://arxiv.org/abs/1609.02907
- Encoding variable-length sequences:
- Interactive sequence generation from RNNs: https://arxiv.org/abs/1612.04687
- DNNs with external memory: http://www.nature.com/nature/journal/v538/n7626/full/nature20101.html
- The Predictron: https://arxiv.org/abs/1612.08810
- Associative LSTM: https://arxiv.org/abs/1602.03032
- Conditional variational autoencoders: http://ijdykeman.github.io/ml/2016/12/21/cvae.html
- Group equivariant CNN: http://jmlr.org/proceedings/papers/v48/cohenc16.pdf
- QuickNet: https://arxiv.org/abs/1701.02291
- Tools
- Non-bio networks that might be applied
- https://github.com/david-gpu/srez
- Deep learning with text: https://explosion.ai/blog/deep-learning-formula-nlp
- Automatic text summarization: https://pypi.python.org/pypi/sumy
- https://github.com/facebookresearch/fastText
- Word2vec: models for predicting missing words https://en.m.wikipedia.org/wiki/Word2vec
- Breve is a Mac application that displays large tables in a way that makes it easy to identify patterns and missing data http://breve.designhumanities.org/
- Types of plots:
- Making colorblind-friendly figures: http://bconnelly.net/2013/10/creating-colorblind-friendly-figures/
- http://www.informationisbeautifulawards.com/showcase?acategory=free-tool&action=index&award=2015&controller=showcase&page=1&pcategory=long-list&type=awards
- Examples: http://www.visualcomplexity.com/vc/
- Feedback: http://helpmeviz.com/
- https://bitbucket.org/vda-lab/
- Visualization of GO results: http://cran.r-project.org/web/packages/GOplot/vignettes/GOplot_vignette.html
- Fluff: publication-quality genomics plots http://fluff.readthedocs.org
- Visualization of feature density along the genome: https://github.com/sguizard/DensityMap
- Circos-like visualization of chromosome structure with support for multiple data types https://rondo.ws
- Comparison of different types of heatmaps: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1442-6
- Interactively build simulations and animated interaction diagrams: http://ncase.me/loopy/
- Plotly: Charting library with multiple language bindings https://github.com/plotly/plotly.py
- Open-access visualization research: http://oavis.steveharoz.com/
- Human-like Orthogonal Layout: https://github.com/skieffer/hola
- http://gephi.org/
- scripts from Holt lab: https://github.com/katholt/plotTree
- web-based http://microreact.org/showcase/
- https://github.com/allendecid/TreeLink
- JavaScript libraries: https://github.com/tntvis
- Comparison of libraries:
- D3 from R: http://christophergandrud.github.io/networkD3/
- SVG device: https://github.com/hadley/svglite/blob/master/README.md
- Multilayer data plotted on a Hilbert curve: http://www.bioconductor.org/packages/devel/bioc/html/HilbertCurve.html
- Visualize local epigenetic neighborhood of a SNP: http://bioconductor.org/packages/release/bioc/html/SNPhood.html
- CIRCOS plots: https://cggl.horticulture.wisc.edu/software/
- Nice looking boxplots: https://github.com/mw55309/perceptions
- Make multipanel figures from a combination of plot types: https://cran.r-project.org/web/packages/multipanelfigure/
- Gallery of extensions: http://www.ggplot2-exts.org/gallery/
- Themes
- Cowplot - improve default ggplot: http://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html
- ggplot2 theme for publication-quality figures: https://github.com/robertwilson190/ggplot2-theme
- https://github.com/hrbrmstr/hrbrthemes
- xkcd-style plots: http://xkcd.r-forge.r-project.org/
- Color palettes
- https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html
- Color scales with clustering (would want to adapt this to ggplot): https://github.com/schnerd/d3-scale-cluster
- https://ggsci.net/
- ggtree: phylogenetic trees https://bioconductor.org/packages/release/bioc/html/ggtree.html
- geomnet: network visualization
- ggrepel: displaying text labels with minimal overlapping https://github.com/slowkow/ggrepel
- ggforce: many extensions to ggplot
- ggalt: many extensions to ggplot
- ggraph: plotting graphs/networks
- ggedit: interactive plot editor (Shiny gadget)
- ggRandomForests https://cran.r-project.org/web/packages/ggRandomForests/index.html
- superheat: pretty heatmaps https://github.com/rlbarter/superheat
- biplots https://github.com/vqv/ggbiplot
- logos: https://github.com/omarwagih/ggseqlogo
- Make any ggplot interactive:
- ggbeeswarm: plot overlapping points without jitter https://github.com/eclarke/ggbeeswarm
- ggtern: Ternary diagrams https://bitbucket.org/nicholasehamilton/ggtern
- ggpubr: Publication-ready plots, including
- barplot alternatives: http://www.sthda.com/english/rpkgs/ggpubr/
- ggarrange, for flexible multi-panel figures
- joyplots: https://github.com/clauswilke/ggjoy
- colorblindr: test effect of colorblindness on readability of plots https://github.com/clauswilke/colorblindr
- Marginal plots: https://twitter.com/ClausWilke/status/900776341494276096
- Correlograms: corrgram package
- DiagrammeR http://rich-iannone.github.io/DiagrammeR/
- Gene word clouds: http://genomespot.blogspot.co.uk/2014/10/geneclouds-unconventional-genetics-data.html?m=1
- Upset plots: https://cran.r-project.org/web/packages/UpSetR/index.html
- Genomic data: https://bioconductor.org/packages/GenVisR
- hextri: multiclass hexagonal bins https://cran.r-project.org/web/packages/hextri/vignettes/hexbin-classes.html
- trellis: https://www.bioconductor.org/packages/release/bioc/html/gtrellis.html
- Complex heat maps: http://www.bioconductor.org/packages/devel/bioc/html/ComplexHeatmap.html
- Scatterplot Matrix: http://bl.ocks.org/mbostock/4063663
- Beeswarm: https://flowingdata.com/2016/09/08/beeswarm-plot-in-r-to-show-distributions/
- http://flowingdata.com/2016/10/25/r-graph-gallery/
- Tilegrams: http://flowingdata.com/2016/10/13/tilegrams-in-r/
- Karyotype plots: http://bioconductor.org/packages/devel/bioc/html/karyoploteR.html
- Joyplots (for multi-sample/category time-series data):
- https://github.com/halhen/viz-pub/blob/master/sports-time-of-day/2_gen_chart.R (TODO: make a dedicated geom for this)
- In python: mwaskom/seaborn#1238
- Sushi: publication-quality figures from multiple data types https://github.com/dphansti/Sushi
- EpiViz: visualization of epigenomic data sets in R, http://epiviz.github.io/
- Interaction data: https://github.com/kcakdemir/HiCPlotter
- Network visualization from R using vis.js: http://dataknowledge.github.io/visNetwork/
- Differential expression from RNA-seq: http://bioconductor.org/packages/devel/bioc/html/Glimma.html
- https://gist.github.com/jcheng5/cbcc3b439a949deb544b
- Interactive charts: https://benjaminlmoore.wordpress.com/2015/05/19/interactive-charts-in-r/
- http://www.htmlwidgets.org/showcase_leaflet.html
- Interactive ROC plots: https://github.com/sachsmc/plotROC
- Dull: create interactive web applications - https://github.com/nteetor/dull
- Graphic of python libraries for visualization: https://pbs.twimg.com/media/C9QBxNsU0AEokR_.jpg
- Seaborn: http://web.stanford.edu/~mwaskom/software/seaborn/index.html
- Bokeh: http://bokeh.pydata.org/docs/user_guide/charts.html
- Vincent: https://github.com/wrobstory/vincent
- Pyxley (Shiny for python): http://multithreaded.stitchfix.com/blog/2015/07/16/pyxley/
- https://github.com/svaksha/pythonidae/blob/master/Computer-Graphics.md
- XKCD-style plots: http://jakevdp.github.io/blog/2012/10/07/xkcd-style-plots-in-matplotlib/
- Phylogenetic trees: http://etetoolkit.org/
- Altair: https://altair-viz.github.io/
- Visualization dashboard: https://github.com/facebookresearch/visdom
- http://holoviews.org/
- GGplot clone: https://github.com/has2k1/plotnine
- Dash: for building data-centric web apps https://github.com/plotly/dash
- vis.js
- Javascript libraries
- http://blog.webkid.io/javascript-chart-libraries/
- D3.js: https://pub.beakernotebook.com/#/publications/560c9f9b-14e6-4d95-8e78-cc0a60bf4e5a?fullscreen=false
- Scraping/JS rendering: https://splash.readthedocs.org/en/latest/
- Circos for Javascript: http://bioinfo.ibp.ac.cn/biocircos/
- VegaLite: ggplot-like framework built on top of Vega, which is built on D3.js: https://vega.github.io/vega-lite/
- GPU rendering: https://stardustjs.github.io
- Dataviz components built on top of D3: http://nivo.rocks/?ref=producthunt#/components
- https://emeeks.github.io/semiotic/#/
- Cool interactive visualization of differential data: http://graphics.wsj.com/gender-pay-gap/
- Licenses: http://choosealicense.com/licenses/
- APIs for literature search: http://libguides.mit.edu/apis
- Assessing credit for bioinformatics software authorship: http://depsy.org/
- Icons for presentations: http://cameronneylon.net/blog/some-slides-for-granting-permissions-or-not-in-presentations/
- Continuous analysis: https://github.com/greenelab/continuous_analysis
- A nice template for bootstrapping your own academic website using GitHub: https://academicpages.github.io/
- Desktop app for searching/managing papers: https://github.com/codeforscience/sciencefair
- Recommend papers to cite based on your bibliography: http://labs.semanticscholar.org/citeomatic/
- Word choices:
- Prepare papers for any journal format: https://typeset.io/
- Slideboards: Mashup of slides and FAQ to explain a publication http://slideboard.herokuapp.com/
- Two-column rmarkdown template: http://dirk.eddelbuettel.com/code/pinp.html
- Generate manuscripts on GitHub: https://github.com/greenelab/manubot-rootstock
- DOI for code
- http://zenodo.org/
- https://guides.github.com/activities/citable-code/
- https://mozillascience.github.io/code-research-object/
- Ruby library for fetching metadata for DOI: https://rubygems.org/gems/terrier
- CodeOcean: web platform to run algorithms https://codeocean.com
- GitHub: https://github.com/blog/1986-announcing-git-large-file-storage-lfs
- Amazon CodeCommit: http://aws.amazon.com/codecommit/
- Data sharing
- Dat: https://datproject.org/
- http://academictorrents.com/
- https://figshare.com/
- Patterns for data sharing: http://project-if.github.io/data-permissions-catalogue/
- Globus is an open source toolkit for transferring large data files; it implements the GridFTP protocol https://www.globus.org/
- bbcp is multi-stream scp (for point-to-point large file transfer) https://www.olcf.ornl.gov/kb_articles/transferring-data-with-bbcp/
- Quilt: package manager for data https://quiltdata.com/
- Git plugin for version-control of data files: https://github.com/ctjacobs/git-rdm
- OSF API: https://test-api.osf.io/v2/docs/
- Data project management: https://www.datazar.com
- Execute code from GitHub project with Jupyter notebooks: http://mybinder.org/
- http://www.data-retriever.org/
- https://thinklab.com/
- http://www.researchobject.org/
- Distill: Online, ML-focused journal http://distill.pub/journal/
- JORS: https://openresearchsoftware.metajnl.com/
- JOSS: http://joss.theoj.org/
- Scripts to identify "bad smells" in science writing (would want to convert this to python): http://matt.might.net/articles/shell-scripts-for-passive-voice-weasel-words-duplicates/
- Collaborative writing
- Templates:
- InDesign template for preprint: https://github.com/cleterrier/ManuscriptTools/blob/master/biorxiv_template_CC2015.indd
- Rmarkdown templates for journal articles https://github.com/rstudio/rticles
- GitHub template for authoring papers: https://github.com/peerj/paper-now
- Convert between (R)markdown and iPython notebooks: https://github.com/aaren/notedown
- Pandoc scholar: https://github.com/pandoc-scholar/pandoc-scholar
- Nice paper showing example of generating manuscripts for two different journals: https://peerj.com/preprints/2648.pdf
- Convert DOI to BibTeX entry
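One way to do the DOI-to-BibTeX conversion is HTTP content negotiation against the DOI resolver. The sketch below assumes that https://doi.org honours an `Accept: application/x-bibtex` header for the DOI in question (support depends on the registration agency), and the function names are hypothetical. The example DOI is the GigaScience article listed under Data Sets.

```python
import urllib.parse
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # assumed resolver base URL


def build_bibtex_request(doi):
    """Build (but do not send) a request asking the resolver for BibTeX."""
    url = DOI_RESOLVER + urllib.parse.quote(doi)
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
request = build_bibtex_request("10.1186/s13742-016-0148-z")
print(request.full_url, request.get_header("Accept"))

# To actually retrieve the entry (requires network access):
# print(fetch_bibtex("10.1186/s13742-016-0148-z"))
```

Keeping the request builder separate from the network call makes the header logic easy to check offline.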
- Online equation editor: https://www.mathcha.io/
- Simplified markup language for equations: http://asciimath.org/
- https://editoria.pub/
- SDR gene set analysis http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0928-6
- ABBA http://abba.systems-genetics.net
- EpiTensor (3D genomes from 1D data): http://www.nature.com/ncomms/2016/160310/ncomms10812/full/ncomms10812.html
- MOCHA: identifying modulators of transcriptional regulation from gene expression http://www.nature.com/articles/srep22656
- HiC deconvolution: http://www.pnas.org/content/early/2016/03/04/1512577113.full
- MR_eQTL: https://github.com/PrincetonUniversity/MR_eQTL
- Method for multi-omics integration: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1122-6
- Dynamic Bayesian network for predicting TF binding from DNase-seq: http://bioinformatics.oxfordjournals.org/content/26/12/i334.long
- De-noising ChIP-seq using DNNs: http://biorxiv.org/content/biorxiv/early/2016/05/07/052118.full.pdf
- DNA clustering: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-321
- Bayesian identification of bifurcations in single-cell data: http://biorxiv.org/content/biorxiv/early/2016/09/21/076547.full.pdf
- Prediction of promoter-enhancer interactions: http://biorxiv.org/content/biorxiv/early/2016/11/02/085241.full.pdf
- Mocap: TFBS prediction from ATAC-seq http://biorxiv.org/content/biorxiv/early/2016/10/27/083998.full.pdf
- Statistical method to determine deviation from time linearity in epigenetic aging: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005183
- Multi-tissue eQTL meta-analysis: http://biorxiv.org/content/early/2017/01/16/100701
- Missing value imputation evaluator: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1429-3
- Simulated annealing for biological network alignment: https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx090/2996219/SANA-Simulated-Annealing-far-outperforms-many
- TF binding site prediction using memory-matching networks: https://arxiv.org/pdf/1702.06760.pdf
- V-ALIGN: alignment on genome graphs http://www.biorxiv.org/content/biorxiv/early/2017/04/06/124941.full.pdf
- Self-organizing maps for single-cell http://www.biorxiv.org/content/biorxiv/early/2017/04/05/124693.full.pdf
- New locality sensitive hashing method: http://www.biorxiv.org/content/biorxiv/early/2017/08/25/180471.full.pdf