💡 Artificial intelligence has been widely applied in drug discovery over the past decade and is still gaining popularity. This repository compiles a collection of works in related areas, based on the manuscript Artificial Intelligence in Drug Discovery: Applications and Techniques by Jianyuan Deng et al. The preprint version is available on ResearchGate. We hope you find it useful for your research (a citation is provided below).
🔔 This repository is updated regularly.
@article{deng2022artificial,
title={Artificial intelligence in drug discovery: applications and techniques},
author={Deng, Jianyuan and Yang, Zhibo and Ojima, Iwao and Samaras, Dimitris and Wang, Fusheng},
journal={Briefings in Bioinformatics},
volume={23},
number={1},
pages={bbab430},
year={2022},
publisher={Oxford University Press}
}
1.1 General Drug Discovery
- Integration of virtual and high-throughput screening (Nat Rev Drug Discov 2002) [Paper]
- Chemical space and biology (Nature 2004) [Paper]
- Computer-based de novo design of drug-like molecules (Nat Rev Drug Discov 2005) [Paper]
- On Outliers and Activity Cliffs-Why QSAR Often Disappoints (J Chem Inf Model 2006) [Paper]
- Evaluating Virtual Screening Methods: Good and Bad Metrics for the “Early Recognition” Problem (J Chem Inf Model 2007) [Paper]
- Virtual screening: an endless staircase? (Nat Rev Drug Discov 2010) [Paper]
- Privileged Scaffolds for Library Design and Drug Discovery (Curr Opin Chem Biol 2010) [Paper]
- Principles of early drug discovery (Br J Pharmacol 2011) [Paper]
- Recognizing Pitfalls in Virtual Screening: A Critical Review (J Chem Inf Model 2012) [Paper]
- Multi-objective optimization methods in drug design (Drug Discov Today 2013) [Paper]
- Finding the rules for successful drug optimisation (Drug Discov Today 2014) [Paper]
- Recent Progress in Understanding Activity Cliffs and Their Utility in Medicinal Chemistry (J Med Chem 2014) [Paper]
- Automating Drug Discovery (Nat Rev Drug Discov 2017) [Paper]
- Interpretation of Quantitative Structure−Activity Relationship Models: Past, Present, and Future (J Chem Inf Model 2017) [Paper]
- Advances and Challenges in Computational Target Prediction (J Chem Inf Model 2019) [Paper]
- Duality of activity cliffs in drug discovery (Expert Opin Drug Discov 2019) [Paper]
- QSAR without borders (Chem Soc Rev 2020) [Paper]
- Designing small molecules for therapeutic success: A contemporary perspective (Drug Discov Today 2021) [Paper]
- Phenotypic drug discovery: recent successes, lessons learned and new directions (Nat Rev Drug Discov 2022) [Paper]
- Is the reductionist paradox an Achilles Heel of drug discovery? (J Comput Aided Mol Des 2022) [Paper]
- Machine-learning approaches in drug discovery: methods and applications (Drug Discov Today 2015) [Paper]
- The rise of deep learning in drug discovery (Drug Discov Today 2018) [Paper]
- Applications of machine learning in drug discovery and development (Nat Rev Drug Discov 2019) [Paper]
- Deep Learning in Chemistry (J Chem Inf Model 2019) [Paper]
- Deep learning for molecular design—a review of the state of the art (Mol Syst Des Eng 2019) [Paper]
- Efficient molecular encoders for virtual screening (Drug Discov Today Technol 2019) [Paper]
- Artificial intelligence in chemistry and drug design (J Comput Aided Mol Des 2020) [Paper]
- Graph convolutional networks for computational drug development and discovery (Brief Bioinform 2020) [Paper]
- Transfer Learning for Drug Discovery (J Med Chem 2020) [Paper]
- Learning Molecular Representations for Medicinal Chemistry (J Med Chem 2020) [Paper]
- Exploring chemical space using natural language processing methodologies for drug discovery (Drug Discov Today 2020) [Paper]
- Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper]
- A compact review of molecular property prediction with graph neural networks (Drug Discov Today 2020) [Paper]
- Artificial intelligence in drug discovery: Recent advances and future perspectives (Expert Opin Drug Discov 2021) [Paper]
- Artificial intelligence in drug discovery and development (Drug Discov Today 2021) [Paper]
- Graph neural networks for automated de novo drug design (Drug Discov Today 2021) [Paper]
- De novo molecular design and generative models (Drug Discov Today 2021) [Paper]
- Artificial Intelligence for Drug Discovery (KDD 2021) [Paper] [Website] [TorchDrug]
- Generative Deep Learning for Targeted Compound Design (J Chem Inf Model 2021) [Paper]
- Explainable Machine Learning for Property Predictions in Compound Optimization (J Med Chem 2021) [Paper]
- A decade of machine learning-based predictive models for human pharmacokinetics: Advances and challenges (Drug Discov Today 2021) [Paper]
- Defining Levels of Automated Chemical Design (J Med Chem 2022) [Paper]
- Evaluation guidelines for machine learning tools in the chemical sciences (Nat Rev Chem 2022) [Paper]
- Combining DELs and machine learning for toxicology prediction (Drug Discov Today 2022) [Paper]
Side Notes: Successful Applications
- Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Insilico Medicine)
- A Deep Learning Approach to Antibiotic Discovery (Cell 2020) [Paper] [Code] (MIT CSAIL)
- "BenevolentAI Announces First Patient Dosed In Its Atopic Dermatitis Clinical Trial" [Link] (BenevolentAI)
- "Exscientia Announces First AI-Designed Immuno-Oncology Drug to Enter Clinical Trials" [Link] (Exscientia)
- "Breaking Big Pharma's AI barrier: Insilico Medicine uncovers novel target, new drug for pulmonary fibrosis in 18 months" [Link] (Insilico Medicine)
- Rethinking drug design in the artificial intelligence era (Nat Rev Drug Discov 2020) [Paper]
- Towards reproducible computational drug discovery (J Cheminf 2020) [Paper]
- Current Trends, Overlooked Issues, and Unmet Challenges in Virtual Screening (J Chem Inf Model 2020) [Paper]
- Drug discovery with explainable artificial intelligence (Nat Mach Intell 2020) [Paper]
- Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet (Drug Discov Today 2021) [Paper]
- Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data used for AI in drug discovery (Drug Discov Today 2021) [Paper]
- Critical assessment of AI in drug discovery (Expert Opin Drug Discov 2021) [Paper]
- An Insight into Artificial Intelligence in Drug Discovery: An Interview with Professor Gisbert Schneider (Expert Opin Drug Discov 2021) [Paper]
2.1 Large-Scale Databases
- PubChem --- PubChem in 2021: new data content and improved web interfaces (Nucleic Acids Res 2021) [Paper] [Website] [Download]
- ChEMBL --- The ChEMBL database in 2017 (Nucleic Acids Res 2017) [Paper] [Website] [Download] [WebAPI]
- Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases (Brief Bioinform 2019) [Paper]
- DrugBank --- DrugBank 5.0: a major update to the DrugBank database for 2018 (Nucleic Acids Res 2018) [Paper] [Website] [Download]
- KEGG --- KEGG as a reference resource for gene and protein annotation (Nucleic Acids Res 2016) [Paper] [Website] [Download]
- PDBbind --- PDB-wide collection of binding data: current status of the PDBbind database (Bioinformatics 2015) [Paper] [Website] [Download]
- BindingDB --- BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology (Nucleic Acids Res 2016) [Paper] [Website] [Download]
- DUD --- Benchmarking Sets for Molecular Docking (J Med Chem 2006) [Paper] [Website]
- DUD-E --- Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking (J Med Chem 2012) [Paper] [Website]
- MUV --- Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data (J Chem Inf Model 2009) [Paper] [Website]
- STITCH --- STITCH: interaction networks of chemicals and proteins (Nucleic Acids Res 2008) [Paper] [Website]
- GLL&GDD --- Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors (J Chem Inf Model 2012) [Paper] [Website]
- NRLiSt BDB --- NRLiSt BDB, the Manually Curated Nuclear Receptors Ligands and Structures Benchmarking Database (J Med Chem 2014) [Paper] [Website]
- SIDER --- The SIDER database of drugs and side effects (Nucleic Acids Res 2016) [Paper] [Website]
- Offsides&Twosides --- Data-driven prediction of drug effects and interactions (Sci Transl Med 2012) [Paper] [Website]
- DILIrank --- DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans (Drug Discov Today 2016) [Paper] [Website]
- UniProt --- UniProt: the universal protein knowledgebase in 2021 (Nucleic Acids Res 2021) [Paper] [Website]
- PDB --- The Protein Data Bank (Nucleic Acids Res 2000) [Paper] [Website]
2.2 Small Molecule Representations
- Molecular representations in AI‑driven drug discovery: a review and practical guide (J Cheminf 2020) [Paper]
2.3 Benchmark Platforms
- MoleculeNet --- MoleculeNet: a benchmark for molecular machine learning (Chem Sci 2018) [Paper] [Code] [Download]
- Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations (Nat Mach Intell 2021) [Paper] [Code]
- Analyzing Learned Molecular Representations for Property Prediction (J Chem Inf Model 2019) [Paper] [Code] [Website]
- Molecular De Novo design using Recurrent Neural Networks and Reinforcement Learning (J Cheminf 2017) [Paper] [Code]
- REINVENT 2.0 – an AI Tool for De Novo Drug Design (J Chem Inf Model 2020) [Paper] [Code]
- Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
- Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper] [Code]
- Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models (Front Pharmacol 2020) [Paper] [Code]
3.1 Convolutional Neural Networks
Task*: Molecular Property Prediction; Representation*: Images
- Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper] (Techs - CNN + SVM)
- Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models (aka: Chemception; arXiv 2017) [Paper]
- Toxic Colors: The Use of Deep Learning for Predicting Toxicity of Compounds Merely from Their Graphic Images (aka: Toxic Colors; J Chem Inf Model 2018) [Paper]
- KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images (J Cheminf 2019) [Paper] [Code] (Techs - CNN)
- Learning Drug Functions from Chemical Structures with Convolutional Neural Networks and Random Forests (J Chem Inf Model 2019) [Paper] (Techs - CNN)
- DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representation (Chem Sci 2020) [Paper] [Code]
Task*: Molecular Property Prediction; Representation*: Fingerprints
- Massively Multitask Networks for Drug Discovery (arXiv 2015) [Paper]
- Convolutional Networks on Graphs for Learning Molecular Fingerprints (NeurIPS 2015) [Paper] [Code]
Side Note: Molecular Structure Extraction and Recognition
- Molecular Structure Extraction from Documents Using Deep Learning (J Chem Inf Model 2019) [Paper]
- DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature (J Cheminf 2021) [Paper]
- DECIMER: towards deep learning for chemical image recognition (J Cheminf 2020) [Paper] [Code]
- DECIMER 1.0: Deep Learning for Chemical Image Recognition using Transformers (chemRxiv 2021) [Paper]
- Img2Mol - Accurate SMILES Recognition from Molecular Graphical Depictions (Chem Sci 2021) [Paper] [Code]
3.2 Recurrent Neural Networks
Task*: Molecular Property Prediction; Representation*: SMILES Strings
- SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties (aka: SMILES2Vec; arXiv 2017) [Paper]
- Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)
Task*: Molecule Generation; Representation*: SMILES Strings
- Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
- Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks (aka: CharRNN; ACS Cent Sci 2018) [Paper] (Techs - Transfer Learning)
- Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
- Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
- Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
- Scaffold-Constrained Molecular Generation (J Chem Inf Model 2020) [Paper] (Techs - RL: Policy-based Hill Climbing)
Task*: Molecule Generation; Representation*: Molecular Graphs
- GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models (aka: GraphRNN; ICML 2018) [Paper] [Code]
- Learning Deep Generative Models of Graphs (ICML 2018) [Paper] [Code] (Techs - RNN: LSTM)
- MolecularRNN: Generating realistic molecular graphs with optimized properties (arXiv 2019) [Paper]
- A Deep-Learning View of Chemical Space Designed to Facilitate Drug Discovery (aka: DESMILES; J Chem Inf Model 2020) [Paper]
3.3 Graph Neural Networks
Task*: Molecular Property Prediction; Representation*: Molecular Graphs
- Molecular Graph Convolutions: Moving Beyond Fingerprints (aka: Weave; J Comput Aided Mol Des 2016) [Paper]
- Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper]
- Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
- Neural Message Passing for Quantum Chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
- SchNet: A continuous-filter convolutional neural network for modeling quantum interactions (aka: SchNet; NeurIPS 2017) [Paper] [Code]
- Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
- Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)
- PotentialNet for Molecular Property Prediction (aka: PotentialNet; ACS Cent Sci 2018) [Paper] (Techs - GNN: GCNN + Multi-Task Learning)
- Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective (aka: MGCN; AAAI 2019) [Paper]
- Deep Learning-Based Prediction of Drug-Induced Cardiotoxicity (J Chem Inf Model 2019) [Paper] [Code] (Techs - GCN + Multi-task Learning)
- DeepChemStable: Chemical Stability Prediction with an Attention-Based Graph Convolution Network (J Chem Inf Model 2019) [Paper] (Techs - GCN + Attention)
- Analyzing Learned Molecular Representations for Property Prediction (aka: Chemprop, D-MPNN; J Chem Inf Model 2019) [Paper] [Code]
- Molecule Property Prediction Based on Spatial Graph Embedding (aka: C-SGEN; J Chem Inf Model 2019) [Paper] [Code]
- Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism (aka: Attentive FP; J Med Chem 2019) [Paper] [Code]
- Graph convolutional neural networks as "general-purpose" property predictors: the universality and limits of applicability (J Chem Inf Model 2020) [Paper]
- N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules (aka: N-Gram Graph; NeurIPS 2019) [Paper]
- Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Multi-Task Learning)
- A self‑attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Self-Attention: Interpretability)
- Chemically Interpretable Graph Interaction Network for Prediction of Pharmacokinetic Properties of Drug-Like Molecules (aka: CIGIN; AAAI 2020) [Paper] [Code]
- Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
- Directional Message Passing for Molecular Graphs (aka: DimeNet; ICLR 2020) [Paper] [Code]
- Drug–target affinity prediction using graph neural network and contact maps (RSC Advances 2020) [Paper] (Techs - GCN + GAT)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
- Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - GGNN + Meta Learning: MAML, FO-MAML, ANIL)
Task*: Molecule Generation; Representation*: Molecular Graphs
- Multi‑objective de novo drug design with conditional graph generative model (J Cheminf 2018) [Paper] [Code] (Techs - Conditional Graph Generative Model: MolMP, MolRNN)
- Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: PPO)
- Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Q-learning)
- Improving Molecular Design by Stochastic Iterative Target Augmentation (ICML 2020) [Paper] [Code] (Techs - VSeq2Seq/HierGNN + Semi-Supervised Learning)
- DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (J Cheminf 2020) [Paper] (Techs - GCN + RL: PPO)
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: PPO)
- Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
- De novo drug design using reinforcement learning with graph-based deep generative models (aka: RL-GraphINVENT; ChemRxiv 2021) [Paper] [Code]
Side Note: Common GNN Models
- Recurrent GNNs --- Gated graph sequence neural networks (aka: GGNN; ICLR 2016) [Paper] [Code]
- Convolutional GNNs (Spectral-based) --- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (aka: ChebNet; NeurIPS 2016) [Paper] [Code]
- Convolutional GNNs (Spectral-based) --- Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
- Convolutional GNNs (Spatial-based) --- Neural message passing for quantum chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
- Convolutional GNNs (Spatial-based) --- Inductive Representation Learning on Large Graphs (aka: GraphSAGE; NeurIPS 2017) [Paper] [Code]
- Convolutional GNNs (Spatial-based) --- Graph Attention Networks (aka: GAT; ICLR 2018) [Paper] [Code]
- Convolutional GNNs (Spatial-based) --- How powerful are graph neural networks? (aka: GIN; ICLR 2019) [Paper] [Code]
3.4 Variational Autoencoders
Task*: Molecule Generation; Representation*: SMILES Strings
- Automatic chemical design using a data-driven continuous representation of molecules (arXiv 2016; ACS Cent Sci 2018) [Paper] [Code] (Techs - VAE)
- Grammar Variational Autoencoder (aka: GrammarVAE; ICML 2017) [Paper]
- Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
- Syntax-Directed Variational Autoencoder for Structured Data (aka: SD-VAE; ICLR 2018) [Paper] [Code]
- Conditional Molecular Design with Deep Generative Models (aka: Continuous SSVAE; J Chem Inf Model 2018) [Paper] [Code]
- Molecular generative model based on conditional variational autoencoder for de novo molecular design (aka: CVAE; J Cheminf 2018) [Paper] [Code] (Techs - VAE)
- Constrained Graph Variational Autoencoders for Molecule Design (aka: CGVAE; NeurIPS 2018) [Paper] [Code]
- NEVAE: A Deep Generative Model for Molecular Graphs (aka: NeVAE; AAAI 2019) [Paper] [Code]
- De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping (aka: GTMVAE; J Chem Inf Model 2019) [Paper] (Techs - Autoencoder + RNN)
- Re-balancing Variational Autoencoder Loss for Molecule Sequence Generation (aka: re-balanced VAE; ACM BCB 2020) [Paper] [Code] (Techs - RNN: BiGRU + VAE)
- CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models (aka: CogMol; NeurIPS 2020) [Paper] [Code]
VAE Variant: AAE
- Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
- druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico (aka: druGAN; Mol Pharm 2017) [Paper]
- Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery (aka: SAAE; Mol Pharm 2018) [Paper]
Task*: Molecule Generation; Representation*: Molecular Graphs
- GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders (aka: GraphVAE; arXiv 2018) [Paper]
- Junction Tree Variational Autoencoder for Molecular Graph Generation (aka: JT-VAE; ICML 2018) [Paper] [Code]
- Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders (aka: Regularized VAE; NeurIPS 2018) [Paper]
- Molecular Hypergraph Grammar with Its Application to Molecular Optimization (aka: MHG-VAE; ICML 2019) [Paper] [Code]
- Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL)
- Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: REINFORCE)
- Scaffold-based molecular design using graph generative model (aka: ScaffoldVAE; arXiv 2019) [Paper]
- Learning Multimodal Graph-to-Graph Translation for Molecule Optimization (aka: VJTNN; ICLR 2019) [Paper] [Code]
- CORE: Automatic Molecule Optimization Using Copy & Refine Strategy (AAAI 2020) [Paper] [Code]
- Hierarchical Generation of Molecular Graphs using Structural Motifs (aka: HierVAE; ICML 2020) [Paper] [Code] (Techs - Hierarchical VAE)
- Compressed graph representation for scalable molecular graph generation (J Cheminf 2020) [Paper] [Code] (Techs - Non-autoregressive VAE)
Side Note: Reaction & Retrosynthesis Prediction; Representation*: Molecular Graphs
- Generating Molecules via Chemical Reactions (ICLR 2019 Workshop) [Paper]
- Barking up the right tree: an approach to search over molecule synthesis DAG (NeurIPS 2020) [Paper] [Code]
3.5 Generative Adversarial Networks
Task*: Molecule Generation; Representation*: SMILES Strings
- Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: REINFORCE)
- Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: REINFORCE)
- Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)
Task*: Molecule Generation; Representation*: Molecular Graphs
- MolGAN: An implicit generative model for small molecular graphs (aka: MolGAN; ICML 2018 Workshop) [Paper] [Code-Tensorflow] [Code-PyTorch] (Techs - GAN + RL: DDPG)
3.6 Normalizing Flow Models
Task*: Molecule Generation; Representation*: Molecular Graphs
- GraphNVP: An Invertible Flow Model for Generating Molecular Graphs (aka: GraphNVP; arXiv 2019) [Paper] [Code]
- Graph Residual Flow for Molecular Graph Generation (aka: GRF; arXiv 2019) [Paper]
- GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: PPO)
- MoFlow: An Invertible Flow Model for Generating Molecular Graphs (aka: MoFlow; KDD 2020) [Paper] [Code]
- GraphDF: A Discrete Flow Model for Molecular Graph Generation (aka: GraphDF; ICML 2021) [Paper]
- Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation (NeurIPS 2021) [Paper]
3.7 Transformers
Task*: Molecular Property Prediction; Representation*: SMILES Strings
- SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (aka: SMILES-BERT; ACM BCB 2019) [Paper] (Techs - BERT)
- SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery (aka: SMILES Transformer; arXiv 2019) [Paper] [Code]
- ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction (aka: ChemBERTa; arXiv 2020) [Paper] [Code]
- Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
- Algebraic graph-assisted bidirectional transformers for molecular property prediction (aka: AGBT; Nat Commun 2021) [Paper] [Code]
- ChemBERTa-2: Towards Chemical Foundation Models (aka: ChemBERTa-2; arXiv 2022) [Paper]
Task*: Molecular Property Prediction; Representation*: Molecular Graphs
- Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)
Task*: Molecule Generation; Representation*: SMILES Strings
- Transformer-Based Generative Model Accelerating the Development of Novel BRAF Inhibitors (ACS Omega 2021) [Paper]
- MolGPT: Molecular Generation Using a Transformer-Decoder Model (aka: MolGPT; J Chem Inf Model 2022) [Paper] [Code]
Task*: Molecule Generation; Representation*: Molecular Graphs
- A Model to Search for Synthesizable Molecules (aka: Molecule Chef; NeurIPS 2019) [Paper] [Code]
- Transformer neural network for protein-specific de novo drug generation as a machine translation problem (Sci Rep 2021) [Paper]
4.1 Self-Supervised Learning in Molecular Property Prediction
Generative Learning
- Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
- Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)
Contrastive Learning
- MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (arXiv 2021) [Paper] [Code]
- Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast (J Chem Inf Model 2022) [Paper] [Code]
4.2 Reinforcement Learning in Molecule Generation
- Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: Policy-gradient REINFORCE)
- Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
- Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: Policy-gradient REINFORCE)
- Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)
- Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
- MolGAN: An implicit generative model for small molecular graphs (aka: MolGAN; ICML 2018 Workshop) [Paper] [Code-Tensorflow] [Code-PyTorch] (Techs - GAN + RL: Hybrid Actor-Critic DDPG)
- Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
- Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: Policy-gradient PPO)
- Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: Policy-gradient REINFORCE)
- Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (aka: DeepFMPO; J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
- Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Value-based Double Q-learning)
- Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL: Policy-gradient)
- GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: Policy-gradient PPO)
- Reinforcement Learning for Molecular Design Guided by Quantum Mechanics (aka: MolGym; ICML 2020) [Paper] (Techs - RL: Policy-gradient PPO)
- DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (aka: DeepGraphMolGen; J Cheminf 2020) [Paper] (Techs - GCN + RL: Policy-gradient PPO)
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: Policy-gradient PPO)
- Deep inverse reinforcement learning for structural evolution of small molecules (Brief Bioinform 2021) [Paper] [Code] (Techs - Inverse RL)
Side Note: Common RL Algorithms
- Value-based --- Playing Atari with Deep Reinforcement Learning (aka: DQN; NeurIPS Workshop 2013) [Paper]
- Value-based --- Human-level control through deep reinforcement learning (aka: DQN; Nature 2015) [Paper]
- Value-based --- Deep Reinforcement Learning with Double Q-learning (aka: Double Q-learning; AAAI 2016) [Paper]
- Value-based --- Prioritized Experience Replay (aka: DQN with Experience Replay; ICLR 2016) [Paper]
- Value-based --- Dueling Network Architectures for Deep Reinforcement Learning (aka: Dueling Network; ICML 2016) [Paper]
- Policy-gradient --- Simple statistical gradient-following algorithms for connectionist reinforcement learning (aka: REINFORCE; Mach Learn 1992) [Paper]
- Policy-gradient --- Policy Gradient Methods for Reinforcement Learning with Function Approximation (aka: Random Policy Gradient; NeurIPS 1999) [Paper]
- Policy-gradient --- Deterministic Policy Gradient Algorithms (aka: DPG; ICML 2014) [Paper]
- Policy-gradient --- Trust Region Policy Optimization (aka: TRPO; ICML 2015) [Paper]
- Policy-gradient --- Proximal Policy Optimization Algorithms (aka: PPO; arXiv 2017) [Paper]
- Hybrid --- Continuous control with deep reinforcement learning (aka: DDPG; ICLR 2016) [Paper]
- Hybrid --- Asynchronous Methods for Deep Reinforcement Learning (aka: A3C; ICML 2016) [Paper]
Side Note: Pareto Optimality
- De Novo Drug Design of Targeted Chemical Libraries Based on Artificial Intelligence and Pair-Based Multiobjective Optimization (J Chem Inf Model 2020) [Paper] [Code] (Techs - Pareto Optimality)
- Multiobjective de novo drug design with recurrent neural networks and nondominated sorting (J Cheminf 2020) [Paper] (Techs - Pareto Optimality)
- DrugEx v2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology (ChemRxiv) [Paper] (Techs - Pareto Optimality)
Side Note: Reaction & Retrosynthesis Optimization
- Optimizing chemical reactions with deep reinforcement learning (ACS Cent Sci 2017) [Paper]
4.4 Other Learning Paradigms
Metric Learning
- Machine-guided representation for accurate graph-based molecular machine learning (Phys Chem Chem Phys 2020) [Paper]
- Embedding of Molecular Structure Using Molecular Hypergraph Variational Autoencoder with Metric Learning (Mol Inform 2020) [Paper]
Few-Shot Learning
- Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
- Few-Shot Graph Learning for Molecular Property Prediction (WWW 2021) [Paper] [Code]
Meta Learning
- Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - Gated GNN + Meta Learning: MAML, FO-MAML, ANIL)
Active Learning
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
- Evidential Deep Learning for Guided Molecular Property Prediction and Discovery (NeurIPS 2020 Workshop) [Talk]
- Batched Bayesian Optimization for Drug Design in Noisy Environments (J Chem Inf Model 2022) [Paper] [Code]
Model Interpretation
- Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome−Inhibitor Interaction Landscapes (J Chem Inf Model 2018) [Paper]
- Using attribution to decode binding mechanism in neural network models for chemistry (PNAS 2019) [Paper]
- Interpretation of QSAR Models by Coloring Atoms According to Changes in Predicted Activity: How Robust Is It? (J Chem Inf Model 2019) [Paper]
- Building of Robust and Interpretable QSAR Classification Models by Means of the Rivality Index (J Chem Inf Model 2019) [Paper]
Dataset Concerns
- In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening (J Chem Inf Model 2019) [Paper]
- Deep Learning-Based Imbalanced Data Classification for Drug Discovery (J Chem Inf Model 2020) [Paper] [Code]
Uncertainty Estimation
- General Approach to Estimate Error Bars for Quantitative Structure−Activity Relationship Predictions of Molecular Activity (J Chem Inf Model 2018) [Paper]
- Assessment and Reproducibility of Quantitative Structure−Activity Relationship Models by the Nonexpert (J Chem Inf Model 2018) [Paper]
- Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks (J Chem Inf Model 2018) [Paper]
- Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout (J Chem Inf Model 2019) [Paper]
- Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks (Nat Mach Intell 2020) [Paper]
- Assigning Confidence to Molecular Property Prediction (arXiv 2021) [Paper]
- Gi and Pal Scores: Deep Neural Network Generalization Statistics (ICLR 2021 Workshop) [Paper]
- Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty (J Cheminf 2021) [Paper]
Representation Capacity
- Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure (J Chem Inf Model 2019) [Paper]
- Optimal Transport Graph Neural Networks (arXiv 2020) [Paper]
Out-of-Distribution Generalization
- Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure−Activity Relationship Models Based on Deep Neural Networks? (J Chem Inf Model 2018) [Paper]
- Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization (J Chem Inf Model 2018) [Paper]
- Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds (J Chem Inf Model 2018) [Paper]
Threshold Adjustment
- GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning (J Chem Inf Model 2021) [Paper] [Code]
Model Comparison
- Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction (J Comput Aided Mol Des 2020) [Paper]
- Comparing classification models-a practical tutorial (J Comput Aided Mol Des 2021) [Paper]
Model Adoption
- A Turing Test for Molecular Generators (J Med Chem 2020) [Paper]
Molecular Docking
- Docking and scoring in virtual screening for drug discovery: methods and applications (Nat Rev Drug Discov 2004) [Paper]
- Benchmarking sets for molecular docking (J Med Chem 2006) [Paper]
- Molecular Docking: A powerful approach for structure-based drug discovery (Curr Comput Aided Drug Des 2011) [Paper]
- Software for Molecular Docking: a review (Biophys Rev 2017) [Paper]
- Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery (ACS Cent Sci 2020) [Paper]
- A Deep-Learning Approach toward Rational Molecular Docking Protocol Selection (Molecules 2020) [Paper]
- GNINA 1.0: molecular docking with deep learning (J Cheminf 2021) [Paper]
Molecular Fragmentation & Assembly