👏 A Survey of Artificial Intelligence in Drug Discovery

💡 Artificial intelligence has been widely applied in drug discovery over the past decade and is still gaining popularity. This repository compiles a collection of works in related areas, based on the manuscript Artificial Intelligence in Drug Discovery: Applications and Techniques by Jianyuan Deng et al. The preprint version is available on ResearchGate. We hope you find it useful for your research (the citation is provided below).

🔔 This repository is updated regularly.

@article{deng2022artificial,
  title={Artificial intelligence in drug discovery: applications and techniques},
  author={Deng, Jianyuan and Yang, Zhibo and Ojima, Iwao and Samaras, Dimitris and Wang, Fusheng},
  journal={Briefings in Bioinformatics},
  volume={23},
  number={1},
  pages={bbab430},
  year={2022},
  publisher={Oxford University Press}
}

Reviews and Perspectives
Data, Representation and Benchmarks
- Large-Scale Databases
  - PubChem
  - ChEMBL
  - ZINC
  - Others
- Molecular Representations
- Benchmark Platforms
  - MoleculeNet
  - MolMapNet
  - ChemProp
  - REINVENT
  - GuacaMol
  - MOSES
  - GraphINVENT
  - ATOM3D
Model Architectures
Learning Paradigms
Addressing Existing Challenges

1. Reviews and Perspectives

1.1 General Drug Discovery

Integration of virtual and high-throughput screening (Nat Rev Drug Discov 2002) [Paper]
Chemical space and biology (Nature 2004) [Paper]
Computer-based de novo design of drug-like molecules (Nat Rev Drug Discov 2005) [Paper]
On Outliers and Activity Cliffs-Why QSAR Often Disappoints (J Chem Inf Model 2006) [Paper]
Evaluating Virtual Screening Methods: Good and Bad Metrics for the “Early Recognition” Problem (J Chem Inf Model 2007) [Paper]
Virtual screening: an endless staircase? (Nat Rev Drug Discov 2010) [Paper]
Privileged Scaffolds for Library Design and Drug Discovery (Curr Opin Chem Biol 2010) [Paper]
Principles of early drug discovery (Br J Pharmacol 2011) [Paper]
Recognizing Pitfalls in Virtual Screening: A Critical Review (J Chem Inf Model 2012) [Paper]
Multi-objective optimization methods in drug design (Drug Discov Today 2013) [Paper]
Finding the rules for successful drug optimisation (Drug Discov Today 2014) [Paper]
Recent Progress in Understanding Activity Cliffs and Their Utility in Medicinal Chemistry (J Med Chem 2014) [Paper]
Automating Drug Discovery (Nat Rev Drug Discov 2017) [Paper]
Interpretation of Quantitative Structure−Activity Relationship Models: Past, Present, and Future (J Chem Inf Model 2017) [Paper]
Advances and Challenges in Computational Target Prediction (J Chem Inf Model 2019) [Paper]
Duality of activity cliffs in drug discovery (Expert Opin Drug Discov 2019) [Paper]
QSAR without borders (Chem Soc Rev 2020) [Paper]
Designing small molecules for therapeutic success: A contemporary perspective (Drug Discov Today 2021) [Paper]
Phenotypic drug discovery: recent successes, lessons learned and new directions (Nat Rev Drug Discov 2022) [Paper]
Is the reductionist paradox an Achilles Heel of drug discovery? (J Comput Aided Mol Des 2022) [Paper]

1.2 Drug Discovery in the AI Era

Machine-learning approaches in drug discovery: methods and applications (Drug Discov Today 2015) [Paper]
The rise of deep learning in drug discovery (Drug Discov Today 2018) [Paper]
Applications of machine learning in drug discovery and development (Nat Rev Drug Discov 2019) [Paper]
Deep Learning in Chemistry (J Chem Inf Model 2019) [Paper]
Deep learning for molecular design—a review of the state of the art (Mol Syst Des Eng 2019) [Paper]
Efficient molecular encoders for virtual screening (Drug Discov Today Technol 2019) [Paper]
Artificial intelligence in chemistry and drug design (J Comput Aided Mol Des 2020) [Paper]
Graph convolutional networks for computational drug development and discovery (Brief Bioinformatics 2020) [Paper]
Transfer Learning for Drug Discovery (J Med Chem 2020) [Paper]
Learning Molecular Representations for Medicinal Chemistry (J Med Chem 2020) [Paper]
Exploring chemical space using natural language processing methodologies for drug discovery (Drug Discov Today 2020) [Paper]
Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper]
A compact review of molecular property prediction with graph neural networks (Drug Discov Today 2020) [Paper]
Artificial intelligence in drug discovery: Recent advances and future perspectives (Expert Opin Drug Discov 2021) [Paper]
Artificial intelligence in drug discovery and development (Drug Discov Today 2021) [Paper]
Graph neural networks for automated de novo drug design (Drug Discov Today 2021) [Paper]
De novo molecular design and generative models (Drug Discov Today 2021) [Paper]
Artificial Intelligence for Drug Discovery (KDD 2021) [Paper] [Website] [TorchDrug]
Generative Deep Learning for Targeted Compound Design (J Chem Inf Model 2021) [Paper]
Explainable Machine Learning for Property Predictions in Compound Optimization (J Med Chem 2021) [Paper]
A decade of machine learning-based predictive models for human pharmacokinetics: Advances and challenges (Drug Discov Today 2021) [Paper]
Defining Levels of Automated Chemical Design (J Med Chem 2022) [Paper]
Evaluation guidelines for machine learning tools in the chemical sciences (Nat Rev Chem 2022) [Paper]
Combining DELs and machine learning for toxicology prediction (Drug Discov Today 2022) [Paper]

Side Notes: Successful Applications

Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Insilico Medicine)
A Deep Learning Approach to Antibiotic Discovery (Cell 2020) [Paper] [Code] (MIT CSAIL)
"BenevolentAI Announces First Patient Dosed In Its Atopic Dermatitis Clinical Trial" [Link] (BenevolentAI)
"Exscientia Announces First AI-Designed Immuno-Oncology Drug to Enter Clinical Trials" [Link] (Exscientia)
"Breaking Big Pharma's AI barrier: Insilico Medicine uncovers novel target, new drug for pulmonary fibrosis in 18 months" [Link] (Insilico Medicine)

1.3 AI-Driven Drug Discovery: Hope or Hype

Rethinking drug design in the artificial intelligence era (Nat Rev Drug Discov 2020) [Paper]
Towards reproducible computational drug discovery (J Cheminf 2020) [Paper]
Current Trends, Overlooked Issues, and Unmet Challenges in Virtual Screening (J Chem Inf Model 2020) [Paper]
Drug discovery with explainable artificial intelligence (Nat Mach Intell 2020) [Paper]
Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet (Drug Discov Today 2021) [Paper]
Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data used for AI in drug discovery (Drug Discov Today 2021) [Paper]
Critical assessment of AI in drug discovery (Expert Opin Drug Discov 2021) [Paper]
An Insight into Artificial Intelligence in Drug Discovery: An Interview with Professor Gisbert Schneider (Expert Opin Drug Discov 2021) [Paper]

2. Data, Representation & Benchmarks

2.1 Large-Scale Databases

PubChem

PubChem in 2021: new data content and improved web interfaces (Nucleic Acids Res 2021) [Paper] [Website] [Download]

ChEMBL

The ChEMBL database in 2017 (Nucleic Acids Res 2017) [Paper] [Website] [Download] [WebAPI]

ZINC

ZINC 15 – Ligand Discovery for Everyone (J Chem Inf Model 2015) [Paper] [Website]

Others

Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases (Brief Bioinformatics 2019) [Paper]
DrugBank --- DrugBank 5.0: a major update to the DrugBank database for 2018 (Nucleic Acids Res 2018) [Paper] [Website] [Download]
KEGG --- KEGG as a reference resource for gene and protein annotation (Nucleic Acids Res 2016) [Paper] [Website] [Download]
PDBbind --- PDB-wide collection of binding data: current status of the PDBbind database (Bioinformatics 2015) [Paper] [Website] [Download]
BindingDB --- BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology (Nucleic Acids Res 2016) [Paper] [Website] [Download]
DUD --- Benchmarking Sets for Molecular Docking (J Med Chem 2006) [Paper] [Website]
DUD-E --- Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking (J Med Chem 2012) [Paper] [Website]
MUV --- Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data (J Chem Inf Model 2009) [Paper] [Website]
STITCH --- STITCH: interaction networks of chemicals and proteins (Nucleic Acids Res 2008) [Paper] [Website]
GLL&GDD --- Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors (J Chem Inf Model 2012) [Paper] [Website]
NRLiSt BDB --- NRLiSt BDB, the Manually Curated Nuclear Receptors Ligands and Structures Benchmarking Database (J Med Chem 2014) [Paper] [Website]
SIDER --- The SIDER database of drugs and side effects (Nucleic Acids Res 2016) [Paper] [Website]
Offsides&Twosides --- Data-driven prediction of drug effects and interactions (Sci Transl Med 2012) [Paper] [Website]
DILIrank --- DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans (Drug Discov Today 2016) [Paper] [Website]
UniProt --- UniProt: the universal protein knowledgebase in 2021 (Nucleic Acids Res 2021) [Paper] [Website]
PDB --- The Protein Data Bank (Nucleic Acids Res 2000) [Paper] [Website]

2.2 Small Molecule Representations

Molecular representations in AI‑driven drug discovery: a review and practical guide (J Cheminf 2020) [Paper]
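
As a rough illustration of the SMILES string representation that many entries below build on, here is a minimal sketch of turning SMILES into integer token sequences for a sequence model. It uses only the Python standard library; the regular expression is a simplified pattern of the kind commonly used for SMILES tokenization, and the helper names (`tokenize`, `build_vocab`, `encode`) are hypothetical, not taken from any listed paper or toolkit.

```python
import re

# Simplified SMILES token pattern (an assumption, not a full grammar):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# two-digit ring closures, bond/branch symbols, and single ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|[=#$:/\\\-+.()~@*]|\d)"
)


def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Fail loudly if any character was not covered by the pattern.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus):
    """Map each token seen in the corpus to an integer id (0 is padding)."""
    vocab = {"<pad>": 0}
    for smi in corpus:
        for tok in tokenize(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles, vocab):
    """Convert a SMILES string into a list of token ids."""
    return [vocab[tok] for tok in tokenize(smiles)]


corpus = ["CC(=O)Oc1ccccc1C(=O)O", "C[C@H](N)C(=O)O", "ClCBr"]
vocab = build_vocab(corpus)
ids = encode("ClCBr", vocab)
```

Tokenizing at the level of atoms and bonds (rather than single characters) keeps `Cl`, `Br`, and bracket atoms such as `[C@H]` intact. The pattern does not validate chemistry; a cheminformatics toolkit would be needed for that.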

2.3 Benchmark Platforms

MoleculeNet

MoleculeNet: a benchmark for molecular machine learning (Chem Sci 2018) [Paper] [Code] [Download]

MolMapNet

Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations (Nat Mach Intell 2021) [Paper] [Code]

ChemProp

Analyzing Learned Molecular Representations for Property Prediction (J Chem Inf Model 2019) [Paper] [Code] [Website]

REINVENT

Molecular de-novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] [Code]
REINVENT 2.0 – an AI Tool for De Novo Drug Design (J Chem Inf Model 2020) [Paper] [Code]

GraphINVENT

Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper] [Code]

GuacaMol

GuacaMol: Benchmarking Models for de Novo Molecular Design (J Chem Inf Model 2019) [Paper] [Code]

MOSES

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models (Front Pharmacol 2020) [Paper] [Code]

ATOM3D

ATOM3D: Tasks On Molecules in Three Dimensions (NeurIPS 2021) [Paper] [Code] [Website]

3. Model Architectures

3.1 Convolutional Neural Networks

Task*: Molecular Property Prediction; Representation*: Images

Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper] (Techs - CNN + SVM)
Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models (aka: Chemception; arXiv 2017) [Paper]
Toxic Colors: The Use of Deep Learning for Predicting Toxicity of Compounds Merely from Their Graphic Images (aka: Toxic Colors; J Chem Inf Model 2018) [Paper]
KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images (J Cheminf 2019) [Paper] [Code] (Techs - CNN)
Learning Drug Functions from Chemical Structures with Convolutional Neural Networks and Random Forests (J Chem Inf Model 2019) [Paper] (Techs - CNN)
DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representation (Chem Sci 2020) [Paper] [Code]

Task*: Molecular Property Prediction; Representation*: Fingerprints

Massively Multitask Networks for Drug Discovery (arXiv 2015) [Paper]
Convolutional Networks on Graphs for Learning Molecular Fingerprints (NeurIPS 2015) [Paper] [Code]

Side Note: Molecular Structure Extraction and Recognition

Molecular Structure Extraction from Documents Using Deep Learning (J Chem Inf Model 2019) [Paper]
DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature (J Cheminf 2021) [Paper]
DECIMER: towards deep learning for chemical image recognition (J Cheminf 2020) [Paper] [Code]
DECIMER 1.0: Deep Learning for Chemical Image Recognition using Transformers (ChemRxiv 2021) [Paper]
Img2Mol - Accurate SMILES Recognition from Molecular Graphical Depictions (Chem Sci 2021) [Paper] [Code]

3.2 Recurrent Neural Networks

Task*: Molecular Property Prediction; Representation*: SMILES Strings

SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties (aka: SMILES2Vec; arXiv 2017) [Paper]
Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)

Task*: Molecule Generation; Representation*: SMILES Strings

Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks (aka: CharRNN; ACS Cent Sci 2018) [Paper] (Techs - Transfer Learning)
Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
Scaffold-Constrained Molecular Generation (J Chem Inf Model 2020) [Paper] (Techs - RL: Policy-based Hill Climbing)

Task*: Molecule Generation; Representation*: Molecular Graphs

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models (aka: GraphRNN; ICML 2018) [Paper] [Code]
Learning Deep Generative Models of Graphs (ICML 2018) [Paper] [Code] (Techs - RNN: LSTM)
MolecularRNN: Generating realistic molecular graphs with optimized properties (arXiv 2019) [Paper]
A Deep-Learning View of Chemical Space Designed to Facilitate Drug Discovery (aka: DESMILES; J Chem Inf Model 2020) [Paper]

3.3 Graph Neural Networks

Task*: Molecular Property Prediction; Representation*: Molecular Graphs

Molecular Graph Convolutions: Moving Beyond Fingerprints (aka: Weave; J Comput Aided Mol Des 2016) [Paper]
Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper]
Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
Neural Message Passing for Quantum Chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions (aka: SchNet; NeurIPS 2017) [Paper] [Code]
Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)
PotentialNet for Molecular Property Prediction (aka: PotentialNet; ACS Cent Sci 2018) [Paper] (Techs - GNN: GCNN + Multi-Task Learning)
Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective (aka: MGCN; AAAI 2019) [Paper]
Deep Learning-Based Prediction of Drug-Induced Cardiotoxicity (J Chem Inf Model 2019) [Paper] [Code] (Techs - GCN + Multi-task Learning)
DeepChemStable: Chemical Stability Prediction with an Attention-Based Graph Convolution Network (J Chem Inf Model 2019) [Paper] (Techs - GCN + Attention)
Analyzing Learned Molecular Representations for Property Prediction (aka: Chemprop, D-MPNN; J Chem Inf Model 2019) [Paper] [Code]
Molecule Property Prediction Based on Spatial Graph Embedding (aka: C-SGEN; J Chem Inf Model 2019) [Paper] [Code]
Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism (aka: Attentive FP; J Med Chem 2019) [Paper] [Code]
Graph convolutional neural networks as "general-purpose" property predictors: the universality and limits of applicability (J Chem Inf Model 2020) [Paper]
N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules (aka: N-Gram Graph; NeurIPS 2019) [Paper]
Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Multi-Task Learning)
A self‑attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Self-Attention: Interpretability)
Chemically Interpretable Graph Interaction Network for Prediction of Pharmacokinetic Properties of Drug-Like Molecules (aka: CIGIN; AAAI 2020) [Paper] [Code]
Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
Directional Message Passing for Molecular Graphs (aka: DimeNet; ICLR 2020) [Paper] [Code]
Drug–target affinity prediction using graph neural network and contact maps (RSC Advances 2020) [Paper] (Techs - GCN + GAT)
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - GGNN + Meta Learning: MAML, FO-MAML, ANIL)

Task*: Molecule Generation; Representation*: Molecular Graphs

Multi‑objective de novo drug design with conditional graph generative model (J Cheminf 2018) [Paper] [Code] (Techs - Conditional Graph Generative Model: MolMP, MolRNN)
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: PPO)
Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Q-learning)
Improving Molecular Design by Stochastic Iterative Target Augmentation (ICML 2020) [Paper] [Code] (Techs - VSeq2Seq/HierGNN + Semi-Supervised Learning)
DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (J Cheminf 2020) [Paper] (Techs - GCN + RL: PPO)
Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: PPO)
Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
De novo drug design using reinforcement learning with graph-based deep generative models (aka: RL-GraphINVENT; ChemRxiv 2021) [Paper] [Code]

Side Note: Common GNN Models

Recurrent GNNs --- Gated graph sequence neural networks (aka: GGNN; ICLR 2016) [Paper] [Code]
Convolutional GNNs (Spectral-based) --- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (aka: ChebNet; NeurIPS 2016) [Paper] [Code]
Convolutional GNNs (Spectral-based) --- Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- Neural message passing for quantum chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- Inductive Representation Learning on Large Graphs (aka: GraphSAGE; NeurIPS 2017) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- Graph Attention Networks (aka: GAT; ICLR 2018) [Paper] [Code]
Convolutional GNNs (Spatial-based) --- How powerful are graph neural networks? (aka: GIN; ICLR 2019) [Paper] [Code]

3.4 Variational Autoencoders

Task*: Molecule Generation; Representation*: SMILES Strings

Automatic chemical design using a data-driven continuous representation of molecules (arXiv 2016; ACS Cent Sci 2018) [Paper] [Code] (Techs - VAE)
Grammar Variational Autoencoder (aka: GrammarVAE; ICML 2017) [Paper]
Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
Syntax-Directed Variational Autoencoder for Structured Data (aka: SD-VAE; ICLR 2018) [Paper] [Code]
Conditional Molecular Design with Deep Generative Models (aka: Continuous SSVAE; J Chem Inf Model 2018) [Paper] [Code]
Molecular generative model based on conditional variational autoencoder for de novo molecular design (aka: CVAE; J Cheminf 2018) [Paper] [Code] (Techs - VAE)
Constrained Graph Variational Autoencoders for Molecule Design (aka: CGVAE; NeurIPS 2018) [Paper] [Code]
NEVAE: A Deep Generative Model for Molecular Graphs (aka: NeVAE; AAAI 2019) [Paper] [Code]
De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping (aka: GTMVAE; J Chem Inf Model 2019) [Paper] (Techs - Autoencoder + RNN)
Re-balancing Variational Autoencoder Loss for Molecule Sequence Generation (aka: re-balanced VAE; ACM BCB 2020) [Paper] [Code] (Techs - RNN: BiGRU + VAE)
CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models (aka: CogMol; NeurIPS 2020) [Paper] [Code]

VAE Variant: AAE

Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico (aka: druGAN; Mol Pharm 2017) [Paper]
Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery (aka: SAAE; Mol Pharm 2018) [Paper]

Task*: Molecule Generation; Representation*: Molecular Graphs

GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders (aka: GraphVAE; arXiv 2018) [Paper]
Junction Tree Variational Autoencoder for Molecular Graph Generation (aka: JT-VAE; ICML 2018) [Paper] [Code]
Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders (aka: Regularized VAE; NeurIPS 2018) [Paper]
Molecular Hypergraph Grammar with Its Application to Molecular Optimization (aka: MHG-VAE; ICML 2019) [Paper] [Code]
Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL)
Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: REINFORCE)
Scaffold-based molecular design using graph generative model (aka: ScaffoldVAE; arXiv 2019) [Paper]
Learning Multimodal Graph-to-Graph Translation for Molecule Optimization (aka: VJTNN; ICLR 2019) [Paper] [Code]
CORE: Automatic Molecule Optimization Using Copy & Refine Strategy (AAAI 2020) [Paper] [Code]
Hierarchical Generation of Molecular Graphs using Structural Motifs (aka: HierVAE; ICML 2020) [Paper] [Code] (Techs - Hierarchical VAE)
Compressed graph representation for scalable molecular graph generation (J Cheminf 2020) [Paper] [Code] (Techs - Non-autoregressive VAE)

Side Note: Reaction & Retrosynthesis Prediction; Representation*: Molecular Graphs

Generating Molecules via Chemical Reactions (ICLR 2019 Workshop) [Paper]
Barking up the right tree: an approach to search over molecule synthesis DAG (NeurIPS 2020) [Paper] [Code]

3.5 Generative Adversarial Networks

Task*: Molecule Generation; Representation*: SMILES Strings

Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: REINFORCE)
Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: REINFORCE)
Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)

Task*: Molecule Generation; Representation*: Molecular Graphs

MolGAN: An implicit generative model for small molecular graphs (aka: MolGAN; ICML 2018 Workshop) [Paper] [Code-Tensorflow] [Code-PyTorch] (Techs - GAN + RL: DDPG)

3.6 Normalizing Flow Models

Task*: Molecule Generation; Representation*: Molecular Graphs

GraphNVP: An Invertible Flow Model for Generating Molecular Graphs (aka: GraphNVP; arXiv 2019) [Paper] [Code]
Graph Residual Flow for Molecular Graph Generation (aka: GRF; arXiv 2019) [Paper]
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: PPO)
MoFlow: An Invertible Flow Model for Generating Molecular Graphs (aka: MoFlow; KDD 2020) [Paper] [Code]
GraphDF: A Discrete Flow Model for Molecular Graph Generation (aka: GraphDF; ICML 2021) [Paper]
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation (NeurIPS 2021) [Paper]

3.7 Transformers

Task*: Molecular Property Prediction; Representation*: SMILES Strings

SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (aka: SMILES-BERT; ACM BCB 2019) [Paper] (Techs - BERT)
SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery (aka: SMILES Transformer; arXiv 2019) [Paper] [Code]
ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction (aka: ChemBERTa; arXiv 2020) [Paper] [Code]
Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
Algebraic graph-assisted bidirectional transformers for molecular property prediction (aka: AGBT; Nat Commun 2021) [Paper] [Code]
ChemBERTa-2: Towards Chemical Foundation Models (aka: ChemBERTa-2; arXiv 2022) [Paper]

Task*: Molecular Property Prediction; Representation*: Molecular Graphs

Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)

Task*: Molecule Generation; Representation*: SMILES Strings

Transformer-Based Generative Model Accelerating the Development of Novel BRAF Inhibitors (ACS Omega 2021) [Paper]
MolGPT: Molecular Generation Using a Transformer-Decoder Model (aka: MolGPT; J Chem Inf Model 2022) [Paper] [Code]

Task*: Molecule Generation; Representation*: Molecular Graphs

A Model to Search for Synthesizable Molecules (aka: Molecule Chef; NeurIPS 2019) [Paper] [Code]
Transformer neural network for protein-specific de novo drug generation as a machine translation problem (Sci Rep 2021) [Paper]

4. Learning Paradigms

4.1 Self-Supervised Learning in Molecular Property Prediction

Generative Learning

Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)

Contrastive Learning

MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (arXiv 2021) [Paper] [Code]
Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast (J Chem Inf Model 2022) [Paper] [Code]

4.2 Reinforcement Learning in Molecule Generation

Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: Policy-gradient REINFORCE)
Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: Policy-gradient REINFORCE)
Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)
Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
MolGAN: An implicit generative model for small molecular graphs (aka: MolGAN; ICML 2018 Workshop) [Paper] [Code-Tensorflow] [Code-PyTorch] (Techs - GAN + RL: Hybrid Actor-Critic DDPG)
Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: Policy-gradient PPO)
Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: Policy-gradient REINFORCE)
Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (aka: DeepFMPO; J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Value-based Double Q-learning)
Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL: Policy-gradient)
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: Policy-gradient PPO)
Reinforcement Learning for Molecular Design Guided by Quantum Mechanics (aka: MolGym; ICML 2020) [Paper] (Techs - RL: Policy-gradient PPO)
DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (aka: DeepGraphMolGen; J Cheminf 2020) [Paper] (Techs - GCN + RL: Policy-gradient PPO)
Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: Policy-gradient PPO)
Deep inverse reinforcement learning for structural evolution of small molecules (Brief Bioinform 2021) [Paper] [Code] (Techs - Inverse RL)

Side Note: Common RL Algorithms

Value-based --- Playing Atari with Deep Reinforcement Learning (aka: DQN; NeurIPS Workshop 2013) [Paper]
Value-based --- Human-level control through deep reinforcement learning (aka: DQN; Nature 2015) [Paper]
Value-based --- Deep Reinforcement Learning with Double Q-learning (aka: Double Q-learning; AAAI 2016) [Paper]
Value-based --- Prioritized Experience Replay (aka: DQN with Experience Replay; ICLR 2016) [Paper]
Value-based --- Dueling Network Architectures for Deep Reinforcement Learning (aka: Dueling Network; ICML 2016) [Paper]
Policy-gradient --- Simple statistical gradient-following algorithms for connectionist reinforcement learning (aka: REINFORCE; Mach Learn 1992) [Paper]
Policy-gradient --- Policy Gradient Methods for Reinforcement Learning with Function Approximation (aka: Random Policy Gradient; NeurIPS 1999) [Paper]
Policy-gradient --- Deterministic Policy Gradient Algorithms (aka: DPG; ICML 2014) [Paper]
Policy-gradient --- Trust Region Policy Optimization (aka: TRPO; ICML 2015) [Paper]
Policy-gradient --- Proximal Policy Optimization Algorithms (aka: PPO; arXiv 2017) [Paper]
Hybrid --- Continuous control with deep reinforcement learning (aka: DDPG; ICLR 2016) [Paper]
Hybrid --- Asynchronous Methods for Deep Reinforcement Learning (aka: A3C; ICML 2016) [Paper]

Side Note: Pareto Optimality

De Novo Drug Design of Targeted Chemical Libraries Based on Artificial Intelligence and Pair-Based Multiobjective Optimization (J Chem Inf Model 2020) [Paper] [Code] (Techs - Pareto Optimality)
Multiobjective de novo drug design with recurrent neural networks and nondominated sorting (J Cheminf 2020) [Paper] (Techs - Pareto Optimality)
DrugEx v2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology (ChemRxiv) [Paper] (Techs - Pareto Optimality)

Side Note: Reaction & Retrosynthesis Optimization

Optimizing chemical reactions with deep reinforcement learning (ACS Cent Sci 2017) [Paper]

4.4 Other Learning Paradigms

Metric Learning

Machine-guided representation for accurate graph-based molecular machine learning (Phys Chem Chem Phys 2020) [Paper]
Embedding of Molecular Structure Using Molecular Hypergraph Variational Autoencoder with Metric Learning (Mol Inform 2020) [Paper]

Few-Shot Learning

Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
Few-Shot Graph Learning for Molecular Property Prediction (WWW 2021) [Paper] [Code]

Meta Learning

Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - Gated GNN + Meta Learning: MAML, FO-MAML, ANIL)

Active Learning

ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
Evidential Deep Learning for Guided Molecular Property Prediction and Discovery (NeurIPS 2020 Workshop) [Talk]
Batched Bayesian Optimization for Drug Design in Noisy Environments (J Chem Inf Model 2022) [Paper] [Code]

5. Addressing Existing Challenges

Model Interpretation

Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome−Inhibitor Interaction Landscapes (J Chem Inf Model 2018) [Paper]
Using attribution to decode binding mechanism in neural network models for chemistry (PNAS 2019) [Paper]
Interpretation of QSAR Models by Coloring Atoms According to Changes in Predicted Activity: How Robust Is It? (J Chem Inf Model 2019) [Paper]
Building of Robust and Interpretable QSAR Classification Models by Means of the Rivality Index (J Chem Inf Model 2019) [Paper]

Dataset Concerns

In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening (J Chem Inf Model 2019) [Paper]
Deep Learning-Based Imbalanced Data Classification for Drug Discovery (J Chem Inf Model 2020) [Paper] [Code]

Uncertainty Estimation

General Approach to Estimate Error Bars for Quantitative Structure−Activity Relationship Predictions of Molecular Activity (J Chem Inf Model 2018) [Paper]
Assessment and Reproducibility of Quantitative Structure−Activity Relationship Models by the Nonexpert (J Chem Inf Model 2018) [Paper]
Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks (J Chem Inf Model 2018) [Paper]
Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout (J Chem Inf Model 2019) [Paper]
Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks (Nat Mach Intell 2020) [Paper]
Assigning Confidence to Molecular Property Prediction (arXiv 2021) [Paper]
Gi and Pal Scores: Deep Neural Network Generalization Statistics (ICLR 2021 Workshop) [Paper]
Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty (J Cheminf 2021) [Paper]

Representation Capacity

Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure (J Chem Inf Model 2019) [Paper]
Optimal Transport Graph Neural Networks (arXiv 2020) [Paper]

Out-of-Distribution Generalization

Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure−Activity Relationship Models Based on Deep Neural Networks? (J Chem Inf Model 2018) [Paper]
Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization (J Chem Inf Model 2018) [Paper]
Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds (J Chem Inf Model 2018) [Paper]

Threshold Adjustment

GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning (J Chem Inf Model 2021) [Paper] [Code]

Model Comparison

Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction (J Comput Aided Mol Des 2020) [Paper]
Comparing classification models-a practical tutorial (J Comput Aided Mol Des 2021) [Paper]

Model Adoption

A Turing Test for Molecular Generators (J Med Chem 2020) [Paper]

Molecular Docking

Docking and scoring in virtual screening for drug discovery: methods and applications (Nat Rev Drug Discov 2004) [Paper]
Benchmarking sets for molecular docking (J Med Chem 2006) [Paper]
Molecular Docking: A powerful approach for structure-based drug discovery (Curr Comput Aided Drug Des 2011) [Paper]
Software for Molecular Docking: a review (Biophys Rev 2017) [Paper]
Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery (ACS Cent Sci 2020) [Paper]
A Deep-Learning Approach toward Rational Molecular Docking Protocol Selection (Molecules 2020) [Paper]
GNINA 1.0: molecular docking with deep learning (J Cheminf 2021) [Paper]

Molecular Fragmentation & Assembly

Molecular generation by Fast Assembly of (Deep)SMILES fragments (J Cheminf 2021) [Paper] [Code]
