lmmentel/awesome-python-chemistry

Awesome Python Chemistry

License: CC BY 4.0

A curated list of awesome Python frameworks, libraries, software and resources related to Chemistry.

Inspired by awesome-python.

General Chemistry

Packages and tools for general chemistry.

  • AQME - Ensemble of automated QM workflows that can be run through jupyter notebooks, command lines and yaml files.
  • aizynthfinder - A tool for retrosynthetic planning.
  • batchcalculator - A GUI app based on wxPython for calculating the correct amount of reactants (batch) for a particular composition given by the molar ratio of its components.
  • cctbx - The Computational Crystallography Toolbox.
  • ChemFormula - ChemFormula provides a class for working with chemical formulas. It allows parsing chemical formulas, calculating formula weights, and generating formatted output strings (e.g. in HTML, LaTeX, or Unicode).
  • chemlib - A robust and easy-to-use package that solves a variety of chemistry problems.
  • chempy - ChemPy is a package useful for chemistry (mainly physical/inorganic/analytical chemistry).
  • datamol - Molecular Manipulation Made Easy. A light wrapper built on top of RDKit.
  • GoodVibes - A Python program to compute quasi-harmonic thermochemical data from Gaussian frequency calculations.
  • hgraph2graph - Hierarchical Generation of Molecular Graphs using Structural Motifs.
  • ionize - Calculates the properties of individual ionic species in aqueous solution, as well as aqueous solutions containing arbitrary sets of ions.
  • LModeA-nano - Calculates the intrinsic chemical bond strength based on local vibrational mode theory in solids and molecules.
  • mendeleev - A package that provides a python API for accessing various properties of elements from the periodic table of elements.
  • nmrglue - A package for working with nuclear magnetic resonance (NMR) data including functions for reading common binary file formats and processing NMR data.
  • molmass - Calculate mass, elemental composition, and mass distribution spectrum of a molecule given by its chemical formula, relative element weights, or sequence.
  • Open Babel - A chemical toolbox designed to speak the many languages of chemical data.
  • periodictable - This package provides a periodic table of the elements with support for mass, density and xray/neutron scattering information.
  • propka - Predicts the pKa values of ionizable groups in proteins and protein-ligand complexes based on the 3D structure.
  • pybaselines - A package for fitting baselines of spectra for baseline correction.
  • pybel - Pybel provides convenience functions and classes that make it simpler to use the Open Babel libraries from Python.
  • pycroscopy - Scientific analysis of nanoscale materials imaging data.
  • pyEQL - A set of tools for conventional calculations involving solutions (mixtures) and electrolytes.
  • pyiron - An integrated development environment (IDE) for computational materials science.
  • pymatgen - Python Materials Genomics is a robust, open-source library for materials analysis.
  • pymatviz - A toolkit for visualizations in materials informatics.
  • symfit - A curve-fitting library ideally suited to chemistry problems, including fitting experimental kinetics data.
  • symmetry - Symmetry is a library for materials symmetry analysis.
  • stk - A library for building, manipulating, analyzing and automatic design of molecules, including a genetic algorithm.
  • spectrochempy - A library for processing, analyzing and modeling spectroscopic data.

Machine Learning

Packages and tools for employing machine learning and data science in chemistry.

  • amp - Is an open-source package designed to easily bring machine-learning to atomistic calculations.
  • atom3d - Enables machine learning on three-dimensional molecular structure.
  • chainer-chemistry - A deep learning framework (based on Chainer) with applications in Biology and Chemistry.
  • chemml - A machine learning and informatics program suite for the analysis, mining, and modeling of chemical and materials data.
  • chemprop - Message Passing Neural Networks for Molecule Property Prediction.
  • cgcnn - Crystal graph convolutional neural networks for predicting material properties.
  • deepchem - Deep-learning models for Drug Discovery and Quantum Chemistry.
  • DeepPurpose - A Deep Learning Library for Compound and Protein Modeling: DTI, Drug Property, PPI, DDI, Protein Function Prediction.
  • DescriptaStorus - Descriptor computation (chemistry) and (optional) storage for machine learning.
  • DScribe - Descriptor library containing a variety of fingerprinting techniques, including the Smooth Overlap of Atomic Positions (SOAP).
  • graphein - Provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks.
  • Matminer - Library of descriptors to aid in the data-mining of materials properties, created by the Lawrence Berkeley National Laboratory.
  • MoleOOD - A robust molecular representation learning framework against distribution shifts.
  • megnet - Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals.
  • MAML - Aims to provide useful high-level interfaces that make ML for materials science as easy as possible.
  • MORFEUS - Library for fast calculations of molecular features from 3D structures for machine learning with a focus on steric descriptors.
  • olorenchemengine - Molecular property prediction with unified API for diverse models and representations, with integrated uncertainty quantification, interpretability, and hyperparameter/architecture tuning.
  • ROBERT - Ensemble of automated machine learning protocols that can be run sequentially through a single command line. The program works for regression and classification problems.
  • schnetpack - Deep Neural Networks for Atomistic Systems.
  • selfies - Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation.
  • Summit - Package for optimizing chemical reactions using machine learning (contains 10 algorithms + several benchmarks).
  • TDC - Therapeutics Data Commons (TDC) is the first unifying framework to systematically access and evaluate machine learning across the entire range of therapeutics.
  • XenonPy - Library with several compositional and structural material descriptors, along with a few pre-trained neural network models of material properties.

Generative Molecular Design

Packages and tools for generating molecular species.

  • GraphINVENT - A platform for graph-based molecular generation using graph neural networks.
  • GuacaMol - A package for benchmarking of models for de novo molecular design.
  • moses - A benchmarking platform for molecular generation models.
  • perses - Experiments with expanded ensembles to explore chemical space.

Simulations

Packages for atomistic simulations and computational chemistry.

  • alchemlyb - Makes alchemical free energy calculations easier by leveraging the full power and flexibility of the PyData stack.
  • atomate2 - atomate2 is a library of computational materials science workflows.
  • Atomic Simulation Environment (ASE) - Is a set of tools and modules for setting up, manipulating, running, visualizing and analyzing atomistic simulations.
  • basis_set_exchange - A library containing basis sets for use in quantum chemistry calculations. In addition, this library has functionality for manipulation of basis set data.
  • CACTVS - Cactvs is a universal, scriptable cheminformatics toolkit, with a large collection of modules for property computation, chemistry data file I/O and other tasks.
  • CalcUS - Quantum chemistry web platform that brings all the necessary tools to perform quantum chemistry in a user-friendly web interface.
  • cantera - A collection of object-oriented software tools for problems involving chemical kinetics, thermodynamics, and transport processes.
  • CatKit - General purpose tools for high-throughput catalysis.
  • ccinput - A tool and library for creating quantum chemistry input files.
  • cclib - A library for parsing output files of various quantum chemical programs.
  • cctk - A library for computational chemistry (DFT) for input file generation, data extraction, method screening and analysis.
  • cinfony - A common API to several cheminformatics toolkits (Open Babel, RDKit, the CDK, Indigo, JChem, OPSIN and cheminformatics webservices).
  • chemlab - Is a library that can help the user with chemistry-relevant calculations.
  • emmet - A package to 'build' collections of materials properties from the output of computational materials calculations.
  • fromage - The "FRamewOrk for Molecular AGgregate Excitations" enables localised QM/QM' excited state calculations in a solid state environment.
  • GPAW - Is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE).
  • horton - Helpful Open-source Research TOol for N-fermion systems, a quantum-chemistry program that can perform computations involving model Hamiltonians.
  • HTMD - High-Throughput Molecular Dynamics: Programming Environment for Molecular Discovery.
  • Indigo - Universal cheminformatics libraries, utilities and database search tools.
  • IoData - File parser/converter for QM, MD and plane-wave DFT programs.
  • Jarvis-tools - An open-access software package for atomistic data-driven materials design.
  • mathchem - Is a free open source package for calculating topological indices and other invariants of molecular graphs.
  • MDAnalysis - Is an object-oriented library to analyze trajectories from molecular dynamics (MD) simulations in many popular formats.
  • MDTraj - Package for manipulating molecular dynamics trajectories with support for multiple formats.
  • MMTK - The Molecular Modeling Toolkit is an Open Source program library for molecular simulation applications.
  • MolMod - A library with many components that are useful to write molecular modeling programs.
  • nmrsim - A library for simulating first- or second-order NMR spectra and dynamic NMR resonances.
  • oddt - Open Drug Discovery Toolkit, a modular and comprehensive toolkit for use in cheminformatics, molecular modeling etc.
  • OPEM - Open source PEM (Proton Exchange Membrane) fuel cell simulation tool.
  • openmmtools - A batteries-included toolkit for the GPU-accelerated OpenMM molecular simulation engine.
  • overreact - A library and command-line tool for building and analyzing complex homogeneous microkinetic models from quantum chemistry calculations, with support for quasi-harmonic thermochemistry, quantum tunnelling corrections, molecular symmetries and more.
  • ParmEd - Parameter/topology editor and molecular simulator with visualization capability.
  • pGrAdd - A library for estimating thermochemical properties of molecules and adsorbates using group additivity.
  • phonopy - An open source package for phonon calculations at harmonic and quasi-harmonic levels.
  • PLAMS - Python Library for Automating Molecular Simulation: input preparation, job execution, file management, output processing and building data workflows.
  • pMuTT - A library for ab-initio thermodynamic and kinetic parameter estimation.
  • PorePy - A Simulation Tool for Fractured and Deformable Porous Media.
  • ProDy - An open source package for protein structural dynamics analysis with a flexible and responsive API.
  • ProLIF - Interaction Fingerprints for protein-ligand complexes and more.
  • Psi4 - A hybrid Python/C++ open-source package for quantum chemistry.
  • Psi4NumPy - Psi4-based reference implementations and Jupyter notebook-based tutorials for foundational quantum chemistry methods.
  • pyEMMA - Library for the estimation, validation and analysis of Markov models of molecular kinetics and other kinetic and thermodynamic models from molecular dynamics data.
  • pygauss - An interactive tool for supporting the life cycle of a computational molecular chemistry investigation.
  • PyQuante - Is an open-source suite of programs for developing quantum chemistry methods.
  • pysic - A calculator incorporating various empirical pair and many-body potentials.
  • PySCF - A quantum chemistry package written in Python.
  • pyvib2 - A program for analyzing vibrational motion and vibrational spectra.
  • RDKit - Open-Source Cheminformatics Software.
  • ReNView - A program to visualize reaction networks.
  • stk - A library for building, manipulating, analyzing and automatic design of molecules.
  • QMsolve - A module for solving and visualizing the Schrödinger equation.
  • QUIP - A collection of software tools to carry out molecular dynamics simulations.
  • torchmd - End-To-End Molecular Dynamics (MD) Engine using PyTorch.
  • tsase - A library that builds on ASE to tackle transition state calculations.
  • yank - An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.

Force Fields

Packages related to force fields.

  • acpype - Convert AMBER forcefields from ANTECHAMBER to GROMACS format.
  • CHGNet - Pretrained universal neural network potential for charge-informed atomistic modeling.
  • FitSNAP - A Package For Training SNAP Interatomic Potentials for use in the LAMMPS molecular dynamics package.
  • fftool - Tool to build force field input files for molecular simulation.
  • FLARE - A package for creating fast and accurate interatomic potentials.
  • global-chem - A Chemical Knowledge Graph and Toolkit, written in IUPAC/SMILES/SMARTS, for common small molecules from diverse communities to aid users in selecting compounds for forcefield parameterization.
  • matbench-discovery - A benchmark for ML-guided high-throughput materials discovery.
  • NeuralForceField - Neural Network Force Field based on PyTorch.
  • openff-toolkit - The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools.

Molecular Visualization

Packages for viewing molecular structures.

  • ase-gui - The graphical user-interface allows users to visualize, manipulate, and render molecular systems and atoms objects.
  • chemiscope - An interactive structure/property explorer for materials and molecules.
  • chemview - An interactive molecular viewer designed for the IPython notebook.
  • imolecule - An embeddable webGL molecule viewer and file format converter.
  • moleculekit - A molecule manipulation library.
  • nglview - A Jupyter widget to interactively view molecular structures and trajectories.
  • PyMOL - A user-sponsored molecular visualization system on an open-source foundation, maintained and distributed by Schrödinger.
  • pymoldyn - A viewer for atomic clusters, crystalline and amorphous materials in a unit cell corresponding to one of the seven 3D Bravais lattices.
  • rdeditor - Simple RDKit molecule editor GUI using PySide.
  • sumo - A toolkit for plotting and analysis of ab initio solid-state calculation data.
  • surfinpy - A library for the analysis, plotting and visualisation of ab initio surface calculation data.
  • trident-chemwidgets - Jupyter Widgets to interact with molecular datasets.

Database Wrappers

Packages providing a Python layer for accessing chemical databases.

  • ccdc - An API for the Cambridge Structural Database System.
  • ChemSpiPy - ChemSpider wrapper, that allows chemical searches, chemical file downloads, depiction and retrieval of chemical properties.
  • CIRpy - An interface for the Chemical Identifier Resolver (CIR) by the CADD Group at the NCI/NIH.
  • NistChemPy - A package for accessing data from the NIST webbook. API includes access to thermodynamic properties, molecular structures, IR/MS/UV-Vis spectra and more.
  • pubchempy - PubChemPy provides a way to interact with PubChem in Python.
  • chembl-downloader - Automate downloading and querying the latest (or a given) version of ChEMBL.
  • drugbank-downloader - Automate downloading, opening, and parsing DrugBank.

Learning Resources

Resources for learning to apply Python to chemistry.

  • An Introduction to Applied Bioinformatics - A Jupyter book demonstrating working with biochemical data using the scikit-bio library for tasks such as sequence alignment and calculating Hamming distances.
  • Computational Thermodynamics - This collection of Jupyter notebooks demonstrates solutions to a range of thermodynamic problems including solving chemical equilibria, comparing real versus ideal gas behavior, and calculating the temperature and composition of a combustion reaction.
  • SciCompforChemists - Scientific Computing for Chemists with Python is a Jupyter book that teaches basic Python skills for chemistry, including relevant libraries, and applies them to solving chemical problems.

Miscellaneous Awesome

  • Colorful Nuclide Chart - A beautiful, interactive visualization of nuclides that provides access to a variety of nuclear properties and allows saving high-quality images for publications, presentations and outreach.

See Also

  • awesome-cheminformatics - Another list focused on cheminformatics, including tools not only in Python.
  • awesome-small-molecule-ml - A collection of papers, datasets, and packages for small-molecule drug discovery. Most links to code are in Python.
  • awesome-molecular-docking - A curated list of molecular docking software, datasets, and papers.
  • jarvis - Joint Automated Repository for Various Integrated Simulations is a repository designed to automate materials discovery and optimization using classical force-field, density functional theory, machine learning calculations and experiments.
  • polypharmacy-ddi-synergy-survey - A collection of research papers (with Python implementations) focusing on drug-drug interactions, synergy and polypharmacy.