Reproducing results from Watanabe, Reiko, et al. "Predicting fraction unbound in human plasma from chemical structure: improved accuracy in the low value ranges." Molecular pharmaceutics 15.11 (2018): 5302-5311.
This model and results are used as a baseline comparison in Riedl, et al. "Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma." Molecular pharmaceutics 2023.
@article{riedl2023descriptor,
title={Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma},
author={Riedl, Michael and Mukherjee, Sayak and Gauthier, Mitch},
journal={Molecular Pharmaceutics},
year={2023},
publisher={ACS Publications}
}
We provide a script that builds the required conda environment automatically; the same steps can also be run manually, as shown below. You must also install a Java Runtime Environment (JRE 6 or later) in order to use padelpy. We provide the precalculated descriptors and fingerprints, so users who prefer to skip that computationally intensive step, or who cannot run it because of operating system constraints, do not need to repeat it.
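Since padelpy depends on a Java runtime, it can help to confirm that one is reachable before calculating fingerprints. The following is a minimal sketch, not part of this repository: the function name `java_available` is hypothetical, and it only checks that an executable named `java` is on the PATH and responds to `-version`, not that the version is 6 or later.

```python
import shutil
import subprocess


def java_available(executable="java"):
    """Return True if `executable` is on PATH and exits cleanly for -version."""
    path = shutil.which(executable)
    if path is None:
        # No such executable on PATH.
        return False
    try:
        result = subprocess.run(
            [path, "-version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The executable exists but could not be run, or it hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Java runtime found" if java_available() else "Java runtime NOT found")
```

If the check fails, the precalculated descriptors and fingerprints can still be used without Java.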
A Windows batch script for creating the required conda environment is in the build/ folder. Before running the batch script, modify its first line to point to your Anaconda or Miniconda installation folder:
set CONDAPATH=PATH\TO\Anaconda3
You can also install the packages manually. We recommend following the installation order used in the batch script: because some of the packages are old, some of their dependencies are broken, and installing them in the order provided produces a working environment.
conda env remove --name watanabe-env -y
conda create --name watanabe-env python=3.9 -y --copy
conda activate watanabe-env
pip install rdkit mordred Boruta padelpy
pip install numpy==1.19.5 pandas matplotlib joblib black
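After a manual install, one way to confirm that the environment is complete is to list the installed version of each package. This is a minimal sketch under assumptions: the helper `installed_versions` is hypothetical and not part of this repository, and the distribution names are copied from the pip commands above. It uses only the standard library `importlib.metadata` module.

```python
from importlib import metadata

# Distribution names copied from the pip commands above.
REQUIRED = [
    "rdkit", "mordred", "Boruta", "padelpy",
    "numpy", "pandas", "matplotlib", "joblib", "black",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run inside the activated watanabe-env environment, numpy should be reported as 1.19.5, the version pinned above.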
The results must be reproduced in sequence, because the Mordred descriptors and PaDEL fingerprints are calculated only once and then cached in the data/ folder. Likewise, the descriptor down-selection with the Boruta package happens only once, and the down-selected descriptors are cached in the data/ folder. Both steps are run in the run/train_log_d1_rf.py script. Finally, the trained model is saved at the end of that script and is required by the run/eval_log_d1_rf_opera.py script. Therefore, run/train_log_d1_rf.py must be run once before any of the other results scripts.
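The compute-once, then-cache behavior described above can be illustrated with a short sketch. This is not the code used by the scripts in run/; the helper `load_or_compute`, the pickle format, and the file name `descriptors.pkl` are assumptions chosen for illustration only.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the cached result if it exists; otherwise compute and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


# Demonstration with a stand-in for an expensive descriptor calculation.
calls = []


def expensive_step():
    calls.append(1)  # record each real computation
    return {"descriptors": [0.1, 0.2, 0.3]}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "data" / "descriptors.pkl"  # hypothetical file name
    first = load_or_compute(cache_file, expensive_step)   # computes and caches
    second = load_or_compute(cache_file, expensive_step)  # read from the cache
```

With this pattern, the second call never reaches the expensive step, which is why a script that only reads cached results depends on an earlier script having written them.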