ChemFlow_py tutorial

What you are going to do

ChemFlow_py is a toolkit for virtual screening, thus in this tutorial we will show some useful functions for molecular docking, consensus ranking and free energy rescoring. In this tutorial we will show you ChemFlow_py functionalities using a common dataset in virtual screening: the protein CDK2 from the DUD-E database.

Step 0 - Activate the anaconda environmet

ChemFlow_py is designed to run in the anaconda environment provided, to do so type in the terminal:

conda activate chemflowpy

Step 1 - Download and prepare the files

For this tutorial, you will need some files contained in this DUD-E page

Otherwise, all the files all already available from this archive (untar the file cdk2 archive and ligand/decoys archives contained)

Extract actives_final.mol2.gz and decoys_final.mol2.gz

The files we are going to use are:

File	Description
`actives_final.mol2`	File containing the known actives compounds in the mol2 format
`decoys_final.mol2`	File containing the decoys compounds in the mol2 format
`receptor.pdb`	The pdb structure of the protein/receptor
`crystal_ligand.mol2`	The crystallographic structure of a ligand in the bindig syte

ChemFlow_py only takes only one compounds.mol2 file, thus we have to select the decoys and actives for the screening

To select the ligands you can:

select manually the actives and decoys and insert them into a new file named 'compounds.mol2'
Select 5 random actives and 25 random decoys using a python interpreter with the following code.

ipython # to enter the python interpreter

In the python interpreter paste:

import chemflow as cf
import random
# read the ligand and decoys as input
actives_dict = cf.read_ligands('actives_final.mol2')
decoys_dict = cf.read_ligands('decoys_final.mol2')
ligand_dict = dict(actives_dict, **decoys_dict)
# select 10 actives and 30 decoys randomly
random_act = random.sample(list(actives_dict.keys()), k=5)
random_dec = random.sample(list(decoys_dict.keys()), k=25)
random_ligand = random_act + random_dec
# create the compounds.mol2 file
compunds_text= '\n'.join([ligand_dict[key] for key in random_ligand])
with open('compounds.mol2', 'w') as file:
    file.write(compunds_text)

Step 2 - Run Docking

The docking programs currently supported are:

Program	characteristic
`autodock`	Lamarckian Genetic Optimization, box grid with explicit spacing
`plants`	Ant Colony Optimization, spherical grid
`qvina`	Accelerated fork of Vina, box grid
`smina`	Flexible fork of Vina, box grid
`vina`	Iterated Local Search Optimizer, box grid

Most docking programs require the xyz coordinates of the center of the binding site, with some parameters to define the docking grid. You can use ChemFlow tools and a reference ligand to compute these data.
You may also provide the coordinates manually.

Using a reference ligand:

command line:

  chemflow.py dock -r receptor.pdb -l compounds.mol2 -p smina --protocol smina_test --crystal crystal_ligand.mol2 --postprocess -k 3

python interpreter/script

import chemflow as cf
docking = cf.Dock('receptor.pdb', 'compounds.mol2', program='smina', protocol='smina_test')
docking.bounding_shape('crystal_ligand.mol2', shape='both')
docking.setup(overwrite=False)
docking.run(computed=True, keep_poses=3)

Manual coordinates:

To see the coordinates of the reference ligand center use (the program uses a default padding of 15 Å):

python $(which bound_shape.py) crystal_ligand.mol2 --shape both

Substitute 'x y z' with the coordinates that you want

command line:

chemflow.py dock -r receptor.pdb -l compounds.mol2 -p smina --protocol smina_test --center x y z --postprocess -k 3

python interpreter/script

import chemflow as cf
docking = cf.Dock('receptor.pdb', 'compounds.mol2', program='smina', protocol='smina_test')
docking.data['center'] = [x, y, z]
docking.setup(overwrite=False)
docking.run(computed=True, keep_poses=3)

Remark:

Each docking program has different settings, parameters and defualt values. To check them type:

chemflow.py dock -p program_name --help

Results:

You can try to run the code with different programs. ChemFlow_py organizes the file in a directory (the 'protocol' argument) and produce csv files containing the ranking and a 'computed.mol2' file containing the docked molecules

Step 3 - Scoring function and free energy rescoring

Rescoring relies on different scoring function to evaluate the protein-ligand interaction.
The scoring function currently supported are:

autodock
chemplp
dkoes_fast
dkoes_scoring
plp
plp95
vina
vinardo

To do free energy rescoring:

MMPB-SA
MMGB-SA

Use smina to evaluate: dkoes_fast, dkoes_scoring, vina, vinardo
Use plants to evaluate: chemplp, plp, plp95
Use autodock to evaluate: autodock
Use amber or gromacs to evaluate: mmpbsa, mmgbsa

The procedure is the same as for docking, we just use docked structures and specify the scoring function:

command line:

chemflow.py rescore -r receptor.pdb -l smina_test/computed_ligands.mol2 -p plants --scoring plp --protocol plp_rescoring --crystal crystal_ligand.mol2

python interpreter/script

import chemflow as cf
rescoring = cf.Rescore('receptor.pdb', 'smina_test/computed_ligands.mol2', program='plants', protocol='plp_rescoring')
rescoring.bounding_shape('crystal_ligand.mol2', shape='both')
rescoring.data['scoring'] = 'plp'
rescoring.setup(overwrite=False)
rescoring.run()

Advice:

Have a look at chemflow.py rescore -p program_name --help to see the full options

Remark:

Amber and Gromacs are very punctilious about the pdb structure of the protein. You may have to fix the missing atoms.

Step 4 - Consensus ranking

Consensus approach is based on the combination of the different ranking obtained by docking. Thus, to run consensus properly, run different docking program before

The consensus method supported are:
ans, ass, ecr, rbn, rbr, rbv, z_score

To run consensus:

command line:

  chemflow.py consensus -pl smina_test plants_test -cm z_score rbv

python interpreter/script

import chemflow as cf
protocol_list = ['smina_test', 'plants_test', ...]
cf.consensus(protocol_list, methods=['z_score', 'rbv'])

If you have the list with all the decoys, you can analyse the results of docking, rescoring and consensus with the python interpreter:

import chemflow as cf
decoy_list = cf.ligand_list('decoys_final.mol2')
protocol_list = ['smina_test', 'plants_test', ..., 'plp_rescoring', ...]
cf.analysis(protocol_list, decoy_list, consensus_deep=3, methods=['zscore', 'rbv'])

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tutorial.md

tutorial.md

ChemFlow_py tutorial

What you are going to do

Step 0 - Activate the anaconda environmet

Step 1 - Download and prepare the files

Step 2 - Run Docking

Step 3 - Scoring function and free energy rescoring

Step 4 - Consensus ranking

Files

tutorial.md

Latest commit

History

tutorial.md

File metadata and controls

ChemFlow_py tutorial

What you are going to do

Step 0 - Activate the anaconda environmet

Step 1 - Download and prepare the files

Step 2 - Run Docking

Step 3 - Scoring function and free energy rescoring

Step 4 - Consensus ranking