DockFlow2

DIEGO ENRY GOMES edited this page Jun 26, 2018 · 3 revisions

Description

DockFlow aims to make docking campaigns and virtual screening easy tasks, accessible to everyone through a unified standard. So far, DockFlow implements PLANTS and VINA as docking programs. Its main advantages are that it organizes results properly, manages errors, and checkpoints and resumes calculations, making the user's experience as productive as possible. For power users, it serves as an interface to High Performance Computing (HPC) through SLURM and PBS, the most common workload managers.

DockFlow is integrated into the ChemFlow environment and enforces the ChemFlow 1.0 standard, which makes it efficient to subsequently rescore docking poses with other scoring functions, run MD simulations, or automatically produce tables and graphs for a thorough analysis.

General usage of ChemFlow.

All files within a ChemFlow project are contained in a "[PROJECT].chemflow" folder, which is created the first time a user runs a project. This folder is designed to be portable for remote execution of projects and is commonly organized as follows: [project].chemflow/[action]/[protocol]/[target]/[query]/

| Item | Example | Description |
|------|---------|-------------|
| project | myosin | Name of your current project |
| action | DockFlow | One of the ChemFlow workflows: DockFlow, ScoreFlow, LigFlow, MDFlow |
| protocol | default | User-defined name for a protocol, to allow multiple parameter trials |
| target | Myo6 | Name of the molecule under study, from the MOL2/PDB file |
| query | ZINC0000101 | A compound |
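
The layout above can be expressed as a small path helper. This is a minimal sketch rather than ChemFlow code: the function name `chemflow_dir` is hypothetical, and it only builds the path string without creating anything on disk.

```python
from pathlib import PurePosixPath


def chemflow_dir(project, action, protocol, target, query):
    """Build [project].chemflow/[action]/[protocol]/[target]/[query] (hypothetical helper)."""
    return PurePosixPath(f"{project}.chemflow") / action / protocol / target / query


# Values taken from the Example column of the table.
path = chemflow_dir("myosin", "DockFlow", "default", "Myo6", "ZINC0000101")
print(path)  # myosin.chemflow/DockFlow/default/Myo6/ZINC0000101
```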

DockFlow overview

In molecular docking, the goal is to predict the optimal way two molecules could form a complex. To do so, a docking algorithm has to perform a search 🔎 and score 🏆 each docking pose. This version of DockFlow implements PLANTS and VINA to run the docking experiments.

Thus DockFlow requires a few things from the user. In a nutshell, you may run DockFlow as below (type DockFlow -h for help).

DockFlow -r myo6.mol2 -l compounds.mol2 -p myosin6 -c 5.0 0.2 1.0 

Basic input parameters:

| Example | Description |
|---------|-------------|
| -p myproject | A name for the project |
| -r myreceptor.mol2 | A .mol2 file containing your receptor(s) |
| -l mycompounds.mol2 | A .mol2 file containing your compound(s) |

Search parameters and scoring function

(These depend on the docking program used. For rescoring, use ScoreFlow.)

| Example | Description |
|---------|-------------|
| --center 1.0 -3.2 1.5 | The binding site center, in Cartesian coordinates |
| --radius 15 | Radius around the binding site center (for PLANTS) |
| --size 15 15 15 | Box length along each Cartesian axis (for VINA) |
| -n 10 | Number of docking poses |
| -sf chemplp | Scoring function: chemplp, plp, plp95 and vina are available |
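
When a reference ligand is available, one common way to pick these values is to take the midpoint of its bounding box as the center, then choose a box or sphere that encloses all atoms with some padding. The sketch below only illustrates that idea: the function name, the 5.0 padding and the coordinates are assumptions made for the example, not DockFlow defaults.

```python
def site_from_coords(coords, padding=5.0):
    """Return (center, size, radius) enclosing all coordinates plus padding."""
    axes = list(zip(*coords))  # three tuples: all x, all y, all z
    # Midpoint of the bounding box, usable as --center.
    center = tuple((min(axis) + max(axis)) / 2 for axis in axes)
    # Box edge lengths along x, y, z (the kind of value --size expects).
    size = tuple(max(axis) - min(axis) + 2 * padding for axis in axes)
    # Distance from the center to the farthest atom (the kind of value --radius expects).
    farthest = max(
        sum((c - m) ** 2 for c, m in zip(atom, center)) ** 0.5 for atom in coords
    )
    return center, size, farthest + padding


# Made-up coordinates of a reference ligand, for illustration only.
atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 4.0, 0.0), (0.0, 4.0, 0.0)]
center, size, radius = site_from_coords(atoms)
print(center)  # (1.0, 2.0, 0.0)
print(size)    # (12.0, 14.0, 10.0)
```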

ℹ️ The "@<TRIPOS>MOLECULE" field will be used for receptor and compound identification.
ℹ️ To ensure reliability, DockFlow only works with properly formatted .mol2 files.
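
Since the molecule name drives identification, it can be worth listing the names in a multi-mol2 file before docking. In the Tripos mol2 format, the name is the line that follows each "@<TRIPOS>MOLECULE" record. The sketch below relies only on that rule; the function name and the sample text are made up for illustration and are not part of DockFlow.

```python
def molecule_names(mol2_text):
    """Return the name line that follows each @<TRIPOS>MOLECULE record."""
    lines = mol2_text.splitlines()
    names = []
    for i, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE" and i + 1 < len(lines):
            names.append(lines[i + 1].strip())
    return names


# Truncated, made-up mol2 content: only the header records are shown.
sample = """@<TRIPOS>MOLECULE
ZINC0000101
@<TRIPOS>ATOM
@<TRIPOS>MOLECULE
ZINC0000102
@<TRIPOS>ATOM
"""
print(molecule_names(sample))  # ['ZINC0000101', 'ZINC0000102']
```

Duplicate or empty names in this list would be a sign that the file needs cleaning before use.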

For high-throughput execution, DockFlow can run either locally, using all available cores, or in an HPC environment through a workload manager. SLURM and PBS are implemented.

| Example | Description |
|---------|-------------|
| -w | Workload manager: SLURM or PBS. Default is "None" |
| -nc | Number of cores per node. Default is to use all cores available on the local computer |
| -nn | Number of nodes to use within the HPC environment |

Advanced mode

To give users complete freedom, DockFlow offers an advanced mode: provide an input file through the "--advanced" flag, and its contents will be concatenated into every docking input file. To keep track of different protocols, use the --protocol flag.

DockFlow results

Results are located inside the "DockFlow" folder within your [project].chemflow folder, following the ChemFlow file standard for docking and virtual screening experiments: [project].chemflow/[action]/[protocol]/[target]/[query]/. A walkthrough of the program-independent file organization is shown below.

| Item | Example | Description |
|------|---------|-------------|
| project | myosin | myosin.chemflow |
| action | DockFlow | myosin.chemflow/DockFlow/ |
| protocol | default | myosin.chemflow/DockFlow/default/ |
| target | myo6 | myosin.chemflow/DockFlow/default/myo6/ |
| query | ZINC0000101 | myosin.chemflow/DockFlow/default/myo6/ZINC0000101/ |

Standard output files.

For the sake of reproducibility, DockFlow preserves all the outputs automatically generated by its implemented docking programs. Each docking program has its own output format, so to standardize the results DockFlow post-processes them into the ChemFlow standardized output files, organized by protocol and receptor. This is achieved with the --postdock flag.

DockFlow -p myproject --postdock

--postdock gathers all docking poses from all compounds for a given receptor within a protocol into "docked_ligands.mol2" and their information into rank.csv, including protocol name, docking program, ligand, pose and energy, all in Tidy Data format, which makes further manipulation easy.

[project].chemflow/[action]/[protocol]/[target]/{docked_ligands.mol2,ranking.csv}

| File | Kind | Description |
|------|------|-------------|
| docked_ligands.mol2 | Docking Poses | All docking poses, for all compounds to a RECEPTOR |
| rank.csv | | Information about all docking poses |
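
Because the table is tidy (one row per pose), picking the best pose per ligand takes only a few lines. The sketch below is illustrative: the column names are assumptions derived from the fields listed above, the rows are made up, and it assumes that a lower energy means a better pose.

```python
import csv
import io

# Made-up rows; the header names are assumed, check them against your own file.
sample = """protocol,program,ligand,pose,energy
default,VINA,ZINC0000101,1,-7.2
default,VINA,ZINC0000101,2,-6.9
default,VINA,ZINC0000102,1,-8.1
default,VINA,ZINC0000102,2,-8.4
"""


def best_pose_per_ligand(csv_text):
    """Map each ligand to (pose, energy) of its lowest-energy pose."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        energy = float(row["energy"])
        if row["ligand"] not in best or energy < best[row["ligand"]][1]:
            best[row["ligand"]] = (int(row["pose"]), energy)
    return best


print(best_pose_per_ligand(sample))
# {'ZINC0000101': (1, -7.2), 'ZINC0000102': (2, -8.4)}
```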

Archiving the results (after --postdock)

Docking produces many output files, which are kept for traceability but are no longer needed after --postdock. Archiving compresses the files and, above all, greatly reduces their number. This is particularly important when you work remotely or back up your data: it is MUCH faster and easier to handle one big file per protocol than hundreds or thousands of small files.

--archive gathers all the docking folders into a single archive.
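
To illustrate the idea of turning many small files into one, here is a minimal sketch using Python's standard tarfile module. It is not how --archive is implemented, which this page does not describe; the function name, folder names and archive name are assumptions for the example.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_folder(folder, archive_path):
    """Pack one folder into a gzip-compressed tar file and list what it holds."""
    folder = Path(folder)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(folder), arcname=folder.name)  # directories are added recursively
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return sorted(tar.getnames())


# Demo on a throwaway directory that mimics a protocol folder (made-up names).
with tempfile.TemporaryDirectory() as tmp:
    protocol = Path(tmp) / "default"
    ligand = protocol / "myo6" / "ZINC0000101"
    ligand.mkdir(parents=True)
    (ligand / "docking.log").write_text("example log\n")
    members = archive_folder(protocol, Path(tmp) / "default.tar.gz")

print(members)
```

The archive is written next to the folder rather than inside it, so it never ends up containing itself.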

Main output files for PLANTS

| Original output | Kind | Description |
|-----------------|------|-------------|
| docked_ligands.mol2 | Docking Poses | Each compound folder contains a multi-mol2 file containing all docking poses |
| ranking.csv | Docking score and rank | The pose and its score (ranked by the scoring function) |
| docking.log | Log files | A log file detailing the docking program (PLANTS or VINA) execution |

Archiving and cleaning up.

For the sake of reproducibility it's of utmost importance to keep track of the file inputs and parameters for any computational experiment. [More into that...]

Example structure for a Virtual Screening on Myosin 6:

# The input files
myo6.mol2      # The receptor.
compounds.mol2 # All compounds to dock.

Running DockFlow.

DockFlow -r myo6.mol2 -l compounds.mol2 -p myosin6 --center 5.0 0.2 1.0 --radius 15 -sf chemplp

Wait until it's done and process the results

DockFlow -p myosin6 --postdock 

You're done!

Inside DockFlow organization.

# Step 1 - All compounds are split into single .mol2 files (LigFlow).
myosin6.chemflow/LigFlow/original/ligands.lst
myosin6.chemflow/LigFlow/original/AB-00001583.mol2

# Step 2 - Docking/VS.
myosin6.chemflow/DockFlow/default/myo6/receptor.mol2
myosin6.chemflow/DockFlow/default/myo6/AB-00001583/ligand.mol2        #[input]
myosin6.chemflow/DockFlow/default/myo6/AB-00001583/PLANTS/poses.mol2  #[output]
myosin6.chemflow/DockFlow/default/myo6/AB-00001583/PLANTS/rank.csv    #[output]

FAQ

I have a chemical library available as a single mol2 or sdf file. Do you have a tool to split it into several mol2 files before running DockFlow on a cluster?

  • splitmol can split any sdf, smi or mol2 file in smaller files. It uses Open Babel to perform the splitting.
    Run splitmol -h for more information.
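
For readers curious about what such a split involves for mol2 input, here is a minimal pure-Python sketch. It is not splitmol, which relies on Open Babel. The function name and the `per_file` parameter are hypothetical, and the sketch assumes the "@<TRIPOS>MOLECULE" tag appears only at the start of each molecule.

```python
def split_mol2(mol2_text, per_file=1):
    """Split multi-mol2 text into chunks holding at most per_file molecules each."""
    tag = "@<TRIPOS>MOLECULE"
    # Anything before the first tag (e.g. comment lines) is dropped in this sketch.
    blocks = [tag + part for part in mol2_text.split(tag)[1:]]
    return ["".join(blocks[i:i + per_file]) for i in range(0, len(blocks), per_file)]


# Truncated, made-up library with three molecules (headers only).
library = (
    "@<TRIPOS>MOLECULE\nlig_a\n"
    "@<TRIPOS>MOLECULE\nlig_b\n"
    "@<TRIPOS>MOLECULE\nlig_c\n"
)
chunks = split_mol2(library, per_file=2)
print(len(chunks))  # 2
```

Each chunk can then be written to its own file and submitted as a separate job.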