DockFlow2
DockFlow aims to make docking campaigns and virtual screening easy tasks, accessible to everyone through a unified standard. So far, DockFlow implements PLANTS and VINA as docking programs. Its main advantages are that it properly organizes results, manages errors, and checkpoints and resumes calculations, to make the user's experience as productive as possible. For power users, it serves as an interface to High Performance Computing (HPC) through SLURM and PBS, the most common workload managers.
DockFlow is integrated into the ChemFlow environment and enforces the ChemFlow 1.0 standard, which makes it efficient to subsequently rescore docking poses with other scoring functions, run MD simulations, or automatically produce tables and graphs for a thorough analysis.
All files within a ChemFlow project are contained in a "[PROJECT].chemflow" folder, which is created the first time a user runs a project. This folder is designed to be portable for remote execution of projects and is commonly organized as follows: [project].chemflow/[action]/[protocol]/[target]/[query]/
Item | Example | Description |
---|---|---|
project | myosin | Name of your current project |
action | DockFlow | One of the ChemFlow workflows: DockFlow, ScoreFlow, LigFlow, MDFlow |
protocol | default | User defined name for a protocol to allow multiple parameter trials |
target | Myo6 | Name of the molecule in study, from the MOL2/PDB file |
query | ZINC0000101 | A compound |
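The folder layout in the table above can be sketched in a few lines of Python. This is only an illustration of the naming scheme; `chemflow_path` is a hypothetical helper and is not part of ChemFlow.

```python
from pathlib import PurePosixPath


def chemflow_path(project, action, protocol, target, query=None):
    """Build [project].chemflow/[action]/[protocol]/[target]/[query].

    Hypothetical helper, written only to illustrate the layout.
    """
    path = PurePosixPath(f"{project}.chemflow") / action / protocol / target
    if query is not None:
        # The query (compound) level is optional: some files live per target.
        path = path / query
    return path


example = chemflow_path("myosin", "DockFlow", "default", "Myo6", "ZINC0000101")
print(example)  # myosin.chemflow/DockFlow/default/Myo6/ZINC0000101
```

Because every level is a plain folder name, the same helper describes the output of any workflow (DockFlow, ScoreFlow, LigFlow, MDFlow) by changing the `action` argument.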
In molecular docking, the goal is to predict the optimal way two molecules could form a complex. To do so, a docking algorithm has to perform a search 🔎 and score 🏆 each docking pose. This version of DockFlow implements PLANTS and VINA to run the docking experiments.
DockFlow therefore requires a few things from the user. In a nutshell, you may run DockFlow as below (type DockFlow -h for help).
DockFlow -r myo6.mol2 -l compounds.mol2 -p myosin6 -c 5.0 0.2 1.0
### Basic input parameters:
Example | Description |
---|---|
-p myproject | A name for the project |
-r myreceptor.mol2 | A .mol2 file containing your receptor(s) |
-l mycompounds.mol2 | A .mol2 file containing your compound(s) |
(Depends on the software used; for rescoring, use ScoreFlow.)
Example | Description |
---|---|
--center 1.0 -3.2 1.5 | The binding site center, in Cartesian coordinates |
--radius 15 | Distance to the binding center (for PLANTS) |
--size 15 15 15 | Length along each Cartesian axis (for VINA) |
-n 10 | Number of docking poses |
-sf chemplp | chemplp, plp, plp95, vina are available |
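The two ways of describing the binding site are not interchangeable, which a small geometric sketch can make concrete. It assumes, based on the descriptions in the table, that --radius defines a sphere around the center and that --size gives the full edge lengths of a box centered on the same point. Both function names are hypothetical.

```python
import math


def in_sphere(point, center, radius):
    """True if point lies within `radius` of the center (assumed PLANTS-style site)."""
    return math.dist(point, center) <= radius


def in_box(point, center, size):
    """True if point lies inside a box of full edge lengths `size`
    centered on `center` (assumed VINA-style site)."""
    return all(abs(p - c) <= s / 2 for p, c, s in zip(point, center, size))


center = (1.0, -3.2, 1.5)
atom = (10.0, -3.2, 1.5)  # 9 units away from the center along x

print(in_sphere(atom, center, 15))         # True: 9 <= 15
print(in_box(atom, center, (15, 15, 15)))  # False: 9 > 15 / 2
```

Under these assumptions, a radius of 15 covers a much larger region than a box of size 15, so the two values should not simply be copied from one program's settings to the other's.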
ℹ️ The "@<TRIPOS>MOLECULE" field will be used for receptor and compound identification.
ℹ️ To ensure reliability, DockFlow only works with properly created mol2 files.
For high-throughput execution, DockFlow can run either locally, using all available computer cores, or in an HPC environment through a workload manager; SLURM and PBS are implemented.
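Since identification relies on the "@<TRIPOS>MOLECULE" field, it can be worth listing the names in a multi-mol2 file before docking. The sketch below is not part of DockFlow; it relies only on the fact that, in the Tripos mol2 format, the molecule name is the line following the "@<TRIPOS>MOLECULE" record header. The sample text is heavily abbreviated and is not a dockable file, and the idea that repeated names are worth flagging is an assumption.

```python
def molecule_names(mol2_text):
    """Return the name line that follows each @<TRIPOS>MOLECULE header."""
    lines = mol2_text.splitlines()
    names = []
    for index, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE" and index + 1 < len(lines):
            names.append(lines[index + 1].strip())
    return names


def repeated_names(names):
    """Return names that occur more than once, in order of first repeat."""
    seen, repeats = set(), []
    for name in names:
        if name in seen and name not in repeats:
            repeats.append(name)
        seen.add(name)
    return repeats


# Abbreviated, made-up records: only the header and name lines are shown.
SAMPLE = (
    "@<TRIPOS>MOLECULE\nZINC0000101\n"
    "@<TRIPOS>MOLECULE\nZINC0000102\n"
    "@<TRIPOS>MOLECULE\nZINC0000101\n"
)

names = molecule_names(SAMPLE)
print(names)
print(repeated_names(names))
```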
Example | Description |
---|---|
-w | Workload Manager SLURM or PBS. Default is "None" |
-nc | Number of cores/node. Default is to use all available in the local computer |
-nn | Number of nodes to use within the HPC environment |
To give users complete freedom, DockFlow offers an advanced execution mode: all that is needed is to provide an input file, which will be concatenated into every docking input file. This is done through the "--advanced" flag. In addition, the --protocol flag keeps track of different protocols.
Results are located inside the "DockFlow*" folder within your project.chemflow folder, following the ChemFlow file standard for docking and virtual screening experiments: [project].chemflow/[action]/[protocol]/[target]/[query]/. A walkthrough of the program-independent file organization is shown below.
Item | Example | Description |
---|---|---|
project | myosin | myosin.chemflow |
action | DockFlow | myosin.chemflow/DockFlow/ |
protocol | default | myosin.chemflow/DockFlow/default/ |
target | myo6 | myosin.chemflow/DockFlow/default/myo6/ |
query | ZINC0000101 | myosin.chemflow/DockFlow/default/myo6/ZINC0000101/ |
For the sake of reproducibility, DockFlow preserves all the outputs automatically generated by its implemented docking programs. Each docking program has its own way of formatting outputs; to standardize them, DockFlow post-processes the results into the ChemFlow standardized output files, organized by protocol and receptor. This is achieved by using the --postdock flag.
DockFlow -p myproject --postdock
--postdock gathers all docking poses from all compounds for a given receptor within a protocol into "docked_ligands.mol2" and their information into rank.csv, including protocol name, docking program, ligand, pose and energy, all in Tidy Data format, making it easy to manipulate further.
[project].chemflow/[action]/[protocol]/[target]/{docked_ligands.mol2,ranking.csv}
File | Kind | Description |
---|---|---|
docked_ligands.mol2 | Docking Poses | All docking poses, for all compounds to a RECEPTOR |
rank.csv | Docking score and rank | Information about all docking poses |
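Because the ranking file is tidy (one row per pose), it can be processed with the standard csv module. The sketch below picks the lowest-energy pose for each ligand. The column names (`protocol`, `program`, `ligand`, `pose`, `energy`) are assumptions derived from the description above, not the confirmed header of rank.csv, and the rows are made-up illustrative values.

```python
import csv
import io

# Made-up rows with assumed column names; check the real header before use.
SAMPLE = """protocol,program,ligand,pose,energy
default,PLANTS,ZINC0000101,1,-85.2
default,PLANTS,ZINC0000101,2,-80.1
default,PLANTS,ZINC0000102,1,-70.4
default,PLANTS,ZINC0000102,2,-91.3
"""


def best_pose_per_ligand(rows):
    """Map each ligand to its (pose, energy) pair with the lowest energy."""
    best = {}
    for row in rows:
        ligand = row["ligand"]
        energy = float(row["energy"])
        if ligand not in best or energy < best[ligand][1]:
            best[ligand] = (int(row["pose"]), energy)
    return best


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
best = best_pose_per_ligand(rows)

# Print ligands from most to least favorable energy.
for ligand, (pose, energy) in sorted(best.items(), key=lambda item: item[1][1]):
    print(ligand, pose, energy)
```

To read a real file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust the column names to match its header.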
Docking produces lots of output files that we keep for traceability but that are unnecessary to hold on to after --postdock. Archiving greatly reduces the size and, above all, the number of files. This is particularly important when you work remotely or like to back up your data: it's MUCH faster and easier to handle one big file per protocol than hundreds or thousands of small files.
--archive gathers all the docking folders into a single archive.
Original output | Kind | Description |
---|---|---|
docked_ligands.mol2 | Docking Poses | Each compound folder contains a multi-mol2 file containing all docking poses |
ranking.csv | Docking score and rank | The pose and its score (ranked by the scoring function) |
docking.log | Log files | A log file detailing the docking program (PLANTS or VINA) execution |
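The benefit of archiving, many small files collapsed into one, can be illustrated with Python's tarfile module. This is a generic sketch, not how --archive is implemented; the archive format, the `archive_folder` helper and the placeholder files are all assumptions made for the example.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_folder(folder, archive_path):
    """Pack a folder and everything below it into one gzip-compressed tar file."""
    folder = Path(folder)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(folder, arcname=folder.name)
    return Path(archive_path)


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in for a protocol folder with two compound folders.
    protocol_dir = Path(tmp) / "default"
    for compound in ("AB-00001583", "AB-00001584"):
        compound_dir = protocol_dir / "myo6" / compound
        compound_dir.mkdir(parents=True)
        (compound_dir / "docking.log").write_text("placeholder log\n")

    archive = archive_folder(protocol_dir, Path(tmp) / "default.tar.gz")

    # List the files stored in the single archive.
    with tarfile.open(archive) as tar:
        members = sorted(m.name for m in tar.getmembers() if m.isfile())

print(members)
```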
For the sake of reproducibility it's of utmost importance to keep track of the file inputs and parameters for any computational experiment. [More into that...]
# The input files
myo6.mol2 # The receptor.
compounds.mol2 # All compounds to dock.
DockFlow -r myo6.mol2 -l compounds.mol2 -p myosin6 --center 5.0 0.2 1.0 --radius 15 -sf chemplp
DockFlow -p myosin6 --postdock
You're done !
# Step 1 - All compounds are split into single .mol2 files. (LigFlow)
myosin6.chemflow/LigFlow/original/ligands.lst
myosin6.chemflow/LigFlow/original/AB-00001583.mol2
# Step 2 - Docking/VS.
myosin6.chemflow/DockFlow/default/myo6/receptor.mol2
myosin6.chemflow/DockFlow/default/myo6/AB-00001583/ligand.mol2 #[input]
myosin6.chemflow/DockFlow/default/myo6/AB-00001583/PLANTS/poses.mol2 #[output]
myosin6.chemflow/DockFlow/default/myo6/AB-00001583/PLANTS/rank.csv #[output]
❓ I have a chemical library available as a single mol2 or sdf file. Do you have any tool to split it into several mol2 files before running DockFlow on a cluster?
splitmol can split any sdf, smi or mol2 file into smaller files. It uses Open Babel to perform the splitting.
Run splitmol -h for more information.
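To show what such a split amounts to for mol2 input, here is a minimal pure-Python sketch that cuts a multi-mol2 text at each "@<TRIPOS>MOLECULE" header and regroups the molecules into chunks. It is not splitmol's implementation (which relies on Open Babel) and does not handle sdf or smi files; `split_mol2` and `chunk` are hypothetical names, and the sample records are abbreviated.

```python
def split_mol2(mol2_text):
    """Split multi-mol2 text into one string per molecule record."""
    blocks, current = [], []
    for line in mol2_text.splitlines(keepends=True):
        # A new header closes the molecule collected so far.
        if line.strip() == "@<TRIPOS>MOLECULE" and current:
            blocks.append("".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("".join(current))
    return blocks


def chunk(blocks, size):
    """Regroup molecule blocks into texts holding at most `size` molecules."""
    return ["".join(blocks[i:i + size]) for i in range(0, len(blocks), size)]


# Abbreviated, made-up library of three molecules (header and name only).
sample = "".join(
    f"@<TRIPOS>MOLECULE\n{name}\n" for name in ("cpd_1", "cpd_2", "cpd_3")
)

chunks = chunk(split_mol2(sample), 2)
print(len(chunks))
print(chunks[1])
```

Each chunk could then be written to its own file, for example one file per cluster job.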