ScoreFlow2
The proper ranking of docking poses is the single most important step to prioritize compounds in a virtual screening. ScoreFlow is a tool designed to rescore molecular complexes, such as protein-ligand complexes. It incorporates routines to perform empirical scoring as well as end-point free energy calculations within the ChemFlow framework.
General usage of ChemFlow
ScoreFlow works either as a standalone tool with independent files or as part of the ChemFlow framework. In both cases, it enforces ChemFlow standards on the results.
ScoreFlow -r receptor.mol2 -l ligands.mol2 -p myproject --center "X" "Y" "Z" --radius "R"
ScoreFlow follows the same logic as DockFlow: it requires a receptor, a ligand and parameters for the scoring function. If a matching project name and protocol are found, ScoreFlow is designed to resume the previous calculation.
Example | Description |
---|---|
-p myproject | A name for the project |
-r myreceptor.mol2 | A .mol2 file containing your receptor(s) |
-l mycompounds.mol2 | A .mol2 file containing your compound(s) |
--protocol | Name for the current protocol (Optional, default="default") |
ℹ️ The "@<TRIPOS>MOLECULE" field will be used for receptor and compound identification.
ℹ️ To ensure reliability, ScoreFlow only works with properly formatted MOL2 files.
Just like DockFlow, rescoring requires a binding site definition and the choice of a scoring function.
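Because the "@<TRIPOS>MOLECULE" field identifies each structure, it can help to check those names before a run. In the Tripos MOL2 format, the line right after each "@<TRIPOS>MOLECULE" record holds the molecule name. The sketch below (helper names are hypothetical, not part of ScoreFlow) lists those names and reports duplicates, which are likely to make identification ambiguous.

```python
from collections import Counter


def mol2_molecule_names(path):
    """Return the molecule names of a MOL2 file, in file order.

    In the Tripos MOL2 format, the line right after each
    "@<TRIPOS>MOLECULE" record holds the molecule name.
    """
    names = []
    with open(path) as handle:
        for line in handle:
            if line.strip() == "@<TRIPOS>MOLECULE":
                # Consume the next line: it is the mol_name record.
                names.append(next(handle, "").strip())
    return names


def duplicated_names(path):
    """Return names used by more than one molecule, sorted alphabetically."""
    counts = Counter(mol2_molecule_names(path))
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty string in the returned list points to a molecule record with a missing name.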
Example | Description |
---|---|
--center 1.0 -3.2 1.5 | The binding site center, in Cartesian coordinates |
--radius 15 | Radius around the binding site center (for PLANTS) |
--size 15 15 15 | Box length along each Cartesian axis (for VINA) |
-sf chemplp | The scoring function (chemplp, plp, plp95, vina or mmgbsa) |
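One common way to choose the binding site values, not necessarily what ScoreFlow does internally, is to center the site on a reference ligand (for example a co-crystallized one). The sketch below, with hypothetical helper names and an arbitrary default margin, reads the atom coordinates of a MOL2 file and derives a center, a sphere radius (PLANTS) and a box size (Vina).

```python
import math


def read_mol2_coordinates(path):
    """Return (x, y, z) tuples from the first @<TRIPOS>ATOM block of a MOL2 file."""
    coords = []
    in_atoms = False
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped.startswith("@<TRIPOS>"):
                if in_atoms:
                    break  # The first ATOM block has ended.
                in_atoms = stripped == "@<TRIPOS>ATOM"
                continue
            if in_atoms and stripped:
                # ATOM records: atom_id atom_name x y z atom_type ...
                x, y, z = (float(v) for v in stripped.split()[2:5])
                coords.append((x, y, z))
    return coords


def binding_site_from_ligand(coords, margin=5.0):
    """Derive a center, a sphere radius and a box size from ligand atoms.

    margin (in Å) pads both the sphere and the box; 5.0 is an arbitrary
    illustrative default, not a ScoreFlow or PLANTS/Vina recommendation.
    """
    if not coords:
        raise ValueError("no atom coordinates given")
    n = len(coords)
    center = tuple(sum(c[axis] for c in coords) / n for axis in range(3))
    # Sphere (PLANTS --radius): farthest atom from the center, plus margin.
    radius = max(math.dist(center, c) for c in coords) + margin
    # Box (Vina --size): ligand extent along each axis, padded on both sides.
    size = tuple(
        max(c[axis] for c in coords) - min(c[axis] for c in coords) + 2 * margin
        for axis in range(3)
    )
    return center, radius, size
```

The resulting center can be passed to --center, together with the radius for --radius (PLANTS) or the three box lengths for --size (Vina).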
For high-throughput execution, ScoreFlow uses all available CPU cores. In an HPC environment, ScoreFlow can work through a workload manager; SLURM and PBS are supported. Since rescoring with empirical functions, such as those implemented in docking programs, is very fast, parallel execution is only available for MMGBSA.
Example | Description |
---|---|
-w | Workload manager: SLURM or PBS. Default is "None" |
-nc | Number of cores per node. Default is to use all cores available on the local computer |
-nn | Number of nodes to use within the HPC environment |
Seven scoring functions are currently implemented in ScoreFlow. Depending on the choice, additional information may be needed.
- PLANTS and VINA functions: most scoring functions require a MOL2 file for the receptor. For Vina, the conversion to PDBQT files is done automatically with AutoDock's MGLTools.

Name | Program | Description |
---|---|---|
plp95 | PLANTS | Piecewise Linear Potential (PLP) from Gehlhaar DK et al. |
plp | PLANTS | PLANTS version of the PLP |
chemplp | PLANTS | PLANTS version of the PLP implementing some of GOLD's terms |
vina | VINA | AutoDock Vina's scoring function |

- MM(PB,GB)SA: MM(PB,GB)SA scoring functions run through AmberTools and require a properly prepared PDB file for the receptor, compatible with the Amber ff14SB force field. In addition, all compounds must have atom types, topology and parameters for the General Amber Force Field (GAFF). LigFlow can handle this conversion with the following options:
-at gaff --amber
Name | Program | Description | Radii for Gpol | SASA for Gnp | Surface Tension γ (kcal·mol⁻¹·Å⁻²) | Surface Offset b (kcal·mol⁻¹) |
---|---|---|---|---|---|---|
PB3 | AmberTools | MMPBSA, model 3 | PARSE | Molsurf | 0.00542 | 0.920000 |
GB5 | AmberTools | MMGBSA, model 5 | mbondi2 | LCPO | 0.00500 | 0.000000 |
GB8 | AmberTools | MMGBSA, model 8 | mbondi3 | LCPO | atom-dependent | 0.195141 |

The MM(PB,GB)SA calculations can be run in two different ways:
- 1F: based on a single snapshot taken directly from the docking pose. An implicit-solvent (Generalized Born model) conjugate gradient minimization can be performed with minab before the MM(PB,GB)SA calculation.
- MD: based on a short MD simulation (hundreds of picoseconds of production) in implicit solvent (Generalized Born model).
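The surface tension γ and offset b in the MM(PB,GB)SA table above define the nonpolar solvation term as a linear function of the solvent-accessible surface area, G_np = γ·SASA + b (γ in kcal·mol⁻¹·Å⁻², b in kcal·mol⁻¹). The sketch below, with hypothetical function names, evaluates that term for PB3 and GB5 and combines it into the usual binding difference, complex minus receptor minus ligand. GB8 is left out because its γ is atom-dependent.

```python
# Surface tension (gamma, kcal/mol/Å^2) and offset (b, kcal/mol) taken from
# the MM(PB,GB)SA table. GB8 is excluded: its surface tension is atom-dependent.
NONPOLAR_PARAMS = {
    "PB3": (0.00542, 0.92),
    "GB5": (0.00500, 0.0),
}


def nonpolar_energy(sasa, model):
    """Nonpolar solvation energy (kcal/mol) for a SASA given in Å^2."""
    try:
        gamma, offset = NONPOLAR_PARAMS[model]
    except KeyError:
        raise ValueError(f"no single surface tension for model {model!r}") from None
    return gamma * sasa + offset


def binding_nonpolar(sasa_complex, sasa_receptor, sasa_ligand, model):
    """Nonpolar contribution to binding: G(complex) - G(receptor) - G(ligand).

    Note that the offset b does not cancel: it contributes -b overall.
    """
    return (nonpolar_energy(sasa_complex, model)
            - nonpolar_energy(sasa_receptor, model)
            - nonpolar_energy(sasa_ligand, model))
```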
The user must provide atom masks to generate the complex files. Since Amber ignores chain identifiers and renumbers residues consecutively, this step should be done with extra care. The ligand is always named MOL and is always the last residue of the complex.
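To see why renumbering matters, the hypothetical helper below maps the original (chain, residue number) pairs to the sequential numbering Amber uses and builds residue masks in Amber's ":first-last" syntax. It assumes, as stated above, that the receptor residues come first and MOL last; how the masks are then passed to ScoreFlow is not shown here.

```python
def amber_residue_masks(residues):
    """Map PDB residues to Amber's sequential numbering and build masks.

    residues: list of (chain, resid, resname) tuples for the complex, in
    file order, with the ligand (resname "MOL") as the last entry.
    """
    if len(residues) < 2 or residues[-1][2] != "MOL":
        raise ValueError("expected receptor residues followed by MOL")
    # Amber drops chain IDs and numbers residues 1..N in file order.
    renumbered = {
        (chain, resid): index
        for index, (chain, resid, _) in enumerate(residues, start=1)
    }
    last = len(residues)
    return {
        "receptor": f":1-{last - 1}",  # every residue before the ligand
        "ligand": f":{last}",          # equivalent to ":MOL" here
        "renumbered": renumbered,
    }
```

For example, with residues A10, A11 and B5 followed by MOL, residue B5 becomes Amber residue 3, the receptor mask is ":1-3" and the ligand mask is ":4".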
The salt concentration is set to 0.150 mol·L⁻¹.
Finally, the user can choose how to run the experiment:
- local: run on the current computer in serial,
- parallel: run locally using GNU parallel for a more efficient use of your computer resources,
- mazinger: run on a compute cluster equipped with PBS.
Once all calculations have been submitted, you can kill the ScoreFlow process by pressing Ctrl+C, and resume it later with ScoreFlow --resume.
You can also kill ScoreFlow jobs that are still running with ScoreFlow --kill.
You can modify the current ScoreFlow.config file directly from the command line interface.
To list the available options, run ScoreFlow -h; for more extensive help, run ScoreFlow -hh.
Once all of the above requirements are met, you can launch ScoreFlow from the run folder:
ScoreFlow
Use the --purge flag to delete the previous topology and coordinates files. Alternatively, rename the com folder inside the input_files directory to a name of your choice: ScoreFlow searches for data inside directories named 'lig' and 'com', and writes any structural data to the 'com' folder.
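The renaming alternative can be scripted. The sketch below assumes the run folder layout described here (input_files/com); the helper name and the timestamp suffix are illustrative choices, not ScoreFlow conventions.

```python
from datetime import datetime
from pathlib import Path


def archive_com_folder(run_dir, suffix=None):
    """Rename <run_dir>/input_files/com so that a new run starts from scratch.

    Returns the new folder path, or None if there was no com folder.
    """
    com = Path(run_dir) / "input_files" / "com"
    if not com.is_dir():
        return None
    if suffix is None:
        suffix = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = com.with_name(f"com_{suffix}")
    if target.exists():
        raise FileExistsError(target)  # never overwrite an earlier archive
    com.rename(target)
    return target
```

Keeping the old folder rather than deleting it (as --purge does) lets you compare topologies between runs.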
All CSV tables mentioned below are located in a sub-directory of the rescoring folder.
- ranking.csv: contains the scores of all rescored docking poses, together with a decomposition into each energy term. This table is not sorted.
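Since ranking.csv is not sorted, a small post-processing step is often useful. The sketch below assumes a comma-separated file; the pose and score column names are placeholders to be replaced by the real headers. Lower (more negative) values indicate better predicted binding for the PLANTS, Vina and MM(PB,GB)SA scores. Extracting the ligand name from the text before the first underscore assumes poses follow the <ligand>_<suffix> naming rule.

```python
import csv


def sort_ranking(path, score_column, lower_is_better=True):
    """Return the rows of a comma-separated ranking table sorted by score.

    score_column is a placeholder: check the real header of ranking.csv.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return sorted(rows, key=lambda row: float(row[score_column]),
                  reverse=not lower_is_better)


def best_pose_per_ligand(sorted_rows, pose_column):
    """Keep the best-ranked pose of each ligand from already sorted rows.

    The ligand name is the text before the first underscore of the pose
    name, following the <ligand>_<suffix> naming rule.
    """
    best = {}
    for row in sorted_rows:
        pose = row[pose_column]
        best.setdefault(pose.split("_", 1)[0], pose)
    return best
```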
In case of errors:
- errors.csv: contains the name of each docking pose for which an error occurred, the directory containing the pose's file, and the step at which the error occurred.
- directories: when an error occurs, ScoreFlow keeps all files in a unique sub-folder named after the docking pose, stored inside a directory named after the ligand. For a more precise description of the cause of the error, look for files with a .job extension.
- features.csv: contains a more detailed energy term decomposition.
- protein_bindingsite_fixed.mol2: contains the residues present in the binding site (defined by the user in the configuration file). This file can be used to run a protein-ligand interaction fingerprint analysis with PyPLIF if needed (not included in ChemFlow for now).
- Conversion of the MOL2 files to PDBQT: the converted files are located inside input_files/lig/.
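To collect the .job files of failed poses in one go, the sketch below walks a results directory and groups .job files by the folder (pose) that contains them. It assumes the layout described above, <ligand>/<pose>/...; the function name is hypothetical.

```python
from pathlib import Path


def find_job_files(rescoring_dir):
    """Map each pose folder to the names of the .job files it contains.

    Assumes the error layout <rescoring_dir>/<ligand>/<pose>/*.job.
    """
    jobs = {}
    for job in sorted(Path(rescoring_dir).rglob("*.job")):
        jobs.setdefault(job.parent.name, []).append(job.name)
    return jobs
```

Cross-checking the keys against the pose names listed in errors.csv is a quick way to confirm that every failure left its files behind.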
All the topology and coordinates files of the complexes are located in the input_files/com folder.
A time-series decomposition is also available:
- For calculations on a single snapshot:
  - 1F_min.csv: time-series decomposition of the minimization for every energy term of every rescored docking pose.
- For calculations on a short MD simulation:
  - MD_min.csv: time-series decomposition of the minimization for every energy term of every rescored docking pose.
  - MD_prod.csv: time-series decomposition of the production for every energy term of every rescored docking pose.
❓ I did a virtual screening without DockFlow. Can I still use ScoreFlow?
Yes, but you will need to adapt the structure of your files to the one DockFlow uses:
- Start by creating a directory. It must contain all of the files mentioned below.
- Put all of your docking poses in one (or more) sub-folders.
Each docking pose must follow this scheme:
- MOL2 file, with a single molecule per file.
- The name of each docking pose should be unique, starting with a common name shared by all binding modes of the same ligand, directly followed by an underscore "_", followed by any string. The ligand name cannot contain any underscore. Examples:
✅ ligand-1_conf_001.mol2
✅ lig1_index_05_conf_007.mol2
✅ ZINC00967532_conf10.mol2
❌ ZINC_00967532_conf10.mol2
❌ ZINC00967532-conf10.mol2
❌ lig_12_003.mol2
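The naming rule implies that the ligand name is everything before the first underscore. The sketch below applies that reading (our interpretation of the rule, not ScoreFlow's own parser; helper names are hypothetical). It shows why "ZINC_00967532_conf10.mol2" is rejected: its poses would be grouped under the ligand "ZINC", together with any other compound whose name starts with "ZINC_".

```python
from collections import defaultdict
from pathlib import Path


def ligand_name(pose_file):
    """Return the ligand name of a pose file: the text before the first '_'."""
    path = Path(pose_file)
    if path.suffix.lower() != ".mol2":
        raise ValueError(f"{pose_file}: expected a .mol2 file")
    name, sep, rest = path.stem.partition("_")
    if not sep or not name or not rest:
        raise ValueError(f"{pose_file}: expected <ligand>_<suffix>.mol2")
    return name


def group_poses(pose_files):
    """Group pose files by ligand name, rejecting names without '_'."""
    groups = defaultdict(list)
    for pose in pose_files:
        groups[ligand_name(pose)].append(pose)
    return dict(groups)
```

Running group_poses over a sub-folder before launching ScoreFlow makes accidental merges of different compounds easy to spot.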
Once these two steps are done, you may need to run LigFlow to prepare your docking poses. See LigFlow for more information.
- In ScoreFlow.config:
- Use the "ALL" or "BEST" mode (depending on whether you used LigFlow).
- Provide the path to the directory created in the first step as folder="path/to/directory".
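The folder="path/to/directory" line suggests that ScoreFlow.config uses shell-style key="value" assignments. Assuming that format, the hypothetical helper below sets a key in place, or appends it if it is missing; editing the file by hand or through ScoreFlow's own command line options works just as well.

```python
import re
from pathlib import Path


def set_config_value(config_path, key, value):
    """Set key="value" in a shell-style config file, appending it if absent."""
    path = Path(config_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Match only the exact key followed by '=', e.g. not "folder_x=".
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    new_line = f'{key}="{value}"'
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
```

For example, set_config_value("ScoreFlow.config", "folder", "path/to/directory") rewrites the folder line and leaves the other settings untouched.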
❓ I've just run ScoreFlow on mazinger, but now I would like to stop it. Is there a way to do it without killing all my other jobs?
Run ScoreFlow --kill to kill the jobs related to the last rescoring you performed.