ScoreFlow2

DIEGO ENRY GOMES edited this page May 29, 2018 · 3 revisions

ScoreFlow overview

Proper ranking of docking poses is the single most important step in prioritizing compounds in a virtual screening. ScoreFlow is a tool designed to rescore molecular complexes, such as protein-ligand complexes. It incorporates routines for empirical scoring as well as end-point free energy approaches within the ChemFlow framework.

General usage of ChemFlow

ScoreFlow works either with independent files or as part of the ChemFlow framework. In both cases it enforces ChemFlow standards on the results.

Usage

ScoreFlow -r receptor.mol2 -l ligands.mol2 -p myproject --center "X" "Y" "Z" --radius "R"

Basic input parameters:

ScoreFlow follows the same logic as DockFlow: it requires a receptor, a ligand and parameters for the scoring function. If an existing project name and protocol are found, ScoreFlow resumes the previous calculation.

Example              Description
-p myproject         A name for the project
-r myreceptor.mol2   A .mol2 file containing your receptor(s)
-l mycompounds.mol2  A .mol2 file containing your compound(s)
--protocol           Name for the current protocol (optional, default="default")

ℹ️ The "@<TRIPOS>MOLECULE" field will be used for receptor and compound identification.
ℹ️ To ensure reliability, ScoreFlow only works with properly created .mol2 files.
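Because identification relies on the name stored in each "@<TRIPOS>MOLECULE" record, it can help to list those names before rescoring. The following is a minimal sketch, not part of ScoreFlow: mol2_names is a hypothetical helper, and it assumes the standard Tripos layout in which the molecule name is the first line after the record header.

```shell
# Hypothetical helper (not shipped with ChemFlow): print the name of every
# molecule in a MOL2 file, one per line.
# Assumption: the name is the first line after each "@<TRIPOS>MOLECULE" record.
mol2_names() {
    awk '/^@<TRIPOS>MOLECULE/ { if ((getline name) > 0) print name }' "$1"
}
```

For example, `mol2_names mycompounds.mol2 | sort | uniq -d` would print any name that occurs more than once, which is worth resolving before compounds are identified by name.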

Scoring function parameters

Just like DockFlow, rescoring requires the binding site definition and the selected scoring function.

Example                Description
--center 1.0 -3.2 1.5  The binding site center, in Cartesian coordinates
--radius 15            Radius around the binding site center (for PLANTS)
--size 15 15 15        Box length along each Cartesian axis (for VINA)
-sf chemplp            The scoring function (chemplp, plp, plp95, vina or mmgbsa)
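One common way to obtain the --center coordinates is to take the geometric center of a reference ligand already placed in the binding site. ScoreFlow does not prescribe this; it is an assumption made for illustration. The sketch below uses a hypothetical helper, mol2_center, and assumes a single-molecule MOL2 file whose @<TRIPOS>ATOM records hold x, y and z in columns 3 to 5.

```shell
# Hypothetical helper: print the geometric center "x y z" of all atoms in a
# single-molecule MOL2 file.
# Assumption: @<TRIPOS>ATOM lines read "atom_id atom_name x y z atom_type ...".
mol2_center() {
    awk '
        /^@<TRIPOS>/ { in_atoms = ($0 ~ /^@<TRIPOS>ATOM/); next }   # track the current record
        in_atoms && NF >= 5 { x += $3; y += $4; z += $5; n++ }      # accumulate coordinates
        END { if (n > 0) printf "%.3f %.3f %.3f\n", x / n, y / n, z / n }
    ' "$1"
}

# Possible use (reference_ligand.mol2 is a hypothetical file name):
# ScoreFlow -r receptor.mol2 -l ligands.mol2 -p myproject \
#     --center $(mol2_center reference_ligand.mol2) --radius 15 -sf chemplp
```

The command substitution is left unquoted on purpose so that the three coordinates are passed as three separate arguments, matching the --center "X" "Y" "Z" form shown under Usage.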

For high-throughput execution, ScoreFlow uses all available cores on the local computer. In an HPC environment, ScoreFlow can run through a workload manager; SLURM and PBS are supported. Because rescoring with empirical functions, such as those implemented in docking programs, is very fast, parallel execution is only available for MMGBSA.

Example  Description
-w       Workload Manager SLURM or PBS. Default is "None"
-nc      Number of cores/node. Default is to use all available in the local computer
-nn      Number of nodes to use within the HPC environment

Scoring functions available

Seven scoring functions are implemented in ScoreFlow so far. Depending on the choice, supplementary information might be needed.

  • PLANTS and VINA functions: These scoring functions require a MOL2 file for the receptor. For Vina, the conversion to PDBQT files is done automatically with AutoDock's MGLTools.

    Name     Program  Description
    plp95    PLANTS   Piecewise Linear Potential (PLP) from Gehlhaar DK et al.
    plp      PLANTS   PLANTS version of the PLP
    chemplp  PLANTS   PLANTS version of the PLP implementing some of GOLD's terms
    vina     VINA     AutoDock Vina's scoring function
  • MM(PB,GB)SA: MM(PB,GB)SA scoring functions run through AmberTools and require a properly generated PDB file for the receptor, which is prepared with the Amber 14SB force field. In addition, all compounds must have atom types, topology and parameters for the General Amber Force Field (GAFF). MM/GBSA calculations also require AM1-BCC charges or, ideally, RESP charges for better results. Therefore, LigFlow must be used to prepare the ligands prior to MM/GBSA calculations:

LigFlow -p myproject -l ligands.mol2 [-charges resp] [-w SLURM]

Name  Program     Description      Radii for Gpol  SASA for Gnp  Surface tension (γ)  Surface offset (b)
PB3   AmberTools  MMPBSA, model 3  Parse           Molsurf       0.00542              0.920000
GB5   AmberTools  MMGBSA, model 5  mbondi2         LCPO          0.00500              0.000000
GB8   AmberTools  MMGBSA, model 8  mbondi3         LCPO          atom-dependent       0.195141
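
The last two columns feed the non-polar solvation term, which in the linear surface-area model used by Amber is Gnp = γ × SASA + b, with γ in kcal/mol/Å², SASA in Å² and b in kcal/mol. The sketch below only illustrates that arithmetic: gnp is a hypothetical helper, and the SASA of 500 Å² is an arbitrary value chosen for the example, not a ScoreFlow result. It does not apply to GB8, where γ is atom-dependent.

```shell
# Hypothetical helper illustrating Gnp = gamma * SASA + b (kcal/mol).
# Arguments: gamma (kcal/mol/A^2), SASA (A^2), b (kcal/mol).
gnp() {
    awk -v gamma="$1" -v sasa="$2" -v b="$3" 'BEGIN { printf "%.3f\n", gamma * sasa + b }'
}

gnp 0.00542 500 0.92   # PB3 parameters with an arbitrary SASA of 500 A^2; prints 3.630
gnp 0.00500 500 0.00   # GB5 parameters with the same SASA; prints 2.500
```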

The MM(PB,GB)SA calculations can be run in two different ways:

  • 1F: based on a single snapshot taken directly from the docking pose. A conjugate gradient minimization in implicit solvent (Generalized Born model) can be performed with minab prior to the MM(PB,GB)SA calculation.
  • MD: based on a short MD simulation (hundreds of picoseconds of production) in implicit solvent (Generalized Born model).

The user must provide atom masks to generate the files for the complex. Since Amber ignores chains and renumbers the residues accordingly, this step should be done with extra care. The ligand is always named MOL and is always the last residue of the complex.

The salt concentration is 0.150 mol/L.

Resources for the calculations:

Finally, the user can choose how to run the experiment:

  • local: run on the current computer in serial,
  • parallel: run locally using GNU parallel for a more efficient use of your computer resources,
  • mazinger: run on a compute cluster equipped with PBS.
    Once all calculations have been submitted, you can stop the ScoreFlow process by pressing Ctrl+C and resume it later with ScoreFlow --resume.
    You can also kill ScoreFlow jobs that are still running with ScoreFlow --kill.

You can modify the current ScoreFlow.config file directly from the command line interface.
To list the available options, run ScoreFlow -h; for more extensive help, run ScoreFlow -hh.

Running

Once all of the above requirements are met, you can launch ScoreFlow from the run folder :

ScoreFlow

⚠️ If you plan on comparing several MM(PB,GB)SA functions/models, run ScoreFlow with the --purge flag to delete the previous topology and coordinate files. Alternatively, inside the input_files directory, rename the com folder to a name of your choice: ScoreFlow searches for data inside directories named 'lig' and 'com', and writes any structural data to the 'com' folder.

Results

Common to all scoring functions

All CSV tables mentioned below are located inside a sub-directory in the rescoring folder.

  • ranking.csv :
    Contains the score of all docking poses that were rescored. A decomposition of each energy term is given as well. This table is not sorted.
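
Since ranking.csv is not sorted, a quick way to rank the poses is to sort the table on the score column. The sketch below rests on assumptions that should be checked against your own file: sort_ranking is a hypothetical helper, the file is taken to be comma-separated with a single header row and no quoted commas, and the column number of the score is supplied by the user.

```shell
# Hypothetical helper: sort a CSV file numerically on one column, keeping the
# header row in place.
# Arguments: CSV file, 1-based index of the score column.
# Assumptions: comma-separated fields, one header line, no quoted commas.
sort_ranking() {
    head -n 1 "$1"                                      # header row first
    tail -n +2 "$1" | LC_ALL=C sort -t, -k"$2,$2n"      # ascending numeric sort
}

# Possible use, if the score were in column 2 (an assumption):
# sort_ranking ranking.csv 2 > ranking_sorted.csv
```

The ascending order puts the most negative scores first, which corresponds to the best poses for scoring functions where lower is better.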

In case of errors :

  • errors.csv :
    Contains the name of the docking pose for which an error was produced, the directory containing the pose's file, and the step at which the error was produced.
  • directories :
    When an error is produced, ScoreFlow will keep all files inside a unique sub-folder, identified by the name of the docking pose, and stored inside a directory with the ligand's name. To have a more precise description of the cause of the error, the user should look for files with a .job extension.
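
To inspect the .job files mentioned above in one pass, something like the following sketch may help. show_job_logs is a hypothetical helper, and the directory passed to it is whatever rescoring folder your project uses; neither is defined by ScoreFlow.

```shell
# Hypothetical helper: print the last 20 lines of every .job file found under
# a directory, each preceded by its path.
show_job_logs() {
    find "$1" -type f -name '*.job' | sort | while IFS= read -r f; do
        printf '== %s ==\n' "$f"
        tail -n 20 "$f"
    done
}

# Possible use (the path is a placeholder):
# show_job_logs path/to/rescoring/folder
```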

PLANTS

  • features.csv :
    Contains a more detailed energy term decomposition.
  • protein_bindingsite_fixed.mol2 :
    Contains information on the residues present in the binding site (defined by the user in the configuration file). This file can be used to run a protein-ligand interaction fingerprint analysis with PyPLIF if needed (not included in ChemFlow for now).

VINA

  • Conversion of the MOL2 files to PDBQT :
    The converted files are located inside input_files/lig/.

MM(PB,GB)SA

All the topology and coordinate files of the complexes are located in the input_files/com folder.
A time series decomposition is also available:

  • For calculations on a single snapshot:
    • 1F_min.csv:
      Contains the time series decomposition of the minimization for every energy term of every docking pose rescored.
  • For calculations on a short MD simulation:
    • MD_min.csv:
      Contains the time series decomposition of the minimization for every energy term of every docking pose rescored.
    • MD_prod.csv:
      Contains the time series decomposition of the production for every energy term of every docking pose rescored.

FAQ

I did a virtual screening without DockFlow. Can I still use ScoreFlow ?

Yes, but you will need to adapt the structure of your files to the one DockFlow uses :

  1. Start by creating a directory. It must contain all of the files mentioned below :
  2. Put all of your docking poses in one (or more) sub-folder.
    Each docking pose must follow this scheme :
    • MOL2 file, with a single molecule per file
    • The name of each docking pose should be unique. It starts with a name common to all binding modes of the same ligand, directly followed by an underscore "_", followed by any string. The ligand name cannot contain any underscore, because everything before the first underscore is taken as the ligand name. Examples:
      ✅ ligand-1_conf_001.mol2
      ✅ lig1_index_05_conf_007.mol2
      ✅ ZINC00967532_conf10.mol2
      ❌ ZINC_00967532_conf10.mol2
      ❌ ZINC00967532-conf10.mol2
      ❌ lig_12_003.mol2
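
The naming rule above can be checked before running ScoreFlow. The sketch below is an illustration of the rule as described on this page, not ScoreFlow's own parser: pose_ligand_name is a hypothetical helper that takes everything before the first underscore as the ligand name and fails when the file name has no underscore.

```shell
# Hypothetical helper: print the ligand name implied by a pose file name,
# i.e. everything before the first underscore. Returns 1 if the name does not
# follow the "<ligand>_<anything>.mol2" scheme.
pose_ligand_name() {
    base=${1##*/}                                   # drop any leading directory
    case $base in
        ?*_?*.mol2) printf '%s\n' "${base%%_*}" ;;  # text before the first "_"
        *) return 1 ;;
    esac
}

pose_ligand_name ligand-1_conf_001.mol2   # prints ligand-1
pose_ligand_name lig_12_003.mol2          # prints lig, not the intended lig_12
```

The second call shows why a name such as lig_12_003.mol2 is rejected in the examples: it parses without error, but the pose would be grouped under the wrong ligand.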

Once these 2 steps are done, you might have to run LigFlow to prepare your docking poses. See LigFlow for more information.

  3. In ScoreFlow.config:
    • Use the "ALL" or "BEST" mode (depending on whether you used LigFlow).
    • Provide a path to the directory created at step "1." as folder="path/to/directory".

I've just run ScoreFlow on mazinger but now I would like to stop it. Is there a way to do it without killing all my other jobs?
Run ScoreFlow --kill to kill the jobs related to the last rescoring you have performed.