EvoDOCK is software for heterodimeric and symmetric protein-protein docking.
Heterodimeric docking has been published at: A memetic algorithm enables global all-atom protein-protein docking with sidechain flexibility
Symmetric docking has been published (as preprint) at: Accurate prediction of protein assembly structure by combining AlphaFold and symmetrical docking
EvoDOCK requires only a standard computer running Linux (macOS support is planned). Large populations (>= 100) need a large amount of RAM.
The package has been tested on the following systems:
Linux: Ubuntu 20.04.5-6 and CentOS Linux 7
If you only use heterodimeric docking, the following packages must be installed:
- Python-3.6 or later (PyRosetta dependency).
- PyRosetta (http://www.pyrosetta.org) (Can be installed with Anaconda). You need to obtain a license before use (see the link)
If using Symmetric Protein-Protein docking these additional packages must be installed:
- MAFFT (https://mafft.cbrc.jp/alignment/software/) (can be installed with Anaconda/brew/apt)
- mpi4py and its requirements (https://mpi4py.readthedocs.io/en/stable/install.html) (can be installed with Anaconda/pip)
Furthermore the following packages are also needed but are automatically installed by pip using the setup.py script (see Installation Guide).
cubicsym @ git+https://github.com/Andre-lab/cubicsym
cloudcontactscore @ git+https://github.com/Andre-lab/cloudcontactscore
numpy>=1.21.0
pandas>=1.3.4
pillow>=9.1.0
scipy>=1.7.1
seaborn>=0.11.2
setuptools>=44.0.0
imageio>=2.10.1
matplotlib>=3.4.3
Clone the EvoDOCK repository and cd into it. Then run the install script.
git clone https://github.com/Andre-lab/evodock.git
cd ./evodock
pip install .
Then additionally install the packages under Package requirements!
Installation itself should take only a couple of seconds, but downloading and installing the required packages can take several minutes.
Before running EvoDOCK it is important to prepack the input files as follows (takes several seconds):
python ./scripts/prepacking.py --file <input_file>
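When many input structures need prepacking, the command above can be generated once per file. The following is a minimal sketch, not part of EvoDOCK: the helper name prepack_commands and the assumption that inputs are *.pdb files in a single directory are illustrative only. It builds the argument lists without running them.

```python
import sys
from pathlib import Path


def prepack_commands(input_dir, script="./scripts/prepacking.py"):
    """Return one prepacking command (as an argument list) per PDB file.

    Hypothetical helper: it only mirrors the documented call
    `python ./scripts/prepacking.py --file <input_file>`.
    """
    pdb_files = sorted(Path(input_dir).glob("*.pdb"))
    return [[sys.executable, script, "--file", str(pdb)] for pdb in pdb_files]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the root of the evodock repository.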
scripts/af_to_evodock.py converts AlphaFold 2 and AlphaFold-Multimer predictions to an EvoDOCK ensemble.
The script is well documented. Use python scripts/af_to_evodock.py -h to see more. The output will already be prepacked.
Below, 2 examples of running the script are given: one creates an ensemble for Local docking and the other for Global assembly docking. You first need to download af_data.tar.gz. Unpack it with
tar -xf af_data.tar.gz
Put the AF_data folder in evodock/inputs before running the tests below.
Preparing an ensemble for Local docking (takes a few minutes):
scripts/af_to_evodock.py --path inputs/AF_data/local --symmetry O --ensemble Local --out_dir tests/outputs/ --max_multimers 5 --max_monomers 5 --modify_rmsd_to_reach_min_models 50 --max_total_models 5 --align_structure inputs/input_pdb/3N1I/3N1I.cif
Preparing an ensemble for Global assembly docking (takes a few minutes):
scripts/af_to_evodock.py --path inputs/AF_data/globalfrommultimer --symmetry T --ensemble GlobalFromMultimer --out_dir tests/outputs/ --max_multimers 5 --max_monomers 5 --modify_rmsd_to_reach_min_models 50 --max_total_models 5
EvoDOCK can be run with different configurations, given a specific config.ini input file, as follows:
python evodock.py configs.ini
The following sections describe how to configure EvoDOCK through the config file. These options are available:
- [Docking]
- [Input]
- [Outputs]
- [DE]
- [Flexbb]
- [Bounds]
- [Pymol]
- [RosettaOptions]
- [Native]
Examples of config files for different EvoDOCK configurations are found in the configs folder with the following behavior:
- Heterodimeric docking with a single ligand and receptor backbone (takes a few minutes):
configs/heterodimeric/sample_dock_single.ini
- Heterodimeric docking with flexible backbones (takes a few minutes):
configs/heterodimeric/sample_dock_flexbb.ini
- Local recapitulation with a single backbone (takes a few minutes):
configs/symmetric/local_recapitulation.ini
- Local docking with flexible backbones (takes a few minutes):
configs/symmetric/local_assembly.ini
- Global assembly docking with flexible backbones (takes a few minutes):
configs/symmetric/global_assembly.ini
Specifies the type of docking protocol used, of which there are 3 options:
- Local: For heterodimeric local docking AND symmetric local docking.
- Global: For heterodimeric global docking.
- GlobalFromMultimer: For symmetric global assembly docking.
[Docking]
type=<Local/Global/GlobalFromMultimer>
Specifies the input type. For heterodimeric docking you need to specify either single, or ligands AND receptors, for docking either 2 single backbones or 2 sets of multiple backbones. For heterodimeric docking a template can also be supplied. It is used to extract rotamers and as the structure onto which the receptor and ligand are initially aligned.
[Input]
single=<path to a pdb file containing the heterodimer (2 chains)>
or
[Input]
ligands=<path to ligands>
receptors=<path to receptors>
or
[Input]
template=<path to a pdb file to serve as a template>
ligands=<path to ligands>
receptors=<path to receptors>
For symmetric docking you need to specify the symdef_file and either single or subunits, for docking either a single backbone or multiple backbones:
[Input]
single=<path to single pdb file>
symdef_file=<path to the symdef file>
or
[Input]
subunits=<path to a directory containing all of the subunits>
symdef_file=<path to the symdef file>
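The [Input] layouts above are mutually exclusive, which is easy to get wrong when editing a config by hand. The sketch below checks a dictionary of [Input] options against the layouts shown in this section. It is a hypothetical helper, not EvoDOCK's own validation, and it assumes that only the key combinations listed above are valid (for example, that template is given together with ligands and receptors).

```python
def classify_input(options, symmetric=False):
    """Return which documented [Input] layout the given options match.

    Hypothetical checker based only on the layouts listed in this README.
    Raises ValueError when the keys match none of them.
    """
    keys = set(options)
    if symmetric:
        if "symdef_file" not in keys:
            raise ValueError("symmetric docking needs symdef_file")
        rest = keys - {"symdef_file"}
        if rest == {"single"}:
            return "symmetric, single backbone"
        if rest == {"subunits"}:
            return "symmetric, multiple backbones"
        raise ValueError("give exactly one of single or subunits")
    if keys == {"single"}:
        return "heterodimeric, single backbones"
    if keys in ({"ligands", "receptors"}, {"template", "ligands", "receptors"}):
        return "heterodimeric, multiple backbones"
    raise ValueError("give single, or ligands AND receptors (optionally with template)")
```

For instance, classify_input({"ligands": "a/", "receptors": "b/"}) matches the multiple-backbone layout, while passing only ligands raises a ValueError.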
Output options for the results:
- output_path: Directory in which to output all files.
- output_pdb: Output pdb.
- output_pdb_per_generation: Output the best pdb for each generation.
- n_models: How many models to output in the end.
- cluster: Whether to cluster the results before outputting.
[Outputs]
output_path=<path to the output directory>
output_pdb=<boolean>
output_pdb_per_generation=<boolean>
n_models=<int>
cluster=<boolean>
Differential Evolution options:
- scheme: The selection strategy for the base vector in the mutation operation. Either select randomly (=RANDOM, default) or select the best (=BEST).
- popsize: The size of the population. Default is 100.
- mutate: Mutation rate (weight factor F). Must be between 0 and 1.0. Default is 0.1.
- recombination: Crossover probability (CR). Must be between 0 and 1.0. Default is 0.7.
- maxiter: Number of generations to perform. Default is 50.
- local_search: The local search docking scheme. For heterodimeric docking use one of [None, only_slide, mcm_rosetta]; for symmetric docking use symshapedock. The default is mcm_rosetta for heterodimeric docking and symshapedock for symmetric docking.
- slide: Use sliding or not. Default is True.
- selection: The energy type used when selecting. Either select by interface energy (=interface, default for symmetric docking) or by total energy (=total, default for heterodimeric docking).
[DE]
scheme=<RANDOM/BEST>
popsize=<integer>
mutate=<float>
recombination=<float>
maxiter=<integer>
local_search=<None/only_slide/mcm_rosetta/symshapedock>
slide=<boolean>
selection=<interface/total>
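The defaults and ranges listed for [DE] can be captured in a small pre-flight check. This is a minimal sketch under stated assumptions, not EvoDOCK code: the function name check_de_options is made up, and the requirement that popsize and maxiter be positive is an assumption not stated above.

```python
def check_de_options(options, symmetric=False):
    """Merge user options with the documented defaults and list any problems.

    Hypothetical helper: defaults and allowed values come from the option
    list above; "popsize/maxiter must be positive" is an assumption.
    """
    defaults = {
        "scheme": "RANDOM",
        "popsize": 100,
        "mutate": 0.1,
        "recombination": 0.7,
        "maxiter": 50,
        "local_search": "symshapedock" if symmetric else "mcm_rosetta",
        "slide": True,
        "selection": "interface" if symmetric else "total",
    }
    merged = {**defaults, **options}
    allowed_search = {"symshapedock"} if symmetric else {"None", "only_slide", "mcm_rosetta"}

    problems = []
    if merged["scheme"] not in {"RANDOM", "BEST"}:
        problems.append("scheme must be RANDOM or BEST")
    for name in ("mutate", "recombination"):
        if not 0.0 <= float(merged[name]) <= 1.0:
            problems.append(f"{name} must be between 0 and 1.0")
    for name in ("popsize", "maxiter"):
        if int(merged[name]) < 1:
            problems.append(f"{name} must be a positive integer")
    if str(merged["local_search"]) not in allowed_search:
        problems.append("local_search is not valid for this docking mode")
    if merged["selection"] not in {"interface", "total"}:
        problems.append("selection must be interface or total")
    return merged, problems
```

An empty problems list means the options are consistent with the descriptions above; it says nothing about whether they are a good choice for a particular docking problem.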
If this section is present EvoDOCK will do flexible backbone docking. 2 options can be set:
- swap_prob: The probability of doing a backbone trial. Must be in the interval [0, 1.0].
- low_memory_mode: Saves memory by loading only 1 backbone at a time, at the cost of some computational time. It is only available for symmetrical docking and is highly recommended when using symmetrical docking.
[Flexbb]
swap_prob=<float>
low_memory_mode=<boolean>
Set options for the bounds of the rigid body parameters when doing symmetrical docking:
- init: The initial bounds the rigid body parameters are sampled in; [z, λ, x, ψ, ϴ, φ] for cubic symmetric docking.
- bounds: The maximum bounds the rigid body parameters are sampled in; [z, λ, x, ψ, ϴ, φ] for cubic symmetric docking.
- init_input_fix_percent: The percent chance of keeping an individual at its exact input values instead of randomizing it inside the init bounds. Should be between 0 and 100.
- allow_flip: Allow the individual to flip 180 degrees.
- xtrans_file: The path to the file containing the x translations for each subunit. This file is output by af_to_evodock.py when running with --ensemble=GlobalFromMultimer.
[Bounds]
init=<initial bounds, example: 0,60,5,40,40,40>
bounds=<maximum bounds, example: 1000,60,5,180,40,40>
init_input_fix_percent=<float>
allow_flip=<boolean>
xtrans_file=<path to the xtrans_file>
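The init and bounds values are comma-separated lists with one entry per rigid body parameter. The sketch below parses such a string into named values in the order [z, λ, x, ψ, ϴ, φ] and checks that no init value exceeds the matching bounds value. It is illustrative only: the ASCII parameter names, the helper names, and the init <= bounds rule are assumptions, not documented EvoDOCK behavior.

```python
# ASCII names for the six parameters [z, λ, x, ψ, ϴ, φ]; the spelling is illustrative.
DOF_NAMES = ("z", "lambda", "x", "psi", "theta", "phi")


def parse_bounds(text):
    """Turn a string such as '0,60,5,40,40,40' into a name -> value mapping."""
    values = [float(v) for v in text.split(",")]
    if len(values) != len(DOF_NAMES):
        raise ValueError(f"expected {len(DOF_NAMES)} values, got {len(values)}")
    return dict(zip(DOF_NAMES, values))


def init_within_bounds(init_text, bounds_text):
    """Assumed sanity check: no initial bound is larger than the maximum bound."""
    init = parse_bounds(init_text)
    bounds = parse_bounds(bounds_text)
    return all(init[name] <= bounds[name] for name in DOF_NAMES)
```

With the example values from the config template above, init_within_bounds("0,60,5,40,40,40", "1000,60,5,180,40,40") returns True.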
EvoDOCK can be run with PyMOL as described in https://www.rosettacommons.org/docs/latest/rosetta_basics/PyMOL. This section sets options for PyMOL:
- on: Use PyMOL.
- history: Turn history on.
- show_local_search: Show the process during local search.
- ipaddress: The IP address to use.
[Pymol]
on=<boolean>
history=<boolean>
show_local_search=<boolean>
ipaddress=<IP address>
Rosetta flags to use. Any can be specified.
[RosettaOptions]
initialize_rigid_body_dofs=<boolean>
Calculates metrics against the native structure (RMSD, for instance). There are 4 options:
- crystallic_input: The native structure.
- symmetric_input: The symmetric input file of the native structure.
- symdef_file: The symdef file for the native structure.
- lower_diversity_limit: The lowest RMSD limit the structures should have to their native structure.
symmetric_input and symdef_file are required for symmetric docking.
[Native]
crystallic_input=<path to native structure>
symmetric_input=<path to the symmetric input>
symdef_file=<path to the symdef file>
lower_diversity_limit=<float>
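The sections above combine into a single .ini file. As a minimal sketch, the file can be assembled with Python's standard configparser module. Several things here are assumptions: that EvoDOCK reads a standard INI file, that booleans are written as True/False, and the two paths, which are placeholders rather than files shipped with EvoDOCK. The DE values are the documented defaults for heterodimeric docking.

```python
import configparser
import io


def build_example_config():
    """Assemble a small heterodimeric config from the sections described above."""
    config = configparser.ConfigParser()
    config["Docking"] = {"type": "Global"}
    # Placeholder path (assumption): point this at your own prepacked complex.
    config["Input"] = {"single": "inputs/my_complex.pdb"}
    config["Outputs"] = {
        "output_path": "results/",  # placeholder output directory
        "output_pdb": "True",
        "n_models": "5",
        "cluster": "False",
    }
    config["DE"] = {
        "scheme": "RANDOM",
        "popsize": "100",
        "mutate": "0.1",
        "recombination": "0.7",
        "maxiter": "50",
        "local_search": "mcm_rosetta",
        "slide": "True",
        "selection": "total",
    }
    return config


def config_to_text(config):
    """Serialize with key=value lines, matching the templates in this README."""
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

Writing config_to_text(build_example_config()) to a file gives a starting point that can be compared with the sample files in the configs folder.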
EvoDOCK produces several different log files:
- evolution.csv is a general summary of the evolutionary process across the entire population. Per generation (gen) it logs:
  - The average energy of the population (avg)
  - The lowest energy of the population (best)
  - The RMSD of the individual with the lowest energy (rmsd_from_best) if running with a native structure.
- popul.csv is a general summary of the evolutionary process for each individual in the population. Per generation (gen) it logs:
  - The current score (sc_*)
  - The current RMSD (rmsd_*) if running with a native structure.
  - The current interface score (Isc_*)
  - The current interface RMSD (Irmsd_*) if running with a native structure.
- trials.csv is the equivalent of popul.csv, but it reports the trials (candidates) generated during each generation. This is useful for checking whether DE+MC is creating proper candidates that can contribute to the evolution.
- time.csv is the computational time (in seconds) for each generation (gen).
- all_individual.csv contains, for each generation (gen), the best genotype (rigid body degrees of freedom) of all individuals.
- best_individual.csv contains, for each generation (gen), the best genotype (rigid body degrees of freedom) of the individual with the lowest energy value.
- population_swap.csv contains, for each generation (gen), the backbone swap success.
- flip_fix.csv lists, for each individual, whether it was initially flipped or fixed. This is useful when running with GlobalFromMultimer.
- ensemble.csv contains, for each generation (gen), the name of the file that is used as the current backbone for each individual.
EvoDOCK also outputs structure files in a folder called structures (see [Outputs] for more options). An option can also be set (see [Outputs]) to output the lowest energy structure for each generation (evolved.pdb) during runtime.
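A quick way to inspect a run is to find the generation with the lowest energy in evolution.csv. The sketch below uses only the standard library and assumes the file is comma-separated with a header row containing the gen and best columns described above; the exact file layout is an assumption. The SAMPLE text is made-up illustrative data, not EvoDOCK output.

```python
import csv
import io


def lowest_energy_generation(csv_file):
    """Return (gen, best) for the row with the lowest 'best' energy.

    Assumes a comma-separated file with 'gen' and 'best' header columns.
    """
    rows = list(csv.DictReader(csv_file))
    best_row = min(rows, key=lambda row: float(row["best"]))
    return int(best_row["gen"]), float(best_row["best"])


# Made-up numbers for illustration only.
SAMPLE = "gen,avg,best\n0,-10.5,-20.0\n1,-15.0,-32.5\n2,-18.25,-31.0\n"

print(lowest_energy_generation(io.StringIO(SAMPLE)))  # prints (1, -32.5)
```

For a real run, pass an open file handle instead, for example open("evolution.csv", newline="") inside the chosen output_path.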
The script scripts/symmetric_relax.py can be used to relax structures from the EvoDOCK output. The script is well documented: use python scripts/symmetric_relax.py -h to see more.
When using AlphaFold ensemble models, it is advisable to use this script rather than the vanilla Rosetta relax protocol, as it guards against the structures blowing up if the AlphaFold structures have bad energies.
When modelling symmetrical structures, EvoDOCK produces 3 types of output:
- The full structure (suffix: _full.cif)
- A symmetry file (suffix: .symm)
- An input file (suffix: _INPUT.pdb)
Use the symmetry file and the input file with symmetric_relax.py.
A test can be run with (can take up to an hour or more):
python scripts/symmetric_relax.py --file inputs/input_pdb/2CC9/2CC9_tobe_relaxed.pdb --symmetry_file inputs/symmetry_files/2CC9_tobe_relaxed.symm --rosetta_out tests/outputs/ --input_out tests/outputs/
Differential Evolution [Price97] is a population-based search method. DE creates new candidate solutions by combining existing ones according to a simple formula of vector crossover and mutation, and then keeping whichever candidate solution has the best score or fitness on the optimization problem at hand.
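The description above can be sketched in a few lines of plain Python. This is a generic DE generation on lists of floats, written for illustration and not taken from EvoDOCK: it omits the local search step that EvoDOCK adds, assumes lower scores are better, and needs a population of at least 4. The argument names mirror the [DE] options scheme, mutate (F) and recombination (CR).

```python
import random


def de_generation(population, scores, objective, rng,
                  scheme="RANDOM", mutate=0.1, recombination=0.7):
    """Run one generation of textbook DE (mutation, crossover, greedy selection)."""
    n, dim = len(population), len(population[0])
    best_idx = min(range(n), key=scores.__getitem__)
    new_population, new_scores = [], []
    for i, target in enumerate(population):
        # Pick three distinct individuals, all different from the target.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        base = population[best_idx] if scheme == "BEST" else population[a]
        # Mutation: base vector plus a scaled difference of two others.
        mutant = [base[k] + mutate * (population[b][k] - population[c][k])
                  for k in range(dim)]
        # Binomial crossover; one forced position guarantees a change.
        forced = rng.randrange(dim)
        trial = [mutant[k] if (k == forced or rng.random() < recombination)
                 else target[k] for k in range(dim)]
        # Greedy selection: keep whichever of target and trial scores better.
        trial_score = objective(trial)
        if trial_score <= scores[i]:
            new_population.append(trial)
            new_scores.append(trial_score)
        else:
            new_population.append(target)
            new_scores.append(scores[i])
    return new_population, new_scores


def sphere(vector):
    """Toy objective standing in for a docking energy."""
    return sum(x * x for x in vector)


rng = random.Random(0)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
scores = [sphere(individual) for individual in population]
for _ in range(50):
    population, scores = de_generation(population, scores, sphere, rng)
print(min(scores))
```

Because selection is greedy per individual, the best score in the population can never get worse from one generation to the next.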
- Storn, R., Price, K. Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. Journal of Global Optimization 11, 341–359 (1997). https://doi.org/10.1023/A:1008202821328