System Preparation

Aims

Take input in the form of a PDB (or mmCIF) structure and create a simulation-ready system consisting of a forcefield-parameterized solute and an environment (solvent, membrane, etc.).

Scope

We aim to be able to take any canonical protein or nucleic acid PDB. A list of the requirements derived from this goal is provided (with example PDBs) here.

We also need to handle non-canonical PDBs without a header. In many cases the output will then need to carry a warning that the system may be incomplete.

Four scenarios are envisioned that need to be captured:

- Structure contains only parameterized residues
- Structure contains a target protein/nucleic acid + a non-parameterized ligand
- Separate structures are provided for the target protein/nucleic acid and a non-parameterized ligand
- Structure provided for the target protein/nucleic acid + structure/parameterized ligand provided separately (mol2)
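The four scenarios could be distinguished up front from what the user supplies. The Python sketch below is one hypothetical way to do this: Scenario, PrepInput and classify are illustrative names, not part of any existing tool, and it assumes the set of residue names the forcefield or a provided library can handle is already known.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Scenario(Enum):
    PARAMETERIZED_ONLY = 1         # structure contains only parameterized residues
    LIGAND_IN_STRUCTURE = 2        # target + non-parameterized ligand in one structure
    LIGAND_SEPARATE_STRUCTURE = 3  # target and non-parameterized ligand in separate structures
    LIGAND_PARAMETERIZED = 4       # target + separately supplied parameterized ligand (mol2)


@dataclass
class PrepInput:
    """Hypothetical description of what the user supplies."""
    structure_path: str
    residue_names: Set[str]        # residue names found in the main structure
    known_residues: Set[str]       # residue names the forcefield/library can parameterize
    ligand_structure_path: Optional[str] = None
    ligand_mol2_path: Optional[str] = None


def classify(inp: PrepInput) -> Scenario:
    """Decide which of the four scenarios applies to this input."""
    if inp.ligand_mol2_path is not None:
        return Scenario.LIGAND_PARAMETERIZED
    if inp.ligand_structure_path is not None:
        return Scenario.LIGAND_SEPARATE_STRUCTURE
    if inp.residue_names - inp.known_residues:
        # At least one residue in the structure has no parameters available.
        return Scenario.LIGAND_IN_STRUCTURE
    return Scenario.PARAMETERIZED_ONLY
```

The mol2 check comes first because a supplied parameterized ligand decides the workflow regardless of what else is in the structure.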

We will use existing tools (PyRosetta, Modeller) to fill gaps and for similar repairs. Eventually we may want to use the same architecture to provide homology modelling or basic mutation integration.

Workflow design

Note: Initially, assume the user will provide the topology for any ligand or other non-standard residue. Eventually we will automate this where possible.

Stretch Goal: Produce a user-readable summary of the system state and of any work that might be needed after step 4.

Read input coordinates and, if available (e.g. from the PDB header), information on the 'real system' being modelled
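A minimal sketch of this step in Python, assuming plain PDB-format input (mmCIF would need a separate reader). It reads ATOM/HETATM records using the fixed PDB column layout and collects SEQRES header records per chain when they are present; parse_pdb and Atom are hypothetical names, and a production version would more likely build on an existing parser.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Atom:
    record: str        # "ATOM" or "HETATM"
    name: str
    alt_loc: str       # "" when no alternate location is given
    res_name: str
    chain: str
    res_seq: int
    i_code: str        # insertion code, "" if absent
    xyz: Tuple[float, float, float]
    occupancy: float


def parse_pdb(text: str) -> Tuple[List[Atom], Dict[str, List[str]]]:
    """Parse ATOM/HETATM records and, if present, SEQRES header records.

    Returns (atoms, seqres); seqres maps chain ID -> residue names and is
    empty when the file has no header.
    """
    atoms: List[Atom] = []
    seqres: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.ljust(80)  # tolerate files with trailing columns stripped
        record = line[0:6].strip()
        if record == "SEQRES":
            # Chain ID is column 12; residue names occupy columns 20-70.
            seqres.setdefault(line[11], []).extend(line[19:70].split())
        elif record in ("ATOM", "HETATM"):
            occ = line[54:60].strip()
            atoms.append(Atom(
                record=record,
                name=line[12:16].strip(),
                alt_loc=line[16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21].strip(),
                res_seq=int(line[22:26]),
                i_code=line[26].strip(),
                xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
                occupancy=float(occ) if occ else 1.0,
            ))
    return atoms, seqres
```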
Sanitize subdivisions (chains of a PDB, CHARMM segments)

Divide the structure into subdivisions containing residues that must be treated as a unit:
- Chains of covalently bonded atoms (or other logically related atoms such as solvent)
- Note that many provided PDBs will not include this information, or will record it incorrectly in the chain and segment columns (e.g. AMBER-processed files put everything in one chain).
- Separate ligands that share a chain with the target
- Maintain the link between header information and the chain residues it refers to
- Record unexpected gaps if no header is provided
- Provide unique residue numbers starting from 1, to avoid issues with insertion codes (multiple residues sharing the same number) and negative 'pre-sequence' residue numbers.
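The renumbering requirement can be sketched as a mapping from each original residue identifier (chain, resSeq, insertion code) to a new number counting from 1 within each chain. Returning the mapping, rather than overwriting the original numbers, is one way to keep the link between header information and residues. renumber_residues is a hypothetical helper, and it assumes chain IDs have already been sanitized, since a wrong chain column would also produce wrong numbering.

```python
from typing import Dict, Iterable, Tuple

# A residue is identified in a PDB file by (chain ID, resSeq, insertion code).
ResidueKey = Tuple[str, int, str]


def renumber_residues(keys: Iterable[ResidueKey]) -> Dict[ResidueKey, int]:
    """Map each original residue key to a new number, starting from 1 per chain.

    Residues are numbered in the order they first appear, so insertion-code
    residues (e.g. 52, 52A) and negative 'pre-sequence' numbers all receive
    distinct positive numbers.
    """
    mapping: Dict[ResidueKey, int] = {}
    next_number: Dict[str, int] = {}
    for key in keys:
        if key in mapping:
            continue  # another atom of a residue that is already numbered
        chain = key[0]
        next_number[chain] = next_number.get(chain, 0) + 1
        mapping[key] = next_number[chain]
    return mapping
```

For chain A residues -2, -1, 52, 52A and 53 in file order, this yields 1 to 5, and numbering restarts at 1 for the next chain.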

Sanitize residues

Deal with alternate locations (altlocs, i.e. multiple residue conformers)
Check whether each residue is available in the forcefield or a provided library
Check protonation states
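One simple policy for alternate locations is to keep, for each residue, the conformer with the highest mean occupancy, as in the hypothetical select_alt_locs below. This is an assumption rather than a settled choice: picking per residue can mix conformers that were modelled as coupled across neighbouring residues, so a real implementation may need a global choice or user input.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AtomRecord:
    chain: str
    res_seq: int
    i_code: str
    name: str
    alt_loc: str      # "" when the atom has no alternate location
    occupancy: float


def select_alt_locs(atoms: List[AtomRecord]) -> List[AtomRecord]:
    """Keep one conformer per residue: the altloc with the highest mean occupancy.

    Atoms without an altloc ID are always kept. Ties go to the alphabetically
    first altloc ID (usually 'A').
    """
    # residue key -> altloc ID -> [occupancy sum, atom count]
    stats: Dict[Tuple[str, int, str], Dict[str, List[float]]] = {}
    for atom in atoms:
        if atom.alt_loc:
            res = (atom.chain, atom.res_seq, atom.i_code)
            entry = stats.setdefault(res, {}).setdefault(atom.alt_loc, [0.0, 0])
            entry[0] += atom.occupancy
            entry[1] += 1

    chosen: Dict[Tuple[str, int, str], str] = {}
    for res, per_alt in stats.items():
        # Highest mean occupancy first; alphabetical altloc ID breaks ties.
        chosen[res] = min(per_alt, key=lambda alt: (-per_alt[alt][0] / per_alt[alt][1], alt))

    return [
        atom for atom in atoms
        if not atom.alt_loc or chosen[(atom.chain, atom.res_seq, atom.i_code)] == atom.alt_loc
    ]
```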

Match header information with that from the coordinates (e.g. check that sequence information is available to fill gaps)
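When SEQRES records are available, gaps can be located by aligning the header sequence against the sequence observed in the coordinates. The sketch below uses Python's difflib.SequenceMatcher as a stand-in for a proper biological sequence alignment and assumes both sequences have already been converted to one-letter codes; compare_sequences is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def compare_sequences(seqres: str, observed: str) -> Tuple[List[Tuple[int, int]], List[Tuple[str, str, str]]]:
    """Compare the header (SEQRES) sequence with the sequence seen in the coordinates.

    Returns (gaps, mismatches):
      gaps       - 1-based inclusive SEQRES ranges with no coordinates
      mismatches - (opcode, seqres part, observed part) for any other difference
    """
    gaps: List[Tuple[int, int]] = []
    mismatches: List[Tuple[str, str, str]] = []
    matcher = SequenceMatcher(None, seqres, observed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            gaps.append((i1 + 1, i2))  # present in SEQRES, absent from coordinates
        elif tag != "equal":
            # 'replace' or 'insert': flag for review rather than guessing.
            mismatches.append((tag, seqres[i1:i2], observed[j1:j2]))
    return gaps, mismatches
```

For example, a SEQRES sequence MKTAYIAKQR against observed MKTAKQR reports a gap covering SEQRES residues 5 to 7 (YIA), which is the kind of information the gap-filling step needs.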
Create scaffold for parameterization

Apply BIOMT transformations (to build the biological assembly)
Fill gaps
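BIOMT records in REMARK 350 each give a rotation matrix R and a translation vector t, and each copy of the assembly is generated as x' = Rx + t. A minimal sketch, assuming only the first biomolecule is wanted and that its operators apply to all chains (real headers can list several biomolecules and restrict each to specific chains via "APPLY THE FOLLOWING TO CHAINS"); parse_biomt and apply_biomt are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

Vec = Tuple[float, float, float]
Operator = Tuple[List[List[float]], List[float]]  # (3x3 rotation, translation)


def parse_biomt(header_lines: Iterable[str]) -> Dict[int, Operator]:
    """Collect BIOMT operators of the first biomolecule in REMARK 350."""
    ops: Dict[int, Operator] = {}
    n_biomolecules = 0
    for line in header_lines:
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] != "REMARK" or tokens[1] != "350":
            continue
        if tokens[2] == "BIOMOLECULE:":
            n_biomolecules += 1
            continue
        if n_biomolecules > 1:
            break  # this sketch only handles the first biomolecule
        if tokens[2].startswith("BIOMT"):
            row = int(tokens[2][5]) - 1  # BIOMT1/2/3 -> matrix row 0/1/2
            op_id = int(tokens[3])
            r1, r2, r3, t = (float(v) for v in tokens[4:8])
            rot, trans = ops.setdefault(op_id, ([[0.0] * 3 for _ in range(3)], [0.0] * 3))
            rot[row] = [r1, r2, r3]
            trans[row] = t
    return ops


def apply_biomt(coords: List[Vec], ops: Dict[int, Operator]) -> List[List[Vec]]:
    """Return one transformed copy of coords per operator: x' = R x + t."""
    copies = []
    for op_id in sorted(ops):
        rot, trans = ops[op_id]
        copies.append([
            tuple(sum(rot[i][k] * xyz[k] for k in range(3)) + trans[i] for i in range(3))
            for xyz in coords
        ])
    return copies
```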

System creation and forcefield parameterization

Subdivide the structure as needed by the preparation code (e.g. separate segments for psfgen/CHARMM)
Prepare the system environment if necessary (e.g. build a membrane)
Run the forcefield parameterizer/system builder (e.g. LEaP or psfgen)
- Solvate
- Add ions
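The number of ions to add can be estimated from the net charge of the solute and the number of water molecules after solvation. The sketch assumes a monovalent salt (e.g. NaCl) and the common approximation that pure water is about 55.5 M, so a target concentration c needs roughly c × N_water / 55.5 ion pairs, plus counterions to neutralize. The system builder in use may apply its own, more refined rules; ion_counts is a hypothetical helper.

```python
from typing import Tuple

WATER_MOLARITY = 55.5  # approximate molar concentration of pure water (mol/L)


def ion_counts(net_charge: int, n_waters: int, salt_molar: float) -> Tuple[int, int]:
    """Estimate (n_cations, n_anions) of a monovalent salt such as NaCl.

    Adds enough ion pairs for the requested bulk concentration, estimated from
    the water count, plus counterions to neutralize the solute's net charge.
    """
    n_pairs = round(salt_molar * n_waters / WATER_MOLARITY)
    n_cations = n_pairs + max(0, -net_charge)  # a negative solute needs extra cations
    n_anions = n_pairs + max(0, net_charge)    # a positive solute needs extra anions
    return n_cations, n_anions
```

For example, a solute with net charge +5 in 10,000 waters at 0.15 M gives 27 ion pairs, so 27 cations and 32 anions.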

Provide feedback
