
System Preparation

David W Wright edited this page May 24, 2018 · 4 revisions

Aims

Take input in the form of a PDB (or mmCIF) structure and create a simulation-ready system consisting of a forcefield-parameterized solute and an environment (solvent, membrane, etc.).

Scope

We aim to be able to take any canonical protein or nucleic acid PDB as input. A list of the requirements derived from this goal is provided (with example PDBs) here.

We also need to handle non-canonical PDBs without a header. In many such cases we will have to warn the user that the system may be incomplete.
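A minimal sketch of such a header check, assuming the input is plain PDB text. The function names and the choice of which records count as header information are illustrative assumptions, not an existing implementation; the record names themselves (HEADER, SEQRES, REMARK 465 for missing residues, REMARK 350 for BIOMT transformations) come from the PDB file format.

```python
import warnings

# Record prefixes treated as "header information" in this sketch (an assumption).
HEADER_RECORDS = ("HEADER", "SEQRES", "REMARK 465", "REMARK 350")


def header_records_present(pdb_text):
    """Return the set of header record prefixes found in the PDB text."""
    found = set()
    for line in pdb_text.splitlines():
        for tag in HEADER_RECORDS:
            if line.startswith(tag):
                found.add(tag)
    return found


def check_header(pdb_text):
    """Warn if the sequence information needed to verify completeness is absent."""
    found = header_records_present(pdb_text)
    if "SEQRES" not in found:
        warnings.warn(
            "No SEQRES records found: gaps cannot be checked against the "
            "deposited sequence, so the system may be incomplete."
        )
    return found
```

Using `warnings.warn` rather than raising an exception lets preparation continue on header-less files (for example AMBER-processed PDBs) while still flagging the risk.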

Four scenarios are envisioned that need to be captured:

  1. Structure contains only parameterized residues
  2. Structure contains a target protein/nucleic acid + a non-parameterized ligand
  3. Separate structures are provided for target protein/nucleic acid + a non-parameterized ligand
  4. Structure provided for target protein/nucleic acid + structure of a parameterized ligand provided separately (mol2)
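The four scenarios above could be represented as an enumeration, with a small classifier that picks one from the inputs supplied. This is only a sketch: the `Scenario` names, the `classify` function and its parameters are all hypothetical, and a real check for "parameterized" would consult the forcefield and any provided residue libraries.

```python
from enum import Enum


class Scenario(Enum):
    ALL_PARAMETERIZED = 1                # scenario 1
    EMBEDDED_UNPARAMETERIZED_LIGAND = 2  # scenario 2
    SEPARATE_UNPARAMETERIZED_LIGAND = 3  # scenario 3
    SEPARATE_PARAMETERIZED_LIGAND = 4    # scenario 4


def classify(residue_names, known_residues, ligand_file=None,
             ligand_has_parameters=False):
    """Pick the scenario from the residue names and any separate ligand file.

    residue_names: residue names found in the main structure
    known_residues: residue names available in the forcefield or library
    ligand_file: path to a separately supplied ligand structure, if any
    ligand_has_parameters: True if that ligand comes with parameters (e.g. mol2)
    """
    if ligand_file is not None:
        if ligand_has_parameters:
            return Scenario.SEPARATE_PARAMETERIZED_LIGAND
        return Scenario.SEPARATE_UNPARAMETERIZED_LIGAND
    unknown = set(residue_names) - set(known_residues)
    if unknown:
        return Scenario.EMBEDDED_UNPARAMETERIZED_LIGAND
    return Scenario.ALL_PARAMETERIZED
```

For example, `classify(["ALA", "GLY", "LIG"], {"ALA", "GLY"})` falls into scenario 2 because `LIG` is not a known residue.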

We will use existing tools (PyRosetta, Modeller) for tasks such as filling gaps. Eventually we may want to use the same architecture to provide homology modelling or basic mutation integration.

Workflow design

Note: Initially, assume the user will provide a topology for any ligand or other non-standard residue. Eventually we will automate this where possible.

Stretch Goal: Produce a user-readable summary of the system state and work that might be needed after step 4.

  1. Read input coordinates and, if available (e.g. from the PDB header), information on the 'real system' being modelled
  2. Sanitize subdivisions (chains of a PDB, CHARMM segments)
  • Need to divide into subdivisions containing residues that must be treated as a unit
    • Chains of covalently bonded atoms (or other logically related atoms such as solvent)
    • Note: many provided PDBs will not contain this information, or the chain and segment columns will be wrong (e.g. AMBER-processed files all have one chain).
    • Separate ligands in same chain as target
    • Need to maintain link between header information and the residues of chains it refers to
    • Record unexpected gaps if no header is provided
    • Provide unique residue numbers starting from 1, to avoid issues with residue insertions (multiple residues with the same number) and negative 'pre-sequence' residues.
  3. Sanitize residues
  • Deal with alternate locations (altLocs, i.e. multiple residue conformers)
  • Check that each residue is available in the forcefield or a provided library
  • Check protonation
  4. Match header information with that from coordinates (e.g. check that we have the sequence information needed to fill gaps)
  5. Create scaffold for parameterization
  • Apply BIOMT
  • Fill gaps
  6. System creation and forcefield parameterization
  • Subdivide structure as needed by preparation code (e.g. separate segments for psfgen/CHARMM)
  • Prepare system environment if necessary (e.g. build membrane)
  • Run forcefield parameterizer/system builder (e.g. LEaP or psfgen)
    • Solvate
    • Add ions
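Parts of the sanitization steps above (choosing among alternate locations, renumbering residues from 1, and recording unexpected gaps) can be sketched in plain Python on the fixed-column PDB format. All function names here are hypothetical, and several choices are assumptions rather than decisions recorded on this page: numbering restarts at 1 in each chain, the conformer with the highest summed occupancy is kept, and gaps are inferred from jumps in the original residue numbers of ATOM records (a real implementation would more likely check backbone connectivity).

```python
def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed PDB column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        occ = line[54:60].strip()
        atoms.append({
            "record": line[:6].strip(),
            "name": line[12:16].strip(),
            "altloc": line[16:17].strip(),
            "resname": line[17:20].strip(),
            "chain": line[21:22].strip(),
            "resseq": int(line[22:26]),
            "icode": line[26:27].strip(),
            "occupancy": float(occ) if occ else 1.0,
        })
    return atoms


def residue_key(atom):
    """Identify a residue by chain, number and insertion code."""
    return (atom["chain"], atom["resseq"], atom["icode"])


def select_altlocs(atoms):
    """Keep one conformer per residue: the altLoc with the highest summed occupancy."""
    totals = {}  # residue key -> {altLoc label: summed occupancy}
    for atom in atoms:
        if atom["altloc"]:
            per_res = totals.setdefault(residue_key(atom), {})
            per_res[atom["altloc"]] = per_res.get(atom["altloc"], 0.0) + atom["occupancy"]
    # Ties are resolved in favour of the alphabetically first label.
    chosen = {key: max(sorted(occ), key=occ.get) for key, occ in totals.items()}
    return [a for a in atoms
            if not a["altloc"] or a["altloc"] == chosen[residue_key(a)]]


def renumber(atoms):
    """Map each residue to a new number 1..N within its chain, in file order."""
    mapping, counters = {}, {}
    for atom in atoms:
        key = residue_key(atom)
        if key not in mapping:
            counters[atom["chain"]] = counters.get(atom["chain"], 0) + 1
            mapping[key] = counters[atom["chain"]]
    return mapping


def find_gaps(atoms):
    """Return (chain, previous resseq, next resseq) where ATOM numbering jumps by more than 1."""
    gaps, last = [], {}
    for atom in atoms:
        if atom["record"] != "ATOM":
            continue
        chain, resseq = atom["chain"], atom["resseq"]
        if chain in last and resseq - last[chain] > 1:
            gaps.append((chain, last[chain], resseq))
        last[chain] = resseq
    return gaps
```

Gaps should be detected on the original numbering, before `renumber` is applied, so that the recorded positions can still be matched against any header information. Keeping the mapping returned by `renumber` preserves the link between the new numbers and the residues the header refers to.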