System preparation requirements and test PDBs

David W Wright edited this page May 24, 2018 · 1 revision

The goal of the builder is ultimately to take an input PDB and transform it into something simulation ready. The first step is to identify problems with the input.

  • Check that residues are available in the forcefield
  • Write component PDBs for sections that might require parameterizing etc.
  • Select regions to simulate (includes alternative locations for rotamers etc.)
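As a rough illustration of the first check in the list above, the sketch below scans the coordinate records of a PDB file and reports residue names that are not in a set of known names. The `KNOWN_RESIDUES` set and the function name `find_unknown_residues` are hypothetical; a real builder would take the residue list from the forcefield's own residue library. Only the fixed-column layout of ATOM/HETATM records (residue name in columns 18-20) is relied upon.

```python
# Hypothetical subset of residue names a forcefield might define; a real
# list would be read from the forcefield's own residue library.
KNOWN_RESIDUES = {"ALA", "GLY", "SER", "HOH"}


def find_unknown_residues(pdb_lines, known=KNOWN_RESIDUES):
    """Return {residue name: atom count} for residue names not in `known`."""
    unknown = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:20].strip()  # columns 18-20 of the record
            if resname not in known:
                unknown[resname] = unknown.get(resname, 0) + 1
    return unknown
```

Residues reported here are candidates for being written out as component PDBs for separate parameterization.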

PDBs downloaded from the Protein Data Bank

Inserted residues

Where the numbering in a PDB is designed to match a canonical sequence but the structure contains an insertion, each added residue is given the number of the last residue before the insertion plus an insertion code (for example, residues inserted after residue 82 become 82A, 82B and so on). We need to 'flatten' this numbering out prior to system preparation for simulation.

Example PDB: 1IGY
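One possible way to flatten insertion codes is sketched below. It assumes standard fixed-column ATOM/HETATM records (chain ID in column 22, residue number in columns 23-26, insertion code in column 27) and keeps a per-chain offset that grows by one for each inserted residue, so genuine gaps in the numbering are preserved. The function name `flatten_insertion_codes` is hypothetical, and other records that refer to residue numbers (TER, SSBOND and so on) are passed through unchanged, which a real builder would need to handle.

```python
def flatten_insertion_codes(pdb_lines):
    """Renumber residues so that no insertion codes remain.

    Each residue that carries an insertion code shifts the numbering of
    itself and of every later residue in the same chain up by one.
    """
    offset = {}     # chain ID -> number of inserted residues seen so far
    last_seen = {}  # chain ID -> (resSeq, iCode) of the current residue
    out = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            out.append(line)  # non-coordinate records are left untouched
            continue
        chain = line[21]
        resseq = int(line[22:26])
        icode = line[26]
        if (resseq, icode) != last_seen.get(chain):
            # First atom of a new residue in this chain
            if icode != " ":
                offset[chain] = offset.get(chain, 0) + 1
            last_seen[chain] = (resseq, icode)
        new_number = resseq + offset.get(chain, 0)
        # Rewrite the residue number and blank the insertion code column
        out.append(f"{line[:22]}{new_number:4d} {line[27:]}")
    return out
```

With this scheme residues numbered 82, 82A, 82B, 83 become 82, 83, 84, 85.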

Missing residues

The full sequence may be provided by the SEQRES records in the header, so the missing residues could be modelled in, but coordinates are not found for all residues.

Example PDB: 1IGY

Related to both of the previous two issues are pre-sequence residues, which are often numbered -N to -1.
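A minimal sketch of how missing residues might be detected is given below: it compares the number of residues listed in SEQRES for each chain with the number of distinct residues that have ATOM coordinates. The function names are hypothetical, and the approach is deliberately crude. It only counts, so it cannot say where in the sequence the gaps are (that would need an alignment), and modified residues stored as HETATM records would be counted as missing.

```python
def seqres_lengths(pdb_lines):
    """Return {chain ID: number of residues listed in SEQRES}."""
    lengths = {}
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            chain = line[11]  # column 12
            # Residue names occupy columns 20-70, separated by spaces
            lengths[chain] = lengths.get(chain, 0) + len(line[19:70].split())
    return lengths


def observed_residues(pdb_lines):
    """Return {chain ID: set of (resSeq, iCode) that have ATOM coordinates}."""
    observed = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            residue_id = (int(line[22:26]), line[26])
            observed.setdefault(line[21], set()).add(residue_id)
    return observed


def missing_residue_counts(pdb_lines):
    """Return {chain ID: count of SEQRES residues lacking coordinates}."""
    observed = observed_residues(pdb_lines)
    counts = {}
    for chain, length in seqres_lengths(pdb_lines).items():
        missing = length - len(observed.get(chain, ()))
        if missing > 0:
            counts[chain] = missing
    return counts
```

Because `int()` accepts a leading minus sign, pre-sequence residues numbered -N to -1 are read without special handling.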

Alternative residue conformations

Details to be filled in.

Biological assemblies

These are PDBs where the coordinates provided do not (necessarily) reflect the full system of interest. A transform is provided (in the REMARK 350 BIOMT records) which can be applied to the provided coordinates to create the 'missing' parts of the structure.

More information is provided here.

Need to be able to:

  • Copy coordinates
  • Apply transform
  • Get new labels for chains/segments
  • Combine initial and transformed coordinates

Example PDB: 1OUT
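The four steps listed above could be sketched as follows. This is an illustration under simplifying assumptions, not a complete implementation: it assumes a single biomolecule whose operators apply to every chain (real files state which chains each operator applies to, and may define several biomolecules), it parses the BIOMT rows by splitting on whitespace, and it does not renumber atom serials. All function names are hypothetical.

```python
import string

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]


def parse_biomt(pdb_lines):
    """Return {operator number: three rows of [r1, r2, r3, t]} from REMARK 350."""
    operators = {}
    for line in pdb_lines:
        fields = line.split()
        if (len(fields) == 8 and fields[:2] == ["REMARK", "350"]
                and fields[2].startswith("BIOMT")):
            row = [float(value) for value in fields[4:]]
            operators.setdefault(int(fields[3]), []).append(row)
    return operators


def transform_atom_line(line, matrix, new_chain):
    """Apply a 3x4 rotation+translation to one record and relabel its chain."""
    x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
    moved = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix]
    coords = "".join(f"{value:8.3f}" for value in moved)
    return f"{line[:21]}{new_chain}{line[22:30]}{coords}{line[54:]}"


def build_assembly(pdb_lines):
    """Return the original coordinate lines plus one transformed copy per
    non-identity operator, each copy under previously unused chain IDs."""
    atoms = [l for l in pdb_lines if l.startswith(("ATOM", "HETATM"))]
    used = {l[21] for l in atoms}
    spare = [c for c in string.ascii_uppercase if c not in used]
    combined = list(atoms)  # start from a copy of the provided coordinates
    for _, matrix in sorted(parse_biomt(pdb_lines).items()):
        if matrix == IDENTITY:
            continue  # the identity operator just reproduces the input
        relabel = {}  # original chain ID -> new chain ID for this copy
        for line in atoms:
            if line[21] not in relabel:
                relabel[line[21]] = spare.pop(0)  # IndexError if labels run out
            combined.append(transform_atom_line(line, matrix, relabel[line[21]]))
    return combined
```

Single-letter chain IDs run out quickly for large assemblies, which is one reason the builder may prefer segment labels for the generated copies.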

Oddly organised PDBs

To simulate a system we often need to split it into the relevant chains. Issues include water having the same chain ID as proteins and ABA-type chain ordering, where the records of one chain are interrupted by those of another.

Example PDB: 4G9E
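Both issues can be illustrated with a short sketch. The first function groups coordinate records by chain ID and by a coarse kind (polymer, water or other hetero atoms), so that water sharing a chain ID with a protein ends up in its own group. The second reports chains whose ATOM records appear in more than one separate run, which is how ABA ordering shows up. The function names and the `WATER_NAMES` set are assumptions made for this example.

```python
from itertools import groupby

WATER_NAMES = {"HOH", "WAT"}  # assumed residue names for water


def split_components(pdb_lines):
    """Group coordinate lines by (chain ID, kind), where kind is
    'polymer' (ATOM), 'water', or 'hetero' (other HETATM)."""
    groups = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20].strip() in WATER_NAMES:
            kind = "water"
        elif line.startswith("ATOM"):
            kind = "polymer"
        else:
            kind = "hetero"
        groups.setdefault((line[21], kind), []).append(line)
    return groups


def interleaved_chains(pdb_lines):
    """Return chain IDs whose ATOM records occur in more than one run."""
    chain_ids = (l[21] for l in pdb_lines if l.startswith("ATOM"))
    runs = [chain for chain, _ in groupby(chain_ids)]
    return sorted(c for c in set(runs) if runs.count(c) > 1)
```

Only ATOM records are used for the ordering check because HETATM records for a chain are commonly listed after all of the polymer chains, which would otherwise be reported as interleaving.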

Non-standard residues in sequences

Details to be added

Non-canonical PDBs

There are a whole host of issues we may find from PDBs which do not originate in the Protein Data Bank.

Chain labels removed

Modelling (in AMBER or homology modelling software) often removes chain labels. We need to be able to guess chain/segment transitions from geometry and/or from changes in residue type (e.g. amino acid to nucleic acid).
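A minimal sketch of such a guess is shown below. It walks through the residues in file order and flags a new chain wherever the residue type changes (protein, nucleic acid or other) or where consecutive alpha carbons are further apart than a cutoff. The 4.5 Å cutoff is an assumption chosen to sit above the roughly 3.8 Å separation of alpha carbons in bonded neighbours, the residue name sets are partial, and the function names are hypothetical. A stretch of missing residues inside a single chain would also be flagged, and breaks between nucleic acid strands are not detected geometrically here.

```python
import math

AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}
MAX_CA_CA = 4.5  # Angstrom; assumed cutoff for bonded neighbours


def residue_kind(resname):
    if resname in AMINO_ACIDS:
        return "protein"
    if resname in NUCLEOTIDES:
        return "nucleic"
    return "other"


def guess_chain_breaks(pdb_lines):
    """Return indices (in residue order) at which a new chain seems to start."""
    residues = []  # one [residue name, CA coordinates or None] per residue
    last_id = None
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_id = line[17:27]  # residue name, chain, number, insertion code
        if res_id != last_id:
            residues.append([line[17:20].strip(), None])
            last_id = res_id
        is_protein = residue_kind(residues[-1][0]) == "protein"
        if is_protein and line[12:16].strip() == "CA":
            residues[-1][1] = (float(line[30:38]), float(line[38:46]),
                               float(line[46:54]))
    breaks = []
    for i in range(1, len(residues)):
        (prev_name, prev_ca), (name, ca) = residues[i - 1], residues[i]
        if residue_kind(prev_name) != residue_kind(name):
            breaks.append(i)  # residue type transition
        elif prev_ca and ca and math.dist(prev_ca, ca) > MAX_CA_CA:
            breaks.append(i)  # neighbours too far apart to be bonded
    return breaks
```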