Setting parameters for representation optimization: best practices

shruthivis edited this page Nov 4, 2018 · 2 revisions

The parameters to be set for the representation optimizer are:

Set of bead sizes for incremental coarse-graining

The number of coarse-graining iterations depends on how quickly you want the optimization to converge. The bead sizes in consecutive iterations can be tens of residues apart, because the spatial size of a bead increases sub-linearly with the number of residues it contains. Further, the maximum bead size depends on the predicted protein shape (e.g., extended helices cannot be represented accurately by large spherical beads) and the scoring functions used (not all scoring functions are compatible with coarse-grained primitives).
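
To illustrate why bead sizes can be spaced tens of residues apart, the sketch below assumes each bead is a sphere whose volume is proportional to the number of residues it covers, so that its radius grows as the cube root of the residue count. The per-residue volume `VOLUME_PER_RESIDUE`, the helper `bead_radius`, and the example list of bead sizes are illustrative assumptions, not values taken from the optimizer.

```python
import math

# Assumed average volume occupied by one residue, in cubic angstroms.
# Illustrative value only; not taken from the representation optimizer.
VOLUME_PER_RESIDUE = 135.0


def bead_radius(residues_per_bead):
    """Radius (in angstroms) of a sphere holding the given number of residues."""
    volume = VOLUME_PER_RESIDUE * residues_per_bead
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


# Hypothetical set of bead sizes for incremental coarse-graining.
bead_sizes = [1, 5, 10, 30, 50]
radii = {n: bead_radius(n) for n in bead_sizes}

for n in bead_sizes:
    print(f"{n:3d} residues per bead -> radius {radii[n]:.1f} A")
```

Under this assumption, tripling the number of residues per bead (for example, from 10 to 30) increases the radius only by a factor of about 1.44, so closely spaced bead sizes would give nearly identical representations.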

Time for sampling models of intermediate representations

The sampling time for intermediate representations should be long enough to obtain a sufficient number of good-scoring models at each intermediate representation. For the intermediate representations in the examples shown in the 2018 PNAS paper, we used half the number of steps and half the number of independent runs that we used for full sampling.

Criteria for selecting good-scoring models in intermediate representations

The criteria for choosing good-scoring models should yield enough good-scoring models to estimate the sampling precision (at least 1000). If too few good-scoring models are obtained, either more sampling is needed or the criteria for good-scoring models need to be relaxed.
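
The check described above can be sketched as follows. This is a minimal illustration, assuming that the only criterion is a total-score cutoff and that lower scores are better. The function names, the fixed-step relaxation, and the `max_threshold` limit are hypothetical and are not part of the optimizer.

```python
MIN_MODELS = 1000  # minimum number of good-scoring models, per the guideline above


def select_good_scoring(scores, threshold):
    """Return indices of models whose total score is at or below the threshold.

    Assumes lower scores are better.
    """
    return [i for i, s in enumerate(scores) if s <= threshold]


def relax_threshold(scores, threshold, step, max_threshold):
    """Raise the threshold in fixed steps until enough models pass.

    Returns (threshold, selected_indices), or (None, []) if even
    max_threshold does not yield MIN_MODELS models, which signals that
    more sampling is needed.
    """
    while threshold <= max_threshold:
        selected = select_good_scoring(scores, threshold)
        if len(selected) >= MIN_MODELS:
            return threshold, selected
        threshold += step
    return None, []
```

If `relax_threshold` returns `None`, the cutoff cannot be relaxed further within the allowed range, and the remaining option is to sample more.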

Tolerance c for defining the relationship between representation and sampling precisions

The representation and sampling precisions should ideally be equal. We use the tolerance parameter c (set to around 15 Å in the benchmarks) to allow for uncertainty in the estimate of the sampling precision, arising from the grid size and stochastic sampling.

In general, c should be as low as possible, but large enough to:

  • make the top 10% of beads with the highest data density precise in the first iteration (the highest-resolution representation).
  • ensure that most of the beads previously marked precise are still precise in subsequent iterations.

Note that if you sample more or relax the criteria for selecting good-scoring models, you should correspondingly relax c.
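
One way to test candidate values of c against the first condition above is sketched below. It assumes that a bead is marked precise when its sampling precision does not exceed its representation precision by more than c. This reading of the criterion, the function names, and the input layout (parallel lists with one entry per bead) are assumptions made for illustration, not the optimizer's actual interface.

```python
def is_precise(sampling_precision, representation_precision, c):
    """Assumed criterion: sampling precision <= representation precision + c."""
    return sampling_precision <= representation_precision + c


def fraction_precise_in_top_density(sampling, representation, data_density, c,
                                    top_fraction=0.10):
    """Fraction of the highest-data-density beads that are marked precise.

    sampling, representation, and data_density are parallel lists with one
    entry per bead (precisions in angstroms).
    """
    n_top = max(1, int(round(top_fraction * len(data_density))))
    # Indices of beads sorted from highest to lowest data density.
    order = sorted(range(len(data_density)),
                   key=lambda i: data_density[i], reverse=True)
    top = order[:n_top]
    n_precise = sum(
        is_precise(sampling[i], representation[i], c) for i in top
    )
    return n_precise / n_top


def smallest_tolerance(sampling, representation, data_density, candidates):
    """Smallest candidate c for which all top-density beads are precise, else None."""
    for c in sorted(candidates):
        if fraction_precise_in_top_density(
                sampling, representation, data_density, c) == 1.0:
            return c
    return None
```

The second condition (beads marked precise in one iteration staying precise in later ones) would need the per-iteration results and is not covered by this sketch.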

Grid size for estimating bead-wise sampling precision

The grid size for estimating bead-wise sampling precision is 2-3 Å (the radius of a single residue-level bead).

Setting move sizes for different resolution beads

The Monte Carlo move sizes for different resolution beads can be obtained by testing the average acceptance rate (of floppy beads) in short simulations in which the system is coarse-grained uniformly to the required resolution.

Alternatively, an easier approximation is to add about 1 Å to the maximum translation for every increase in bead size of ~10-20 residues.
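
The rule of thumb above can be written as a small helper. This is only a sketch: the function name, the linear interpolation between bead sizes, and the `base_translation` input (the maximum translation you already use for 1-residue beads) are assumptions, and the default of 15 residues per added ångström is simply the midpoint of the 10-20 range.

```python
def max_translation(bead_size, base_translation, residues_per_angstrom=15):
    """Approximate maximum Monte Carlo translation (angstroms) for a bead.

    bead_size: number of residues per bead.
    base_translation: maximum translation used for 1-residue beads
        (hypothetical input; tune it for your system).
    residues_per_angstrom: add 1 angstrom per this many extra residues;
        the guideline above suggests a value between 10 and 20.
    """
    extra_residues = bead_size - 1
    return base_translation + extra_residues / residues_per_angstrom
```

For example, with a base translation of 3 Å, `max_translation(31, 3.0)` adds 2 Å for the 30 extra residues. Values obtained this way are a starting point and can be refined by checking acceptance rates in short simulations.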