Preprocessing

root edited this page Feb 1, 2021 · 3 revisions

The preprocessing steps connect the given input to the actual pipeline. They provide the following functionality:

  1. Sanity checks of the input. For example, do the given positions lie inside the corresponding sequence?
  2. Mapping of protein identifiers. StructMAn accepts different types of protein identifiers as input (see), so they are all mapped internally to UniProt accession (UniProt-AC) identifiers, and identical proteins given with different identifier types are grouped together.
  3. Collection of the amino acid sequences of the given proteins from UniProt.
  4. Splitting of larger inputs into chunks that are processed serially in order to control the memory consumption of the pipeline.
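
The steps above can be illustrated with a minimal Python sketch. This is not StructMAn's actual code: the function names, the 1-based position convention, and the hand-written identifier map are assumptions made for illustration only. In the real pipeline the mapping and the sequences come from UniProt.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def position_in_sequence(position: int, sequence: str) -> bool:
    """Sanity check: is a residue position inside the sequence?

    Assumes 1-based positions (hypothetical convention for this sketch).
    """
    return 1 <= position <= len(sequence)


def group_by_accession(
    entries: List[Tuple[str, int]], id_map: Dict[str, str]
) -> Dict[str, List[int]]:
    """Group positions under the accession each input identifier maps to.

    `id_map` is a hypothetical, pre-built mapping from any accepted
    identifier type to a UniProt accession.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for identifier, position in entries:
        grouped[id_map[identifier]].append(position)
    return dict(grouped)


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (size must be > 0)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Two identifier types that refer to the same protein (human p53).
    id_map = {"P04637": "P04637", "P53_HUMAN": "P04637"}
    entries = [("P04637", 72), ("P53_HUMAN", 175)]
    grouped = group_by_accession(entries, id_map)

    # Process the grouped proteins chunk by chunk to bound memory use.
    for chunk in chunks(sorted(grouped), size=1000):
        for accession in chunk:
            print(accession, grouped[accession])
```

Processing one chunk at a time means only that chunk's proteins need to be held in memory, which is the motivation given in step 4.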

The sequences obtained in step 3 are used in the subsequent structure search.
