Preprocessing
root edited this page Feb 1, 2021 · 3 revisions
The preprocessing stage connects the given input to the actual pipeline. It includes the following functionality:
- Sanity checks of the input, for example, whether the given positions lie inside the corresponding sequence.
- Mapping of protein identifiers. StructMAn accepts different types of protein identifiers as input (see), so they are all mapped internally to UniProt accession (UniProt-AC) identifiers, and identical proteins given with different identifier types are grouped together.
- Collection of the amino acid sequences of the given proteins from UniProt.
- Splitting of large inputs into chunks that are processed serially, in order to limit the memory consumption of the pipeline.
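
The position check and the identifier grouping described above can be illustrated with a minimal sketch. It is not StructMAn's actual code: the function names, the assumption that positions are 1-based, and the small `id_to_ac` lookup table (standing in for a real identifier-mapping step) are all hypothetical.

```python
from collections import defaultdict


def find_invalid_positions(sequence, positions):
    """Return the positions that fall outside the sequence.

    Assumes 1-based positions (an assumption of this sketch).
    """
    return [pos for pos in positions if not 1 <= pos <= len(sequence)]


def group_by_accession(input_ids, id_to_ac):
    """Group input identifiers by the accession they map to.

    id_to_ac is a hypothetical lookup table; identifiers without a
    mapping are collected separately so they can be reported.
    """
    groups = defaultdict(list)
    unmapped = []
    for identifier in input_ids:
        accession = id_to_ac.get(identifier)
        if accession is None:
            unmapped.append(identifier)
        else:
            groups[accession].append(identifier)
    return dict(groups), unmapped


# Illustrative data only: a made-up sequence and a tiny mapping table.
sequence = "MKTAYIAKQR"
invalid = find_invalid_positions(sequence, [1, 10, 11, 0])

id_to_ac = {"P04637": "P04637", "P53_HUMAN": "P04637"}
groups, unmapped = group_by_accession(["P04637", "P53_HUMAN", "UNKNOWN_ID"], id_to_ac)
```

Here both an accession and an entry name resolve to the same accession, so they end up in one group, while the identifier without a mapping is reported separately.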
The sequences obtained in the third step are used in the subsequent structure search.
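
The chunked, serial processing of large inputs can be sketched as follows. This is an illustration under assumptions, not the pipeline's implementation: `iter_chunks`, `process_serially`, and the chunk size are hypothetical names and values chosen for the example.

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def process_serially(proteins, chunk_size, process_chunk):
    """Process one chunk at a time so only one chunk is held in memory."""
    results = []
    for chunk in iter_chunks(proteins, chunk_size):
        results.append(process_chunk(chunk))
    return results


# Illustrative usage: the per-chunk work is replaced by a simple count.
proteins = [f"protein_{i}" for i in range(5)]
chunk_sizes = process_serially(proteins, 2, len)
```

Because `iter_chunks` consumes its input lazily, the input itself can be a generator, so the full list of proteins never needs to exist in memory at once.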