Feature Table
AlexanderGress edited this page Nov 3, 2020
·
7 revisions
The feature table is an additional output file produced by StructMAn. It is a tab-separated file in which each row holds the computed results for one amino acid position of one queried protein. Compared to the classification table, it contains more specific results and values, which can be used as input for machine learning methods that focus on amino acids or point mutations. It can list results for queried amino acid positions as well as for queried point mutations. All of its columns are explained below:
Name | Description |
---|---|
Protein (Uniprot-Ac or PDB-Id:Chain-Id) | The ID of the queried protein that contains the corresponding position. |
WT Amino Acid | The one-letter amino acid type of the wildtype version of the query. |
Position | The position of the queried mutation in the sequence of the query. |
Mut Amino Acid | The one-letter amino acid type of the mutated version of the query. |
AA change | A combination of WT Amino Acid, Position, and Mut Amino Acid. |
Tags | The tags given in the input of the query. For supervised machine learning, this column can be used to carry the target value. |
Distance-based classification | The classification based on Euclidean distance calculations. |
Distance-based simple classification | A simplified version of the distance-based classification. |
RIN-based classification | The classification based on residue interaction networks. |
RIN-based simple classification | A simplified version of the RIN-based classification. |
Classification confidence | A confidence value for the classification based on how many structures went into the classification, the overall quality of these structures, and the consistency of the information from different structures. |
Structure location | The location of the queried position: either on the surface (Surface) or in the core (Core) of the protein. This is an aggregated result from analyzing the solvent accessibility of all mapped residues. |
Amount of mapped structures | The number of structures the queried position could be mapped to. |
Secondary structure assignment | The aggregated secondary structure assignment obtained by a majority vote of the secondary structure assignments done by DSSP of all mapped residues. |
IUPred value | Aggregated disorder score computed by IUPred2A. |
Region structure type | Aggregated structure type: disordered region or globular region. |
Modres score | Aggregated score for the tendency of the queried position to get post-translationally modified. |
Modres probability | Propensity of all mapped residues being post-translationally modified. |
Phi | Aggregated phi angle. |
Psi | Aggregated psi angle. |
KD mean | The difference in Kyte-Doolittle (KD) hydropathy score of the wildtype residue and the mutated residue. |
Volume mean | The difference in van der Waals volume of the wildtype residue and the mutated residue. |
Chemical distance | Value of substitution in the chemical distance substitution matrix based on . |
Blosum62 | Value of substitution in the Blosum62 substitution matrix. |
Aliphatic change | Boolean denoting a change in the aliphatic class of the substitution. |
Hydrophobic change | Boolean denoting a change in the hydrophobic class of the substitution. |
Aromatic change | Boolean denoting a change in the aromatic class of the substitution. |
Positive charged change | Boolean denoting a change in the positive charged class of the substitution. |
Polar change | Boolean denoting a change in the polar class of the substitution. |
Negative charge change | Boolean denoting a change in the negative charge class of the substitution. |
Charged change | Boolean denoting a change in the charged class of the substitution. |
Small change | Boolean denoting a change in the small class of the substitution. |
Tiny change | Boolean denoting a change in the tiny class of the substitution. |
Total change | The sum of all class changes of the substitution. |
B Factor | Aggregated B-factor value of all mapped residues. |
AbsoluteCentrality | Aggregated network centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of the chain of the mapped residue isolated from possible other chains given in the structure. |
LengthNormalizedCentrality | Aggregated length normalized centrality value of all mapped residues. The centrality values are normalized by the size of the chain of the mapped residue. |
MinMaxNormalizedCentrality | Aggregated min-max-normalized centrality value of all mapped residues. The centrality values are normalized by a scale based on the maximal and minimal network centrality values of all residues of the chain of the mapped residue. |
AbsoluteCentralityWithNegative | Same as AbsoluteCentrality, but the residue interaction networks include negative edges. |
LengthNormalizedCentralityWithNegative | Same as LengthNormalizedCentrality, but the residue interaction networks include negative edges. |
MinMaxNormalizedCentralityWithNegative | Same as MinMaxNormalizedCentrality, but the residue interaction networks include negative edges. |
AbsoluteComplexCentrality | Aggregated network centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of all chains given in the structure. |
LengthNormalizedComplexCentrality | Aggregated length normalized centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of all chains given in the structure. |
MinMaxNormalizedComplexCentrality | Aggregated min-max-normalized centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of all chains given in the structure. |
AbsoluteComplexCentralityWithNegative | Same as AbsoluteComplexCentrality, but the residue interaction networks include negative edges. |
LengthNormalizedComplexCentralityWithNegative | Same as LengthNormalizedComplexCentrality, but the residue interaction networks include negative edges. |
MinMaxNormalizedComplexCentralityWithNegative | Same as MinMaxNormalizedComplexCentrality, but the residue interaction networks include negative edges. |
Intra_SSBOND_Propensity | Propensity of mapped residue forming a cysteine-cysteine bond with a cysteine from the same chain. |
Inter_SSBOND_Propensity | Propensity of mapped residue forming a cysteine-cysteine bond with a cysteine from another chain. |
Intra_Link_Propensity | Propensity of mapped residue forming a covalent bond with a residue from the same chain. |
Inter_Link_Propensity | Propensity of mapped residue forming a covalent bond with a residue from another chain. |
CIS_Conformation_Propensity | Propensity of mapped residue having a peptide bond in cis conformation to the next residue. |
CIS_Follower_Propensity | Propensity of mapped residue having a peptide bond in cis conformation to the previous residue. |
Inter Chain Median KD | Aggregated median hydropathy value of all residues of another chain within 10 angstroms of the mapped residue. |
Inter Chain Distance Weighted KD | Aggregated distance-weighted hydropathy value of all residues of another chain within 10 angstroms of the mapped residue. Distance-weighted means that the hydropathy values are weighted by their distance to the mapped residue before aggregation. |
Inter Chain Median RSA | Aggregated median relative solvent-accessible area (RSA) of all residues of another chain within 10 angstroms of the mapped residue. |
Inter Chain Distance Weighted RSA | Aggregated distance-weighted relative solvent-accessible area of all residues of another chain within 10 angstroms of the mapped residue. Distance-weighted means that the RSA values are weighted by their distance to the mapped residue before aggregation. |
Intra Chain Median KD | Aggregated median hydropathy value of all residues of the same chain within 10 angstroms of the mapped residue. |
Intra Chain Distance Weighted KD | Aggregated distance-weighted hydropathy value of all residues of the same chain within 10 angstroms of the mapped residue. |
Intra Chain Median RSA | Aggregated median relative solvent-accessible area of all residues of the same chain within 10 angstroms of the mapped residue. |
Intra Chain Distance Weighted RSA | Aggregated distance-weighted relative solvent-accessible area of all residues of the same chain within 10 angstroms of the mapped residue. |
[neighbor, short, long, ligand, ion, metal, Protein, DNA, RNA, Peptide] score | Aggregated sum of interaction scores over all edges of the mapped residue in the residue interaction network to specific interaction partners. Neighbor: both neighboring residues connected by the main chain. Short: non-neighbors that are fewer than 6 positions away in the sequence of the protein. Long: all residues of the same chain that are neither neighbor nor short. Ligand: any low-molecular-weight molecule in the structure. Ion: any non-metal ion. Metal: any metal ion. Protein: any residue from another chain in the structure. DNA: any nucleotide from a DNA chain in the structure. RNA: any nucleotide from an RNA chain in the structure. Peptide: any residue from a non-protein peptide in the structure. |
[neighbor, short, long, ligand, ion, metal, Protein, DNA, RNA, Peptide] degree | Aggregated number of edges of the mapped residue in the residue interaction network to specific interaction partners. |
[neighbor, short, long, ligand, ion, metal, Protein, DNA, RNA, Peptide] H-bond score | Aggregated sum of H-bond scores over all edges of the mapped residue in the residue interaction network to specific interaction partners. |
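
Since the feature table is tab-separated and the Tags column can carry the target value, it can be turned into a feature matrix and a target vector with a few lines of Python. The following is only a minimal sketch under several assumptions: the header strings are assumed to match the names in the table above, each bracketed column family is assumed to expand to one column per interaction partner (for example `neighbor score`), the sample rows are made up for illustration, and `-` as a missing-value marker is a guess. The helper names `expand_family`, `to_float`, and `load_features` are hypothetical and not part of StructMAn.

```python
import csv
import io

# Hypothetical two-row excerpt of a feature table. The values are invented
# and only a handful of the documented columns are included.
SAMPLE = (
    "WT Amino Acid\tPosition\tMut Amino Acid\tTags\tBlosum62\tB Factor\n"
    "A\t42\tG\tpathogenic\t0\t31.5\n"
    "L\t77\tP\tbenign\t-3\t-\n"
)

# Interaction partners listed in the bracketed column families above.
PARTNERS = ["neighbor", "short", "long", "ligand", "ion", "metal",
            "Protein", "DNA", "RNA", "Peptide"]


def expand_family(suffix):
    """Expand a family such as '[...] score' into one name per partner.

    Assumption: the real headers follow the pattern '<partner> <suffix>'.
    """
    return [f"{partner} {suffix}" for partner in PARTNERS]


def to_float(value):
    """Convert a cell to float; return None for missing or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_features(handle, numeric_columns, target_column="Tags"):
    """Read a tab-separated feature table into (feature rows, targets)."""
    reader = csv.DictReader(handle, delimiter="\t")
    features, targets = [], []
    for row in reader:
        features.append([to_float(row.get(col)) for col in numeric_columns])
        targets.append(row.get(target_column))
    return features, targets


features, targets = load_features(
    io.StringIO(SAMPLE), ["Position", "Blosum62", "B Factor"]
)
score_columns = expand_family("score")
```

For a real run, the `io.StringIO(SAMPLE)` handle would be replaced by an opened feature table file, and the list of numeric columns should be checked against the actual header line of that file.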