You can find a detailed description of the tables of the EvoNAPS database in the following chapters:
- dataorigin
- aa_models
- dna_models
- aa_alignments
- dna_alignments
- aa_sequences
- dna_sequences
- aa_modelparameters
- dna_modelparameters
- aa_trees
- dna_trees
- aa_branches
- dna_branches
Note that the tables for DNA and protein alignments are sometimes structurally identical (e.g., the alignments tables). Other tables differ in the number and kind of columns (e.g., the sequences, modelparameters, and trees tables).
The table descriptions use the following abbreviations and symbols:
- PK: primary key
- NN: not null
- UQ: unique key
- AI: auto-incremented
- +: applies to
- *: is involved in
Comment: The dataorigin table holds information regarding the original sources of the alignments in the EvoNAPS database.
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
DATABASE_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
DATABASE_ID | varchar(100) | + | + | This field holds the name of the source database, which in turn serves as the ID of said database. The entries of this column must be unique. | |||
DOI | varchar(100) | NULL | States the DOI of the paper describing the source database, should there exist one. | ||||
PUBMED_ID | varchar(100) | NULL | States the PUBMED-ID of the paper describing the source database, should there exist one. | ||||
LAST_UPDATED | varchar(100) | NULL | States the date the source database was last updated, if available. | ||||
SEQ_TYPE | varchar(100) | NULL | States whether the source database holds DNA and/or protein alignments. | ||||
DESCRIPTION | text | NULL | A text field that gives a short description of the source database. | ||||
SIZE | text | NULL | States the number of alignments the source database holds. | ||||
COMMENT | text | NULL | An optional text field for any comments regarding the source database. |
Comment: The aa_models table lists the different protein substitution rate matrices that were tested in the EvoNAPS workflow and includes the assumed amino acid frequencies and substitution rates.
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
MODEL_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
MODEL_NAME | varchar(100) | + | + | Name of the protein model (substitution rate matrix). The name must be unique. | |||
REGION | varchar(50) | NULL | States the region of the cell in which the proteins used to derive the substitution rate matrix are abundant. Optional, default is NULL. | ||||
EXPLANATION | varchar(100) | NULL | This field contains a short description of the model. | ||||
STAT_DIS_TYPE | varchar(50) | This field states whether the state frequencies of the stationary distribution assumed in the model are empirical (counted frequencies from the alignment) or if they are predefined by the model. | |||||
FREQ_A | decimal(10,9) | NULL | NULL if STAT_DIS_TYPE='empirical'. Otherwise, the frequency of the amino acid alanine (A) assumed by the model. | ||||
… | … | ||||||
FREQ_V | decimal(10,9) | NULL | NULL if STAT_DIS_TYPE='empirical'. Otherwise, the frequency of the amino acid valine (V) assumed by the model. | ||||
RATE_AR | DECIMAL(15,9) | + | The substitution rate from aa A to aa R assumed by the model. | ||||
… | … | ||||||
RATE_YV | DECIMAL(15,9) | + | The substitution rate from aa Y to aa V assumed by the model. |
Comment: The dna_models table lists the different DNA substitution rate matrices that were tested in the EvoNAPS workflow.
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
MODEL_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
MODEL_NAME | varchar(100) | + | Name of the DNA model (substitution rate matrix). The name must be unique. | ||||
FREE_PARAMETERS | int(11) | + | States the number of free parameters of the model. | ||||
BASE_FREQUENCIES | varchar(30) | + | States whether the assumed base frequencies of the model are uniform (0.25 for each base) or unequal. | ||||
SUBSTITUTION_RATES | varchar(100) | + | States (possible) restrictions the model has on the substitution rates. | ||||
EXPLANATION | varchar(100) | NULL | This field contains a short description of the model. | ||||
SUBSTITUTION_CODE | varchar(100) | + | This field shows the substitution code of the rate matrix. |
Comment: The aa_alignments table holds general information and characteristics regarding each protein alignment in the database.
Constraints:
- FOREIGN KEY (FROM_DATABASE) REFERENCES dataorigin (DATABASE_ID)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
ALI_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | + | Name of the alignment (alignment ID). Must be unique. | |||
FROM_DATABASE | varchar(100) | + | States the original database the alignment stems from (e.g. PANDIT). Serves as foreign key to connect to the dataorigin table. | ||||
DESCRIPTION | varchar(100) | NULL | A field that can hold an optional comment regarding the alignment. This can be left blank and the default value is accordingly NULL. | ||||
SEQUENCES | int(11) | + | States how many sequences (taxa) the alignment holds. | ||||
COLUMNS | int(11) | + | States how many sites (columns) the alignment has, i.e. the length of the alignment. | ||||
PARSIMONY_INFORMATIVE_SITES | int(11) | + | States the number of parsimony informative sites in the alignment. | ||||
SINGELTON_SITES | int(11) | + | States the number of singleton sites in the alignment. | ||||
CONSTANT_SITES | int(11) | + | States the number of constant sites in the alignment. | ||||
FRAC_WILDCARDS_GAPS | decimal(5,4) | + | States the fraction of wildcards and gaps in the alignment. | ||||
DISTINCT_PATTERNS | int(11) | + | States the number of distinct patterns in the alignment. | ||||
FAILED_CHI2 | int(11) | + | States the number of sequences that failed the chi2 (chi-squared) test. The test examines whether the amino acid composition of each sequence matches the mean amino acid frequencies across all sequences. | ||||
IDENTICAL_SEQ | int(11) | NULL | States the number of identical sequences in the alignment, should there be any. Default is NULL. | ||||
EXCLUDED_SEQ | int(11) | NULL | States the number of excluded sequences in the alignment, should there be any. Default is NULL. |
Comment: The dna_alignments table holds general information and characteristics regarding each DNA alignment in the database.
Constraints:
- FOREIGN KEY (FROM_DATABASE) REFERENCES dataorigin (DATABASE_ID)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
ALI_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | + | Name of the alignment (alignment ID). Must be unique. | |||
FROM_DATABASE | varchar(100) | + | States the original database the alignment stems from (e.g. PANDIT). Serves as foreign key to connect to the dataorigin table. | ||||
DESCRIPTION | varchar(100) | NULL | A field that can hold an optional comment regarding the alignment. This can be left blank and the default value is accordingly NULL. | ||||
SEQUENCES | int(11) | + | States how many sequences (taxa) the alignment holds. | ||||
COLUMNS | int(11) | + | States how many sites (columns) the alignment has, i.e. the length of the alignment. | ||||
PARSIMONY_INFORMATIVE_SITES | int(11) | + | States the number of parsimony informative sites in the alignment. | ||||
SINGELTON_SITES | int(11) | + | States the number of singleton sites in the alignment. | ||||
CONSTANT_SITES | int(11) | + | States the number of constant sites in the alignment. | ||||
FRAC_WILDCARDS_GAPS | decimal(5,4) | + | States the fraction of wildcards and gaps in the alignment. | ||||
DISTINCT_PATTERNS | int(11) | + | States the number of distinct patterns in the alignment. | ||||
FAILED_CHI2 | int(11) | + | States the number of sequences that failed the chi2 (chi-squared) test. The test examines whether the nucleotide composition of the sequences matches the mean nucleotide frequencies across all sequences. | ||||
IDENTICAL_SEQ | int(11) | NULL | States the number of identical sequences in the alignment, should there be any. Default is NULL. | ||||
EXCLUDED_SEQ | int(11) | NULL | States the number of excluded sequences in the alignment, should there be any. Default is NULL. |
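The foreign key from FROM_DATABASE to dataorigin (DATABASE_ID) can be followed with a plain join. The snippet below is a minimal sketch, not the EvoNAPS setup itself: it uses an in-memory SQLite database as a stand-in, reproduces only a few of the documented columns, and fills them with made-up placeholder rows. The helper names `build_demo_db` and `alignments_with_origin` are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in with a reduced set of columns."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE dataorigin (
            DATABASE_KEY INTEGER PRIMARY KEY AUTOINCREMENT,
            DATABASE_ID  VARCHAR(100) NOT NULL UNIQUE
        );
        CREATE TABLE dna_alignments (
            ALI_KEY       INTEGER PRIMARY KEY AUTOINCREMENT,
            ALI_ID        VARCHAR(250) NOT NULL UNIQUE,
            FROM_DATABASE VARCHAR(100) NOT NULL,
            SEQUENCES     INTEGER NOT NULL,
            FOREIGN KEY (FROM_DATABASE) REFERENCES dataorigin (DATABASE_ID)
        );
        """
    )
    # Placeholder rows for illustration only (not real EvoNAPS entries).
    conn.execute("INSERT INTO dataorigin (DATABASE_ID) VALUES ('PANDIT')")
    conn.execute(
        "INSERT INTO dna_alignments (ALI_ID, FROM_DATABASE, SEQUENCES) "
        "VALUES ('example_ali_1', 'PANDIT', 12)"
    )
    conn.commit()
    return conn


def alignments_with_origin(conn: sqlite3.Connection) -> list:
    """List each alignment together with the source database it stems from."""
    query = """
        SELECT a.ALI_ID, d.DATABASE_ID, a.SEQUENCES
        FROM dna_alignments AS a
        JOIN dataorigin AS d ON d.DATABASE_ID = a.FROM_DATABASE
        ORDER BY a.ALI_ID
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(alignments_with_origin(build_demo_db()))
```

With the constraint in place, inserting an alignment whose FROM_DATABASE has no matching DATABASE_ID is rejected by the database. The same join should apply to aa_alignments, which declares the same foreign key.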
Comment: The aa_sequences table holds the sequences of each protein alignment in the EvoNAPS database as well as information regarding each sequence.
Constraints:
- UNIQUE KEY (ALI_ID,SEQ_INDEX)
- FOREIGN KEY (ALI_ID) REFERENCES aa_alignments (ALI_ID)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
SEQ_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). Serves as foreign key to connect to the aa_alignments table. | |||
SEQ_INDEX | int(11) | + | * | This column holds the unique index (integer starting with 1) for each sequence of an alignment. | |||
SEQ_NAME | varchar(250) | + | States the name of the sequence as it appears in the original alignment. | ||||
FRAC_WILDCARDS_GAPS | decimal(10,9) | NULL | States the fraction of wildcards and gaps in the sequence. | ||||
CHI2_P_VALUE | decimal(7,2) | NULL | States the p-value of the Chi-Square test for the sequence. The Chi-Square test tests whether the amino acid composition of the sequence fits the mean aa frequencies across all sequences in the alignment. | ||||
CHI2_PASSED | tinyint(1) | NULL | States whether the sequence passed (1) or failed (0) the Chi-Square test. | ||||
EXCLUDED | int(11) | NULL | States whether the sequence has been excluded from IQ-Tree calculations (without the flag *--keep-ident*). IQ-Tree excludes a sequence from its computations if there already exist at least two identical sequences in the alignment. | ||||
IDENTICAL_TO | varchar(10000) | NULL | States which sequence(s) the sequence is identical to, if such (a) sequence(s) exist(s). | ||||
FREQ_A | decimal(10,9) | + | The frequency of the amino acid alanine (A) in the sequence. | ||||
… | … | ||||||
FREQ_V | decimal(10,9) | + | The frequency of the amino acid valine (V) in the sequence. | ||||
SEQ | mediumtext | + | This text field contains the sequence (with wildcards and gaps) as it appears in the alignment. |
Comment: The dna_sequences table holds the sequences of each DNA alignment in the EvoNAPS database as well as information regarding each sequence.
Constraints:
- UNIQUE KEY (ALI_ID,SEQ_INDEX)
- FOREIGN KEY (ALI_ID) REFERENCES dna_alignments (ALI_ID)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
SEQ_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). Serves as foreign key to connect to the dna_alignments table. | |||
SEQ_INDEX | int(11) | + | * | This column holds the unique index (integer starting with 1) for each sequence of an alignment. | |||
SEQ_NAME | varchar(250) | + | States the name of the sequence as it appears in the original alignment. | ||||
FRAC_WILDCARDS_GAPS | decimal(10,9) | NULL | States the fraction of wildcards and gaps in the sequence. | ||||
CHI2_P_VALUE | decimal(7,2) | NULL | States the p-value of the Chi-Square test for the sequence. The Chi-Square test tests whether the nucleotide composition of the sequence fits the mean nucleotide frequencies across all sequences in the alignment. | ||||
CHI2_PASSED | tinyint(1) | NULL | States whether the sequence passed (1) or failed (0) the Chi-Square test. | ||||
EXCLUDED | int(11) | NULL | States whether the sequence has been excluded from IQ-Tree calculations (without the flag *--keep-ident*). IQ-Tree excludes a sequence from its computations if there already exist at least two identical sequences in the alignment. | ||||
IDENTICAL_TO | varchar(10000) | NULL | States which sequence(s) the sequence is identical to, if such (a) sequence(s) exist(s). | ||||
FREQ_A | decimal(10,9) | + | The frequency of the base adenine (A) in the sequence. | ||||
FREQ_C | decimal(10,9) | + | The frequency of the base cytosine (C) in the sequence. | ||||
FREQ_G | decimal(10,9) | + | The frequency of the base guanine (G) in the sequence. | ||||
FREQ_T | decimal(10,9) | + | The frequency of the base thymine (T) in the sequence. | ||||
SEQ | mediumtext | + | This text field contains the sequence (with wildcards and gaps) as it appears in the alignment. |
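To illustrate what the per-sequence columns FREQ_A to FREQ_T and FRAC_WILDCARDS_GAPS describe, the following sketch derives them from a single aligned DNA sequence. It rests on two assumptions that the table description does not confirm: base frequencies are taken relative to the unambiguous bases A, C, G, and T only, and every other character counts as a wildcard or gap. The function name `base_composition` is hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Compute base frequencies and the wildcard/gap fraction of one sequence.

    Assumption: frequencies are normalised by the number of unambiguous
    bases (A, C, G, T); all other characters count as wildcards or gaps.
    """
    seq = seq.upper()
    counts = Counter(seq)
    bases = "ACGT"
    n_bases = sum(counts[b] for b in bases)
    n_other = len(seq) - n_bases
    result = {
        f"FREQ_{b}": (counts[b] / n_bases if n_bases else 0.0) for b in bases
    }
    result["FRAC_WILDCARDS_GAPS"] = n_other / len(seq) if seq else 0.0
    return result


if __name__ == "__main__":
    # Four unambiguous bases, one wildcard (N) and three gaps.
    print(base_composition("ACGT-N--"))
```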
Comment: The aa_modelparameters table holds the results of model selection. For each evaluated model, it records the performance (LogL, AIC, BIC, ...) as well as the parameters of the model (state frequencies, rates, shape parameter alpha, ...).
Constraints:
- KEY (BASE_MODEL)
- UNIQUE KEY (ALI_ID,TIME_STAMP,MODEL)
- FOREIGN KEY (ALI_ID) REFERENCES aa_alignments (ALI_ID)
- FOREIGN KEY (BASE_MODEL) REFERENCES aa_models (MODEL_NAME)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
MODELTEST_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). | |||
IQTREE_VERSION | varchar(100) | + | |||||
RANDOM_SEED | int(11) | + | The random number seed used by IQ-Tree. | ||||
TIME_STAMP | datetime | + | * | The timestamp as it appears in the .iqtree output file. The timestamp enables mapping of the tested model to one IQ-Tree run. | |||
MODEL_TYPE | varchar(100) | + | The type of model testing or the type of models that were tested in the IQ-Tree run. Will mostly be MF (the models included in the default ModelFinder algorithm). | ||||
KEEP_IDENT | tinyint(1) | NULL | Boolean stating whether the --keep-ident flag has been enabled (1) or disabled (0) in the IQ-Tree run. | ||||
MODEL | varchar(100) | + | * | Name of the tested model | |||
BASE_MODEL | varchar(100) | + | Name of the substitution rate matrix used in the model. | ||||
MODEL_RATE_HETEROGENEITY | varchar(100) | NULL | Name of the model of rate heterogeneity (should one have been employed). | ||||
NUM_RATE_CAT | int(11) | NULL | Number of rate categories assumed by the model. | ||||
LOGL | decimal(21,9) | + | Logarithmic likelihood | ||||
AIC | decimal(21,9) | + | |||||
WEIGHTED_AIC | float | + | |||||
CONFIDENCE_AIC | tinyint(1) | + | Boolean stating whether the weighted AIC is above 0.05 (1) or under (0). | ||||
AICC | decimal(21,9) | + | |||||
WEIGHTED_AICC | float | + | |||||
CONFIDENCE_AICC | tinyint(1) | + | Boolean stating whether the weighted AICC is above 0.05 (1) or under (0). | ||||
BIC | decimal(21,9) | + | |||||
WEIGHTED_BIC | float | + | |||||
CONFIDENCE_BIC | tinyint(1) | + | Boolean stating whether the weighted BIC is above 0.05 (1) or under (0). | ||||
CAIC | decimal(21,9) | + | |||||
WEIGHTED_CAIC | float | + | |||||
CONFIDENCE_CAIC | tinyint(1) | + | Boolean stating whether the weighted CAIC is above 0.05 (1) or under (0). | ||||
ABIC | decimal(21,9) | + | |||||
WEIGHTED_ABIC | float | + | |||||
CONFIDENCE_ABIC | tinyint(1) | + | Boolean stating whether the weighted ABIC is above 0.05 (1) or under (0). | ||||
NUM_FREE_PARAMETERS | int(11) | + | Number of free parameters (=NUM_MODEL_PARAMETERS+NUM_BRANCHES). | ||||
NUM_MODEL_PARAMETERS | int(11) | + | Number of free parameters of the model of sequence evolution | ||||
NUM_BRANCHES | int(11) | + | Number of branches in the phylogenetic tree. In a fully resolved tree: 2n-3 (with n taxa). | ||||
TREE_LENGTH | decimal(15,9) | + | Length of the tree (might differ for the different models as the branch lengths are being re-estimated during model evaluation). | ||||
PROP_INVAR | decimal(10,9) | NULL | Proportion of invariable sites in case the +I model of rate heterogeneity was employed. Else, NULL. | ||||
ALPHA | decimal(15,9) | NULL | Shape parameter alpha should a Gamma (+G4) model have been employed. Else, NULL. | ||||
STAT_FREQ_TYPE | varchar(100) | + | This field states whether the state frequencies of the stationary distribution assumed in the model are empirical (counted frequencies from the alignment) or if they are predefined by the model (model). | ||||
STAT_FREQ_A | decimal(10,9) | + | The stationary frequency of the amino acid alanine (A) assumed by the model. | ||||
… | … | ||||||
STAT_FREQ_V | decimal(10,9) | + | The stationary frequency of the amino acid valine (V) assumed by the model. | ||||
PROP_CAT_1 | decimal(10,9) | NULL | The proportion of the first rate category (should the model assume different rates across sites). | ||||
REL_RATE_CAT_1 | decimal(15,9) | NULL | The rate of the first rate category (should the model assume different rates across sites). | ||||
… | … | ||||||
PROP_CAT_10 | decimal(10,9) | NULL | The proportion of the tenth rate category (should there exist one). | ||||
REL_RATE_CAT_10 | decimal(15,9) | NULL | The rate of the tenth rate category (should there exist one). |
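The relationship between LOGL, NUM_FREE_PARAMETERS, the information criteria, and the weighted criteria can be sketched as follows. The sketch uses the textbook definitions of AIC, AICc, and BIC and the usual Akaike-weight formula; whether the stored values were computed in exactly this way is an assumption, and CAIC and ABIC are left out. The 0.05 threshold for the confidence flag is taken from the column comments, and all function names are hypothetical.

```python
import math


def num_free_parameters(num_model_parameters: int, num_taxa: int) -> int:
    """NUM_MODEL_PARAMETERS + NUM_BRANCHES, with 2n-3 branches (resolved tree)."""
    num_branches = 2 * num_taxa - 3
    return num_model_parameters + num_branches


def aic(logl: float, k: int) -> float:
    """Akaike information criterion for log-likelihood logl and k parameters."""
    return 2 * k - 2 * logl


def aicc(logl: float, k: int, n_sites: int) -> float:
    """Small-sample corrected AIC; requires n_sites > k + 1."""
    if n_sites - k - 1 <= 0:
        raise ValueError("AICc is undefined when n_sites <= k + 1")
    return aic(logl, k) + (2 * k * (k + 1)) / (n_sites - k - 1)


def bic(logl: float, k: int, n_sites: int) -> float:
    """Bayesian information criterion with n_sites alignment columns."""
    return k * math.log(n_sites) - 2 * logl


def criterion_weights(scores: list) -> list:
    """Turn criterion scores of competing models into weights summing to 1."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


def confidence_flags(weights: list, threshold: float = 0.05) -> list:
    """1 if a model's weight is above the threshold, else 0."""
    return [1 if w > threshold else 0 for w in weights]


if __name__ == "__main__":
    weights = criterion_weights([100.0, 102.0, 120.0])
    print(weights, confidence_flags(weights))
```

Because the weights depend on the set of models compared, they are only meaningful within one IQ-Tree run, that is, among rows sharing the same ALI_ID and TIME_STAMP.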
Comment: The dna_modelparameters table holds the results of model selection. For each evaluated model, it records the performance (LogL, AIC, BIC, ...) as well as the parameters of the model (state frequencies, rates, shape parameter alpha, ...).
Constraints:
- KEY (BASE_MODEL)
- UNIQUE KEY (ALI_ID,TIME_STAMP,MODEL)
- FOREIGN KEY (ALI_ID) REFERENCES dna_alignments (ALI_ID)
- FOREIGN KEY (BASE_MODEL) REFERENCES dna_models (MODEL_NAME)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
MODELTEST_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). | |||
IQTREE_VERSION | varchar(100) | + | |||||
RANDOM_SEED | int(11) | + | The random number seed used by IQ-Tree. | ||||
TIME_STAMP | datetime | + | * | The timestamp as it appears in the .iqtree output file. The timestamp enables mapping of the tested model to one IQ-Tree run. | |||
MODEL_TYPE | varchar(100) | + | The type of model testing or the type of models that were tested in the IQ-Tree run. Will mostly be MF (the models included in the default ModelFinder algorithm). | ||||
KEEP_IDENT | tinyint(1) | NULL | Boolean stating whether the --keep-ident flag has been enabled (1) or disabled (0) in the IQ-Tree run. | ||||
MODEL | varchar(100) | + | * | Name of the tested model | |||
BASE_MODEL | varchar(100) | + | Name of the substitution rate matrix used in the model. | ||||
MODEL_RATE_HETEROGENEITY | varchar(100) | NULL | Name of the model of rate heterogeneity (should one have been employed). | ||||
NUM_RATE_CAT | int(11) | NULL | Number of rate categories assumed by the model. | ||||
LOGL | decimal(21,9) | + | Logarithmic likelihood | ||||
AIC | decimal(21,9) | + | |||||
WEIGHTED_AIC | float | + | |||||
CONFIDENCE_AIC | tinyint(1) | + | Boolean stating whether the weighted AIC is above 0.05 (1) or under (0). | ||||
AICC | decimal(21,9) | + | |||||
WEIGHTED_AICC | float | + | |||||
CONFIDENCE_AICC | tinyint(1) | + | Boolean stating whether the weighted AICC is above 0.05 (1) or under (0). | ||||
BIC | decimal(21,9) | + | |||||
WEIGHTED_BIC | float | + | |||||
CONFIDENCE_BIC | tinyint(1) | + | Boolean stating whether the weighted BIC is above 0.05 (1) or under (0). | ||||
CAIC | decimal(21,9) | + | |||||
WEIGHTED_CAIC | float | + | |||||
CONFIDENCE_CAIC | tinyint(1) | + | Boolean stating whether the weighted CAIC is above 0.05 (1) or under (0). | ||||
ABIC | decimal(21,9) | + | |||||
WEIGHTED_ABIC | float | + | |||||
CONFIDENCE_ABIC | tinyint(1) | + | Boolean stating whether the weighted ABIC is above 0.05 (1) or under (0). | ||||
NUM_FREE_PARAMETERS | int(11) | + | Number of free parameters (=NUM_MODEL_PARAMETERS+NUM_BRANCHES). | ||||
NUM_MODEL_PARAMETERS | int(11) | + | Number of free parameters of the model of sequence evolution | ||||
NUM_BRANCHES | int(11) | + | Number of branches in the phylogenetic tree. In a fully resolved tree: 2n-3 (with n taxa). | ||||
TREE_LENGTH | decimal(15,9) | + | Length of the tree (might differ for the different models as the branch lengths are being re-estimated during model evaluation). | ||||
PROP_INVAR | decimal(10,9) | NULL | Proportion of invariable sites in case the +I model of rate heterogeneity was employed. Else, NULL. | ||||
ALPHA | decimal(15,9) | NULL | Shape parameter alpha should a Gamma (+G4) model have been employed. Else, NULL. | ||||
STAT_FREQ_TYPE | varchar(100) | + | This field states whether the state frequencies of the stationary distribution assumed in the model are empirical (counted frequencies from the alignment) or if they are predefined by the model (model). | ||||
STAT_FREQ_A | decimal(10,9) | + | The stationary frequency of the base adenine (A) assumed by the model. | ||||
STAT_FREQ_C | decimal(10,9) | + | The stationary frequency of the base cytosine (C) assumed by the model. | ||||
STAT_FREQ_G | decimal(10,9) | + | The stationary frequency of the base guanine (G) assumed by the model. | ||||
STAT_FREQ_T | decimal(10,9) | + | The stationary frequency of the base thymine (T) assumed by the model. | ||||
RATE_AC | decimal(15,9) | + | Assumed relative substitution rate from A to C. | ||||
RATE_CA | decimal(15,9) | + | Assumed relative substitution rate from C to A. | ||||
… | … | ||||||
RATE_GT | decimal(15,9) | + | Assumed relative substitution rate from G to T. | ||||
RATE_TG | decimal(15,9) | + | Assumed relative substitution rate from T to G. | ||||
PROP_CAT_1 | decimal(10,9) | NULL | The proportion of the first rate category (should the model assume different rates across sites). | ||||
REL_RATE_CAT_1 | decimal(15,9) | NULL | The rate of the first rate category (should the model assume different rates across sites). | ||||
… | … | ||||||
PROP_CAT_10 | decimal(10,9) | NULL | The proportion of the tenth rate category (should there exist one). | ||||
REL_RATE_CAT_10 | decimal(15,9) | NULL | The rate of the tenth rate category (should there exist one). |
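Since (ALI_ID, TIME_STAMP, MODEL) is unique, the rows of one IQ-Tree run can be grouped by ALI_ID and TIME_STAMP, for example to pick the model with the lowest BIC per run. The snippet below is a sketch under assumptions: it uses an in-memory SQLite stand-in with only four of the documented columns, the rows are made-up placeholders, and the helper names are hypothetical.

```python
import sqlite3


def build_modelparameters_demo() -> sqlite3.Connection:
    """Create a reduced dna_modelparameters table with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE dna_modelparameters (
            MODELTEST_KEY INTEGER PRIMARY KEY AUTOINCREMENT,
            ALI_ID     TEXT NOT NULL,
            TIME_STAMP TEXT NOT NULL,
            MODEL      TEXT NOT NULL,
            BIC        REAL NOT NULL,
            UNIQUE (ALI_ID, TIME_STAMP, MODEL)
        )
        """
    )
    # Placeholder values for illustration only (not real EvoNAPS results).
    rows = [
        ("ali_1", "2023-01-01 12:00:00", "JC", 2050.5),
        ("ali_1", "2023-01-01 12:00:00", "HKY+G4", 2010.25),
        ("ali_2", "2023-01-02 08:30:00", "GTR+I", 990.0),
        ("ali_2", "2023-01-02 08:30:00", "JC", 1005.0),
    ]
    conn.executemany(
        "INSERT INTO dna_modelparameters (ALI_ID, TIME_STAMP, MODEL, BIC) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def best_models_by_bic(conn: sqlite3.Connection) -> list:
    """Return the lowest-BIC model for every (ALI_ID, TIME_STAMP) run."""
    query = """
        SELECT m.ALI_ID, m.TIME_STAMP, m.MODEL, m.BIC
        FROM dna_modelparameters AS m
        JOIN (
            SELECT ALI_ID, TIME_STAMP, MIN(BIC) AS MIN_BIC
            FROM dna_modelparameters
            GROUP BY ALI_ID, TIME_STAMP
        ) AS b
          ON b.ALI_ID = m.ALI_ID
         AND b.TIME_STAMP = m.TIME_STAMP
         AND b.MIN_BIC = m.BIC
        ORDER BY m.ALI_ID, m.TIME_STAMP
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in best_models_by_bic(build_modelparameters_demo()):
        print(row)
```

Replacing BIC with AIC or AICC in the query selects by a different criterion. Ties would return more than one row per run.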
Comment: The aa_trees table contains a set of phylogenetic trees as well as the parameters of the assumed model of sequence evolution. The trees are either a fast-ML tree used in the model evaluation or a maximum likelihood (ML) tree inferred using the best-fit model.
Constraints:
- KEY (BASE_MODEL)
- UNIQUE KEY (ALI_ID,TIME_STAMP,TREE_TYPE)
- FOREIGN KEY (ALI_ID) REFERENCES aa_alignments (ALI_ID)
- FOREIGN KEY (BASE_MODEL) REFERENCES aa_models (MODEL_NAME)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
TREE_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). | |||
IQTREE_VERSION | varchar(100) | + | |||||
RANDOM_SEED | int(11) | + | The random number seed used by IQ-Tree. | ||||
TIME_STAMP | datetime | + | * | The timestamp as it appears in the .iqtree output file. The timestamp enables mapping of the tested model to one IQ-Tree run. | |||
MODEL_TYPE | varchar(100) | + | The type of model testing or the type of models that were tested in the IQ-Tree run. Will mostly be MF (the models included in the default ModelFinder algorithm). | ||||
TREE_TYPE | varchar(100) | + | * | States the type of the tree. It is either initial (fast ML tree used for model evaluation) or ML (maximum likelihood tree). | |||
CHOICE_CRITERIUM | varchar(100) | NULL | States the choice criterion used to select the model for the ML tree search. In case of an initial tree, this field is left empty (NULL). | ||||
KEEP_IDENT | tinyint(1) | NULL | Boolean stating whether the --keep-ident flag has been enabled (1) or disabled (0) in the IQ-Tree run. | ||||
MODEL | varchar(100) | + | Name of the tested model | ||||
BASE_MODEL | varchar(100) | + | Name of the substitution rate matrix used in the model. | ||||
MODEL_RATE_HETEROGENEITY | varchar(100) | NULL | Name of the model of rate heterogeneity (should one have been employed). | ||||
NUM_RATE_CAT | int(11) | NULL | Number of rate categories assumed by the model. | ||||
LOGL | decimal(21,9) | + | Logarithmic likelihood | ||||
UNCONSTRAINED_LOGL | decimal(21,9) | Unconstrained logarithmic likelihood | |||||
AIC | decimal(21,9) | + | |||||
AICC | decimal(21,9) | + | |||||
BIC | decimal(21,9) | + | |||||
CAIC | decimal(21,9) | NULL | |||||
ABIC | decimal(21,9) | NULL | |||||
NUM_FREE_PARAMETERS | int(11) | + | Number of free parameters (=NUM_MODEL_PARAMETERS+NUM_BRANCHES). | ||||
NUM_MODEL_PARAMETERS | int(11) | + | Number of free parameters of the model of sequence evolution | ||||
NUM_BRANCHES | int(11) | + | Number of branches in the phylogenetic tree. In a fully resolved tree: 2n-3 (with n taxa). | ||||
PROP_INVAR | decimal(10,9) | NULL | Proportion of invariable sites in case the +I model of rate heterogeneity was employed. Else, NULL. | ||||
ALPHA | decimal(15,9) | NULL | Shape parameter alpha should a Gamma (+G4) model have been employed. Else, NULL. | ||||
STAT_FREQ_TYPE | varchar(100) | + | This field states whether the state frequencies of the stationary distribution assumed in the model are empirical (counted frequencies from the alignment) or if they are predefined by the model (model). | ||||
STAT_FREQ_A | decimal(10,9) | + | The frequency of the amino acid alanine (A) assumed by the model. | ||||
… | … | ||||||
STAT_FREQ_V | decimal(10,9) | + | The frequency of the amino acid valine (V) assumed by the model. | ||||
PROP_CAT_1 | decimal(10,9) | NULL | The proportion of the first rate category (should the model assume different rates across sites). | ||||
REL_RATE_CAT_1 | decimal(15,9) | NULL | The rate of the first rate category (should the model assume different rates across sites). | ||||
… | … | ||||||
PROP_CAT_10 | decimal(10,9) | NULL | The proportion of the tenth rate category (should there exist one). | ||||
REL_RATE_CAT_10 | decimal(15,9) | NULL | The rate of the tenth rate category (should there exist one). | ||||
TREE_LENGTH | decimal(15,9) | + | Total length of the tree (sum of all branch lengths). | ||||
SUM_IBL | decimal(15,9) | + | Sum of internal branch lengths | ||||
TREE_DIAMETER | decimal(15,9) | + | The tree diameter states the furthest distance (sum of BLs) between two taxa in the tree. | ||||
DIST_MIN | decimal(15,9) | NULL | Minimal distance between two sequences in the alignment calculated using the best-fit model. In case of an initial tree, this field will be NULL. | ||||
DIST_MAX | decimal(15,9) | NULL | Maximum distance between two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
DIST_MEAN | decimal(15,9) | NULL | Mean distance between two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
DIST_MEDIAN | decimal(15,9) | NULL | Median distance between two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
DIST_VAR | decimal(15,9) | NULL | Variance of the distances between any two sequences in the alignment calculated using the best-fit model. In case of an initial tree, this field will be NULL. | ||||
BL_MIN | decimal(15,9) | NULL | Shortest branch in the tree. | ||||
BL_MAX | decimal(15,9) | NULL | Longest branch in the tree. | ||||
BL_MEAN | decimal(15,9) | NULL | Mean branch length in the tree. | ||||
BL_MEDIAN | decimal(15,9) | NULL | Median branch length in the tree. | ||||
BL_VAR | decimal(15,9) | NULL | Variance of the branch lengths in the tree. | ||||
IBL_MIN | decimal(15,9) | NULL | Shortest internal branch in the tree. | ||||
IBL_MAX | decimal(15,9) | NULL | Longest internal branch in the tree. | ||||
IBL_MEAN | decimal(15,9) | NULL | Mean internal branch length in the tree. | ||||
IBL_MEDIAN | decimal(15,9) | NULL | Median internal branch length in the tree. | ||||
IBL_VAR | decimal(15,9) | NULL | Variance of the internal branch lengths in the tree. | ||||
EBL_MIN | decimal(15,9) | NULL | Shortest external branch in the tree. | ||||
EBL_MAX | decimal(15,9) | NULL | Longest external branch in the tree. | ||||
EBL_MEAN | decimal(15,9) | NULL | Mean external branch length in the tree. | ||||
EBL_MEDIAN | decimal(15,9) | NULL | Median external branch length in the tree. | ||||
EBL_VAR | decimal(15,9) | NULL | Variance of the external branch lengths in the tree. | ||||
POT_LBA_7 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches need to be at least 7 times longer than the short and internal branches. | ||||
POT_LBA_8 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches need to be at least 8 times longer than the short and internal branches. | ||||
POT_LBA_9 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches need to be at least 9 times longer than the short and internal branches. | ||||
POT_LBA_10 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches need to be at least 10 times longer than the short and internal branches. | ||||
NEWICK_STRING | mediumtext | + | This field contains the Newick string of the phylogenetic tree. |
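The branch-length columns (TREE_LENGTH, SUM_IBL, and the BL_, IBL_, and EBL_ summaries) can be derived from the internal and external branch lengths of a tree. The sketch below assumes the lengths have already been extracted from the Newick string and uses the population variance for the _VAR columns; the source does not say which variance estimator was used, so that choice is an assumption. The function name `branch_length_summary` is hypothetical.

```python
import statistics


def branch_length_summary(internal: list, external: list) -> dict:
    """Summarise branch lengths in the layout of the trees tables.

    Assumption: *_VAR is the population variance (statistics.pvariance).
    Both lists must be non-empty.
    """

    def summarize(prefix: str, values: list) -> dict:
        return {
            f"{prefix}_MIN": min(values),
            f"{prefix}_MAX": max(values),
            f"{prefix}_MEAN": statistics.mean(values),
            f"{prefix}_MEDIAN": statistics.median(values),
            f"{prefix}_VAR": statistics.pvariance(values),
        }

    all_branches = internal + external
    summary = {
        "TREE_LENGTH": sum(all_branches),  # sum of all branch lengths
        "SUM_IBL": sum(internal),          # sum of internal branch lengths
    }
    summary.update(summarize("BL", all_branches))
    summary.update(summarize("IBL", internal))
    summary.update(summarize("EBL", external))
    return summary


if __name__ == "__main__":
    # A four-taxon tree has 2*4-3 = 5 branches: one internal, four external.
    print(branch_length_summary([0.5], [0.25, 0.25, 1.0, 2.0]))
```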
Comment: The dna_trees table contains a set of phylogenetic trees as well as the parameters of the assumed model of sequence evolution. The trees are either a fast-ML tree used in the model evaluation or a maximum likelihood (ML) tree inferred using the best-fit model.
Constraints:
- KEY (BASE_MODEL)
- UNIQUE KEY (ALI_ID,TIME_STAMP,TREE_TYPE)
- FOREIGN KEY (ALI_ID) REFERENCES dna_alignments (ALI_ID)
- FOREIGN KEY (BASE_MODEL) REFERENCES dna_models (MODEL_NAME)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
TREE_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). | |||
IQTREE_VERSION | varchar(100) | + | |||||
RANDOM_SEED | int(11) | + | The random number seed used by IQ-Tree. | ||||
TIME_STAMP | datetime | + | * | The timestamp as it appears in the .iqtree output file. The timestamp enables mapping of the tested model to one IQ-Tree run. | |||
MODEL_TYPE | varchar(100) | + | The type of model testing or the type of models that were tested in the IQ-Tree run. Will mostly be MF (the models included in the default ModelFinder algorithm). | ||||
TREE_TYPE | varchar(100) | + | * | States the type of the tree. It is either initial (fast ML tree used for model evaluation) or ML (maximum likelihood tree). | |||
CHOICE_CRITERIUM | varchar(100) | NULL | States the choice criterion used to select the model for the ML tree search. In case of an initial tree, this field is left empty (NULL). | ||||
KEEP_IDENT | tinyint(1) | NULL | Boolean stating whether the --keep-ident flag has been enabled (1) or disabled (0) in the IQ-Tree run. | ||||
MODEL | varchar(100) | + | Name of the tested model | ||||
BASE_MODEL | varchar(100) | + | Name of the substitution rate matrix used in the model. | ||||
MODEL_RATE_HETEROGENEITY | varchar(100) | NULL | Name of the model of rate heterogeneity (should one have been employed). | ||||
NUM_RATE_CAT | int(11) | NULL | Number of rate categories assumed by the model. | ||||
LOGL | decimal(21,9) | + | Logarithmic likelihood | ||||
UNCONSTRAINED_LOGL | decimal(21,9) | Unconstrained logarithmic likelihood | |||||
AIC | decimal(21,9) | + | |||||
AICC | decimal(21,9) | + | |||||
BIC | decimal(21,9) | + | |||||
CAIC | decimal(21,9) | NULL | |||||
ABIC | decimal(21,9) | NULL | |||||
NUM_FREE_PARAMETERS | int(11) | + | Number of free parameters (=NUM_MODEL_PARAMETERS+NUM_BRANCHES). | ||||
NUM_MODEL_PARAMETERS | int(11) | + | Number of free parameters of the model of sequence evolution | ||||
NUM_BRANCHES | int(11) | + | Number of branches in the phylogenetic tree. In a fully resolved tree: 2n-3 (with n taxa). | ||||
PROP_INVAR | decimal(10,9) | NULL | Proportion of invariable sites in case the +I model of rate heterogeneity was employed. Else, NULL. | ||||
ALPHA | decimal(15,9) | NULL | Shape parameter alpha, should a Gamma (+G4) model have been employed. Else, NULL. | ||||
STAT_FREQ_TYPE | varchar(100) | + | This field states whether the state frequencies of the stationary distribution assumed in the model are empirical (counted frequencies from the alignment) or if they are predefined by the model (model). | ||||
STAT_FREQ_A | decimal(10,9) | + | The frequency of the base adenine (A) assumed by the model. | ||||
STAT_FREQ_C | decimal(10,9) | + | The frequency of the base cytosine (C) assumed by the model. | ||||
STAT_FREQ_G | decimal(10,9) | + | The frequency of the base guanine (G) assumed by the model. | ||||
STAT_FREQ_T | decimal(10,9) | + | The frequency of the base thymine (T) assumed by the model. | ||||
RATE_AC | decimal(15,9) | + | Assumed relative substitution rate from A to C. | ||||
RATE_CA | decimal(15,9) | + | Assumed relative substitution rate from C to A. | ||||
… | … | ||||||
RATE_GT | decimal(15,9) | + | Assumed relative substitution rate from G to T. | ||||
RATE_TG | decimal(15,9) | + | Assumed relative substitution rate from T to G. | ||||
PROP_CAT_1 | decimal(10,9) | NULL | The proportion of the first rate category (should the model assume different rates across sites). | ||||
REL_RATE_CAT_1 | decimal(15,9) | NULL | The rate of the first rate category (should the model assume different rates across sites). | ||||
… | … | ||||||
PROP_CAT_10 | decimal(10,9) | NULL | The proportion of the tenth rate category (should there exist one). | ||||
REL_RATE_CAT_10 | decimal(15,9) | NULL | The rate of the tenth rate category (should there exist one). | ||||
TREE_LENGTH | decimal(15,9) | + | Total length of the tree (sum of all branch lengths). | ||||
SUM_IBL | decimal(15,9) | + | Sum of internal branch lengths. | ||||
TREE_DIAMETER | decimal(15,9) | + | The tree diameter, i.e., the greatest distance (sum of branch lengths) between any two taxa in the tree. | ||||
DIST_MIN | decimal(15,9) | NULL | Minimum distance between two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
DIST_MAX | decimal(15,9) | NULL | Maximum distance between two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
DIST_MEAN | decimal(15,9) | NULL | Mean distance between two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
DIST_MEDIAN | decimal(15,9) | NULL | Median distance between two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
DIST_VAR | decimal(15,9) | NULL | Variation in distances between any two sequences in the alignment calculated using the best-fit model. In case of initial tree, this field will be NULL. | ||||
BL_MIN | decimal(15,9) | NULL | Shortest branch in the tree. | ||||
BL_MAX | decimal(15,9) | NULL | Longest branch in the tree. | ||||
BL_MEAN | decimal(15,9) | NULL | Mean branch length in the tree. | ||||
BL_MEDIAN | decimal(15,9) | NULL | Median branch length in the tree. | ||||
BL_VAR | decimal(15,9) | NULL | Variation in branch lengths in the tree. | ||||
IBL_MIN | decimal(15,9) | NULL | Shortest internal branch in the tree. | ||||
IBL_MAX | decimal(15,9) | NULL | Longest internal branch in the tree. | ||||
IBL_MEAN | decimal(15,9) | NULL | Mean internal branch length in the tree. | ||||
IBL_MEDIAN | decimal(15,9) | NULL | Median internal branch length in the tree. | ||||
IBL_VAR | decimal(15,9) | NULL | Variation in internal branch lengths in the tree. | ||||
EBL_MIN | decimal(15,9) | NULL | Shortest external branch in the tree. | ||||
EBL_MAX | decimal(15,9) | NULL | Longest external branch in the tree. | ||||
EBL_MEAN | decimal(15,9) | NULL | Mean external branch length in the tree. | ||||
EBL_MEDIAN | decimal(15,9) | NULL | Median external branch length in the tree. | ||||
EBL_VAR | decimal(15,9) | NULL | Variation in external branch lengths in the tree. | ||||
POT_LBA_7 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches must be at least 7 times as long as the short and internal branches. | ||||
POT_LBA_8 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches must be at least 8 times as long as the short and internal branches. | ||||
POT_LBA_9 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches must be at least 9 times as long as the short and internal branches. | ||||
POT_LBA_10 | int(11) | NULL | States whether there is a potential long branch attraction (LBA) problem in the tree, assuming that the long branches must be at least 10 times as long as the short and internal branches. | ||||
NEWICK_STRING | mediumtext | + | This field contains the Newick string of the phylogenetic tree. |
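
The relationship between the parameter counts and the information criteria listed above can be sketched in Python. This is a minimal illustration that uses the textbook definitions of AIC, AICc and BIC. It assumes that the sample size is the number of alignment sites, which this documentation does not state. The function names and all input values are hypothetical.

```python
import math


def num_branches(n_taxa):
    """Number of branches in a fully resolved unrooted tree with n taxa."""
    return 2 * n_taxa - 3


def aic(logl, k):
    """Akaike information criterion for log-likelihood logl and k free parameters."""
    return 2 * k - 2 * logl


def aicc(logl, k, n_sites):
    """Small-sample corrected AIC (assumes n_sites is the sample size)."""
    return aic(logl, k) + 2 * k * (k + 1) / (n_sites - k - 1)


def bic(logl, k, n_sites):
    """Bayesian information criterion (assumes n_sites is the sample size)."""
    return k * math.log(n_sites) - 2 * logl


# Invented example values, not taken from the database.
n_taxa = 10
num_model_parameters = 9
logl = -1234.5
n_sites = 500

# NUM_FREE_PARAMETERS = NUM_MODEL_PARAMETERS + NUM_BRANCHES
num_free_parameters = num_model_parameters + num_branches(n_taxa)

print("NUM_BRANCHES:", num_branches(n_taxa))
print("NUM_FREE_PARAMETERS:", num_free_parameters)
print("AIC:", aic(logl, num_free_parameters))
print("AICC:", aicc(logl, num_free_parameters, n_sites))
print("BIC:", bic(logl, num_free_parameters, n_sites))
```

A lower value of a criterion indicates a better trade-off between fit and number of parameters, which is why NUM_FREE_PARAMETERS is stored alongside LOGL.
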
Comment: The aa_branches table contains information regarding the branches of the phylogenetic trees stored in the aa_trees table. Each row describes one branch, including its branch type, branch length, and split size.
Constraints:
- UNIQUE KEY (ALI_ID,BRANCH_INDEX,TIME_STAMP,TREE_TYPE)
- FOREIGN KEY (ALI_ID, TIME_STAMP, TREE_TYPE) REFERENCES aa_trees (ALI_ID, TIME_STAMP, TREE_TYPE)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
BRANCH_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). | |||
TIME_STAMP | datetime | + | * | The timestamp as it appears in the .iqtree output file. The timestamp, paired with the alignment ID and tree type, enables the mapping of each branch to a phylogenetic tree in the aa_trees table. | |||
TREE_TYPE | varchar(100) | + | * | The type of tree: initial or ML. The tree type, paired with the alignment ID and time stamp, enables the mapping of each branch to a phylogenetic tree in the aa_trees table. | |||
BRANCH_INDEX | int(11) | + | Index of the branch. For an external branch, the index coincides with the SEQ_INDEX of the corresponding sequence in the aa_sequences table with the same ALI_ID. | ||||
BRANCH_TYPE | varchar(30) | + | States the type of branch, either internal or external. | ||||
BL | decimal(15,9) | + | Branch length. | ||||
SPLIT_SIZE | int(11) | + | States the split size (number of taxa in the smaller subtree). For external branches, the split size is always 1. | ||||
MIN_PATH_1 | decimal(15,9) | NULL | Shortest path length to the leaves in the smaller subtree. | ||||
MAX_PATH_1 | decimal(15,9) | NULL | Longest path length to the leaves in the smaller subtree. | ||||
MEAN_PATH_1 | decimal(15,9) | NULL | Mean path length to the leaves in the smaller subtree. | ||||
MEDIAN_PATH_1 | decimal(15,9) | NULL | Median path length to the leaves in the smaller subtree. | ||||
MIN_PATH_2 | decimal(15,9) | NULL | Shortest path length to the leaves in the larger subtree. | ||||
MAX_PATH_2 | decimal(15,9) | NULL | Longest path length to the leaves in the larger subtree. | ||||
MEAN_PATH_2 | decimal(15,9) | NULL | Mean path length to the leaves in the larger subtree. | ||||
MEDIAN_PATH_2 | decimal(15,9) | NULL | Median path length to the leaves in the larger subtree. |
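
To illustrate how SPLIT_SIZE is defined, the following Python sketch computes the split size of every branch of a small toy tree. The nested-tuple tree representation and the function names are assumptions made for this example and are not part of the EvoNAPS workflow. The sketch also assumes that the top-level node has at least three children, so that each child is attached by its own branch in the unrooted tree.

```python
def count_leaves(node):
    """Return the number of leaves below a node (a leaf is a string)."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child in node)


def split_sizes(tree):
    """Return a list of (branch_type, split_size), one entry per branch.

    Removing a branch splits the taxa into two subtrees. The split size is
    the number of taxa in the smaller of the two.
    """
    n_taxa = count_leaves(tree)
    result = []

    def visit(node):
        for child in node:
            below = count_leaves(child)
            is_leaf = isinstance(child, str)
            branch_type = "external" if is_leaf else "internal"
            result.append((branch_type, min(below, n_taxa - below)))
            if not is_leaf:
                visit(child)

    visit(tree)
    return result


# Toy tree with five taxa and a top-level trifurcation: 2*5 - 3 = 7 branches.
toy_tree = (("A", "B"), "C", ("D", "E"))

for branch_type, size in split_sizes(toy_tree):
    print(branch_type, size)
```

In this toy tree, every external branch has a split size of 1, and each of the two internal branches separates two taxa from the remaining three, giving a split size of 2.
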
Comment: The dna_branches table contains information regarding the branches of the phylogenetic trees stored in the dna_trees table. Each row describes one branch, including its branch type, branch length, and split size.
Constraints:
- UNIQUE KEY (ALI_ID,BRANCH_INDEX,TIME_STAMP,TREE_TYPE)
- FOREIGN KEY (ALI_ID, TIME_STAMP, TREE_TYPE) REFERENCES dna_trees (ALI_ID, TIME_STAMP, TREE_TYPE)
Column Name | Datatype | PK | NN | UQ | AI | Default | Comment |
---|---|---|---|---|---|---|---|
BRANCH_KEY | int(11) | + | + | + | Autoincremented primary key. | ||
ALI_ID | varchar(250) | + | * | Name of the alignment (alignment ID). | |||
TIME_STAMP | datetime | + | * | The timestamp as it appears in the .iqtree output file. The timestamp, paired with the alignment ID and tree type, enables the mapping of each branch to a phylogenetic tree in the dna_trees table. | |||
TREE_TYPE | varchar(100) | + | * | The type of tree: initial or ML. The tree type, paired with the alignment ID and time stamp, enables the mapping of each branch to a phylogenetic tree in the dna_trees table. | |||
BRANCH_INDEX | int(11) | + | Index of the branch. For an external branch, the index coincides with the SEQ_INDEX of the corresponding sequence in the dna_sequences table with the same ALI_ID. | ||||
BRANCH_TYPE | varchar(30) | + | States the type of branch, either internal or external. | ||||
BL | decimal(15,9) | + | Branch length. | ||||
SPLIT_SIZE | int(11) | + | States the split size (number of taxa in the smaller subtree). For external branches, the split size is always 1. | ||||
MIN_PATH_1 | decimal(15,9) | NULL | Shortest path length to the leaves in the smaller subtree. | ||||
MAX_PATH_1 | decimal(15,9) | NULL | Longest path length to the leaves in the smaller subtree. | ||||
MEAN_PATH_1 | decimal(15,9) | NULL | Mean path length to the leaves in the smaller subtree. | ||||
MEDIAN_PATH_1 | decimal(15,9) | NULL | Median path length to the leaves in the smaller subtree. | ||||
MIN_PATH_2 | decimal(15,9) | NULL | Shortest path length to the leaves in the larger subtree. | ||||
MAX_PATH_2 | decimal(15,9) | NULL | Longest path length to the leaves in the larger subtree. | ||||
MEAN_PATH_2 | decimal(15,9) | NULL | Mean path length to the leaves in the larger subtree. | ||||
MEDIAN_PATH_2 | decimal(15,9) | NULL | Median path length to the leaves in the larger subtree. |
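
The composite key (ALI_ID, TIME_STAMP, TREE_TYPE) is what links each branch to its tree. The following Python sketch shows such a join and uses it to check that the branch lengths of a tree add up to its TREE_LENGTH and SUM_IBL. It is only an illustration: it uses an in-memory SQLite database instead of the actual EvoNAPS database server, the tables are reduced to the columns needed for the join, the UNIQUE constraint on dna_trees is an assumption, and all rows are invented toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dna_trees (
    ALI_ID TEXT, TIME_STAMP TEXT, TREE_TYPE TEXT,
    TREE_LENGTH REAL, SUM_IBL REAL,
    UNIQUE (ALI_ID, TIME_STAMP, TREE_TYPE)
);
CREATE TABLE dna_branches (
    ALI_ID TEXT, TIME_STAMP TEXT, TREE_TYPE TEXT,
    BRANCH_INDEX INTEGER, BRANCH_TYPE TEXT, BL REAL,
    UNIQUE (ALI_ID, BRANCH_INDEX, TIME_STAMP, TREE_TYPE),
    FOREIGN KEY (ALI_ID, TIME_STAMP, TREE_TYPE)
        REFERENCES dna_trees (ALI_ID, TIME_STAMP, TREE_TYPE)
);
""")

# Invented toy tree with four taxa: four external branches, one internal.
ali, stamp, kind = "toy_alignment", "2024-01-01 12:00:00", "ML"
conn.execute(
    "INSERT INTO dna_trees VALUES (?, ?, ?, ?, ?)",
    (ali, stamp, kind, 1.3125, 0.375),
)
branches = [
    (1, "external", 0.125),
    (2, "external", 0.25),
    (3, "external", 0.5),
    (4, "external", 0.0625),
    (5, "internal", 0.375),
]
conn.executemany(
    "INSERT INTO dna_branches VALUES (?, ?, ?, ?, ?, ?)",
    [(ali, stamp, kind, idx, btype, bl) for idx, btype, bl in branches],
)

# Join every branch to its tree and aggregate the branch lengths per tree.
query = """
SELECT t.ALI_ID, t.TREE_LENGTH, t.SUM_IBL,
       SUM(b.BL) AS bl_sum,
       SUM(CASE WHEN b.BRANCH_TYPE = 'internal' THEN b.BL ELSE 0 END) AS ibl_sum,
       COUNT(*) AS n_branches
FROM dna_trees AS t
JOIN dna_branches AS b
  ON b.ALI_ID = t.ALI_ID
 AND b.TIME_STAMP = t.TIME_STAMP
 AND b.TREE_TYPE = t.TREE_TYPE
GROUP BY t.ALI_ID, t.TIME_STAMP, t.TREE_TYPE
"""
ali_id, tree_length, sum_ibl, bl_sum, ibl_sum, n_branches = conn.execute(query).fetchone()

print(ali_id, n_branches)
print("TREE_LENGTH matches:", abs(tree_length - bl_sum) < 1e-9)
print("SUM_IBL matches:", abs(sum_ibl - ibl_sum) < 1e-9)
```

The same join pattern applies to aa_branches and aa_trees, since both pairs of tables share the same key columns.
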