Output
a) Simple dataset from Scoary 1
How Scoary was run
# download dataset
wget --recursive -nd --no-parent -A csv,tsv https://scoary.bioinformatics.unibe.ch/datasets/scoary-1-tetracycline/
# run Scoary2
scoary \
--genes Gene_presence_absence.csv \
--gene-info gene-info.tsv \
--traits Tetracycline_resistance.csv \
--outdir out \
--n-permut 10000
# the argument gene-info is optional
See output here.
b) Large metabolomics dataset (OrthoFinder)
How Scoary was run
# download dataset
wget --recursive -nd --no-parent -A tsv https://scoary.bioinformatics.unibe.ch/datasets/44-propioni/
# run Scoary2
scoary \
--genes N0.tsv \
--gene-data-type 'gene-list:\t' \
--gene-info N0_best_names.tsv \
--traits traits_44_noraw.tsv \
--trait-data-type 'gaussian:kmeans:\t' \
--trait-info traits_info_44_noraw.tsv \
--isolate-info isolate_info_44.tsv \
--multiple_testing bonferroni:0.1 \
--n-permut 1000 \
--n-cpus 8 \
--random-state 42 \
--outdir out
# the following are optional: gene-info, trait-info, isolate-info
See output of a metabolomics dataset here.
We recommend using limit_traits and a low n-permut to determine the optimal Scoary parameters before crunching the full dataset. If the dataset has a particularly strong population structure, also use worst_cutoff to remove traits that merely correlate with the phylogeny. (See manual.)
Table that contains one row per trait analyzed, summarizing the result. Columns:
- Trait: name of the trait
- best_fisher_p: uncorrected p-value of Fisher's test for the "best" gene
- best_fisher_q: multiple-testing-corrected p-value of Fisher's test for the "best" gene
- best_empirical_p: p-value of the post-hoc permutation test for the "best" gene
- best_fq*ep: product of fisher_q and empirical_p for the "best" gene
- ...: potential additional metadata columns from trait-info.tsv
The "best" gene is defined as the gene with the lowest fq*ep score; that score is reported as best_fq*ep.
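The selection of the "best" gene can be sketched in a few lines of Python. This is a minimal illustration, not Scoary2's own code: the gene names and p-values are made up, and the helper `fq_ep` is hypothetical.

```python
# Hypothetical per-gene values for one trait (made-up numbers, not real output).
genes = [
    {"Gene": "geneA", "fisher_q": 0.04, "empirical_p": 0.5},
    {"Gene": "geneB", "fisher_q": 0.2, "empirical_p": 0.01},
    {"Gene": "geneC", "fisher_q": 0.5, "empirical_p": 0.5},
]


def fq_ep(row):
    """Product of the corrected Fisher's p-value and the empirical p-value."""
    return row["fisher_q"] * row["empirical_p"]


# The "best" gene is the one with the lowest fq*ep score.
best = min(genes, key=fq_ep)

# Values that would be summarized for this trait.
summary_row = {
    "best_fisher_q": best["fisher_q"],
    "best_empirical_p": best["empirical_p"],
    "best_fq*ep": fq_ep(best),
}
print(best["Gene"], summary_row)
```

In this toy example the gene with the lowest fisher_q is not the "best" gene, because the ranking uses the product of both values.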
See also: Understanding the p-values
This SVG image file is made interactive in overview.html. It contains:
- Left: Dendrogram of traits.
- Middle: negative logarithms of best_fisher_q, best_empirical_p and best_fq*ep calculated by Scoary2.
- Right: names of the traits.
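The transformation behind the middle panel can be sketched as follows. The base of the logarithm is not stated here, so base 10 is an assumption, as are the helper `neg_log10`, its `floor` guard and the made-up input values.

```python
import math


def neg_log10(p, floor=1e-300):
    """Return -log10(p); `floor` is an assumed guard against log(0)."""
    return -math.log10(max(p, floor))


# Hypothetical summary values for one trait (made-up numbers).
row = {"best_fisher_q": 0.001, "best_empirical_p": 0.01, "best_fq*ep": 0.00001}

# Smaller p-values map to larger scores.
scores = {name: neg_log10(p) for name, p in row.items()}
print(scores)
```

Because the logarithm turns products into sums, the score of best_fq*ep is the sum of the other two scores.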
Makes overview_plot.svg interactive and links traits to trait.html
See section How to use the app
This folder contains a subfolder for each trait. These subfolders contain the following files:
- results.tsv: The content of this file is similar to the main output of the original Scoary. Columns:
  - Gene: Name of the gene
  - Name: Description of the gene from gene_info.tsv (optional)
  - g+t+: Number of isolates that have the gene (g+) and have the trait (t+)
  - g+t-, g-t+, g-t-: See g+t+. These four numbers constitute the input for Fisher's test.
  - sensitivity: The sensitivity when using the presence of this gene as a diagnostic test to determine trait-positivity
  - specificity: The specificity when using the absence of this gene as a diagnostic test to determine trait-negativity
  - odds_ratio: Odds ratio (quantifies the strength of the association)
  - fisher_p, fisher_q: uncorrected and corrected p-value of Fisher's test
  - empirical_p: p-value of the post-hoc permutation test for this gene
  - fq*ep: product of fisher_q and empirical_p
  - contrasting: The maximum number of pairs that contrast in both gene and trait characters that can be drawn on the phylogenetic tree without intersecting lines
  - supporting, opposing: The maximum number of contrasting pairs that support/oppose the hypothesis
  - best: p-value of picking n supporting pairs out of n contrasting pairs
  - worst: p-value of picking n opposing pairs out of n contrasting pairs
- coverage_matrix.tsv: Table that indicates which isolates have which genes
- meta.json: Metadata about how the trait was binarized
- values.tsv: Table that indicates the original continuous value of each isolate and how it was classified (optional)
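Several results.tsv columns follow directly from the four counts g+t+, g+t-, g-t+ and g-t-. The sketch below shows one way to compute them using only the Python standard library. It is an illustration under stated assumptions, not Scoary2's implementation: the function names are hypothetical, the counts are made up, sensitivity and specificity are given as fractions, and the test behind best and worst is assumed to be a two-sided binomial test with success probability 0.5, which this page does not specify.

```python
from math import comb


def contingency_stats(g_t, g_not_t, not_g_t, not_g_not_t):
    """Arguments correspond to the columns g+t+, g+t-, g-t+ and g-t-."""
    # Share of trait-positive isolates that carry the gene.
    sensitivity = g_t / (g_t + not_g_t)
    # Share of trait-negative isolates that lack the gene.
    specificity = not_g_not_t / (not_g_not_t + g_not_t)
    # Cross-product ratio; reported as infinity if the denominator is zero.
    denominator = g_not_t * not_g_t
    odds_ratio = g_t * not_g_not_t / denominator if denominator else float("inf")
    return sensitivity, specificity, odds_ratio


def fisher_two_sided(g_t, g_not_t, not_g_t, not_g_not_t):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    with_gene = g_t + g_not_t
    with_trait = g_t + not_g_t
    total = with_gene + not_g_t + not_g_not_t

    def prob(x):
        # Hypergeometric probability that x isolates have both gene and trait.
        return (comb(with_trait, x) * comb(total - with_trait, with_gene - x)
                / comb(total, with_gene))

    lowest = max(0, with_gene + with_trait - total)
    highest = min(with_gene, with_trait)
    observed = prob(g_t)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def binomial_two_sided(successes, trials):
    """Two-sided binomial p-value with success probability 0.5 (an assumption)."""
    observed = comb(trials, successes)
    # Count all outcomes that are at most as likely as the observed one.
    extreme = sum(comb(trials, i) for i in range(trials + 1)
                  if comb(trials, i) <= observed)
    return extreme / 2 ** trials


# Made-up counts for one gene: g+t+ = 8, g+t- = 2, g-t+ = 1, g-t- = 9.
sensitivity, specificity, odds_ratio = contingency_stats(8, 2, 1, 9)
fisher_p = fisher_two_sided(8, 2, 1, 9)
# Made-up pair counts: 5 contrasting pairs, all of them supporting.
best_p = binomial_two_sided(5, 5)
print(sensitivity, specificity, odds_ratio, fisher_p, best_p)
```

The values in an actual results.tsv may differ in scale or rounding, so treat this only as a guide to reading the columns.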
Visualizes the data in a trait's folder and makes it interactive.
Shows a phylogenetic tree of the isolates, the tables results.tsv and coverage_matrix.tsv, a pie chart that shows how the orthogene and the trait intersect in the dataset, and a histogram of the continuous values, colored by whether each isolate has the orthogene and the trait.
See section How to use the app
Binary trait matrix. Rows: isolates; columns: traits
Metadata about each isolate (optional)
Contains configuration, HTML and CSS for overview.html and trait.html
By modifying link-config, the behaviour of trait.html can be changed.
See section How to use the app
Contains log files