Add nextclade workflow [#2] #10
23d5d30 to 7fddc9e
7fddc9e to 46acdd7
Built from [nextclade workflow in yellow fever repo](nextstrain/yellow-fever#10)
Left a bunch of non-blocking comments. Seems to be working well with ~80% clade assignment in the test-output!
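For anyone wanting to reproduce a figure like the ~80% above, here is a minimal sketch of how an assignment rate could be computed from a tab-separated results table. It is an illustration only, not part of this PR: the function name `assignment_rate`, the sample rows, and the choice to treat an empty value or the literal `unassigned` as "no clade" are all assumptions.

```python
import csv
import io


def assignment_rate(tsv_text, column="clade", unassigned=("", "unassigned")):
    """Return the fraction of rows whose `column` holds a real assignment.

    Assumption: empty values and the literal "unassigned" mean no clade was called.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return 0.0
    assigned = sum(
        1 for row in rows if (row.get(column) or "").strip() not in unassigned
    )
    return assigned / len(rows)


# Hypothetical results table, for illustration only.
sample = "\n".join(
    [
        "seqName\tclade",
        "seq1\tWest Africa I",
        "seq2\tWest Africa II",
        "seq3\tWest Africa I",
        "seq4\tWest Africa II",
        "seq5\t",
    ]
)

print(f"{assignment_rate(sample):.0%} of sequences received a clade")
```

The column name is passed as a parameter so the same helper can be pointed at whichever column actually carries the genotype call.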
rule filter:
    message: """
        Filtering to defined set of strains with known genotypes.
threading conversation
Haven't completely finished reviewing, but wanted to flag the reference tree only included 122 genomes when I ran this workflow locally. Not sure if that's intentional, but that seems low to me.
It's intentional: the strains in the `include` file are extracted from the 2 papers that defined the genotypes that are being assigned. The point of this dataset is to bootstrap a genotype-defining tree. The NCBI dataset (which is ~2000 sequences) doesn't have any systematic genotype annotation (at least, that I've been able to find), so these 122 are the only sequences with definitive genotypes.
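To make the idea concrete, here is a small sketch of filtering a larger sequence set down to the names in an include list. This is not the workflow's actual rule, just an illustration under assumptions: the helper names `read_include_list` and `filter_to_included`, the one-name-per-line format with `#` comments, and the strain names are all hypothetical.

```python
def read_include_list(text):
    """Parse an include list: one strain name per line.

    Assumption: blank lines and anything after '#' are ignored.
    """
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.add(name)
    return names


def filter_to_included(sequences, include):
    """Keep only the strain -> sequence entries whose strain is in `include`."""
    return {strain: seq for strain, seq in sequences.items() if strain in include}


# Hypothetical data, for illustration only.
include_text = """
# strains with published genotypes
strain_A
strain_C  # trailing comment
"""
sequences = {"strain_A": "ACGT", "strain_B": "ACGA", "strain_C": "ACGG"}

include = read_include_list(include_text)
kept = filter_to_included(sequences, include)
print(sorted(kept))
```

Everything not named in the list is dropped, which is why the resulting tree is limited to the sequences with a published genotype.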
Ah gotcha, that's good to know. Looking at the reference tree docs:
The tree should be sufficiently large and diverse to meet clade assignment expectations of a particular use-case, study or experiment.
It's not clear to me if 122 sequences is enough for a reference tree in Nextclade, but maybe we start here and handpick other sequences to include later 🤷♀️
It's not clear to me if 122 sequences is enough for a reference tree in Nextclade, but maybe we start here and handpick other sequences to include later 🤷♀️
*nod* I know I don't know enough to have an opinion here. I opened a PR to add this to nextclade_data, so perhaps this conversation will happen in that context.
Addressed all your other comments; will plan on merging this early next week sans other feedback.
nextclade/defaults/colors.tsv
Outdated
Just wanted to flag that the group usually has an aversion to neon colors 🙈
Is there a set of "acceptable" colors? When I was using the default ones, because of how they were assigned, it was super hard to distinguish between (e.g.) West Africa I and West Africa II, because they were both shades of dark blue.
I picked these semi-randomly by just shuffling around hex values, but I like that they're all clearly distinguishable to me.
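One way to sanity-check "clearly distinguishable" is to look for the two closest colors in a palette. The sketch below uses plain Euclidean distance in RGB space, which is only a crude stand-in for perceived difference. The hex values and the `Other genotype` label are made up for the example and are not the ones in `colors.tsv`.

```python
import math
from itertools import combinations


def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def closest_pair(colors):
    """Return (distance, label_a, label_b) for the two most similar colors.

    Distance is Euclidean in RGB space, a rough proxy for visual similarity.
    """
    return min(
        (math.dist(hex_to_rgb(a_hex), hex_to_rgb(b_hex)), a, b)
        for (a, a_hex), (b, b_hex) in combinations(colors.items(), 2)
    )


# Hypothetical palette: two near-identical dark blues and one orange.
palette = {
    "West Africa I": "#000080",
    "West Africa II": "#00008B",
    "Other genotype": "#FF8C00",
}

distance, first, second = closest_pair(palette)
print(f"Closest pair: {first} / {second} (RGB distance {distance:.1f})")
```

A small distance for a pair that sits side by side in the legend is a hint that one of the two colors is worth swapping.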
Is there a set of "acceptable" colors?
I usually just pick colors from Auspice
.markdownlintrc
Outdated
Oh, markdownlintrc is new to me. Is this going to be used in pre-commit?
I haven't hooked it up there yet, but it could be!
0431a05 to d57d611
d57d611 to 165eac6
Description of proposed changes
This adds a workflow to build a Nextclade tree based on a set of genotype references extracted from two papers.
Related issue(s)
#2
Checklist