Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nextclade workflow [#2] #10

Merged
merged 1 commit into from
Aug 5, 2024
Merged

Add nextclade workflow [#2] #10

merged 1 commit into from
Aug 5, 2024

Conversation

genehack
Copy link
Contributor

@genehack genehack commented Aug 1, 2024

Description of proposed changes

This adds a workflow to build a Nextclade tree based on a set of genotype references extracted from two papers.

Related issue(s)

#2

Checklist

  • Checks pass

Copy link

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't completely finished reviewing, but wanted to flag the reference tree only included 122 genomes when I ran this workflow locally. Not sure if that's intentional, but that seems low to me.

nextclade/defaults/auspice_config.json Outdated Show resolved Hide resolved
nextclade/rules/prepare_sequences.smk Outdated Show resolved Hide resolved
@genehack
Copy link
Contributor Author

genehack commented Aug 2, 2024

Haven't completely finished reviewing, but wanted to flag the reference tree only included 122 genomes when I ran this workflow locally. Not sure if that's intentional, but that seems low to me.

It's intentional — the strains in the include file are extracted from the 2 papers that defined the genotypes that are being assigned. The point of this dataset is to bootstrap a genotype-defining tree. The NCBI dataset (which is ~2000 sequences) doesn't have any systematic genotype annotation (at least, that I've been able to find), so these 122 are the only sequences with definitive genotypes.

@genehack genehack force-pushed the add-nextclade-build-2 branch from 7fddc9e to 46acdd7 Compare August 2, 2024 19:51
genehack added a commit to nextstrain/nextclade_data that referenced this pull request Aug 2, 2024
Built from [nextclade workflow in yellow fever repo](nextstrain/yellow-fever#10)
Copy link

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch of non-blocking comments. Seems to be working well with ~80% clade assignment in the test-output!

nextclade/defaults/auspice_config.json Show resolved Hide resolved
nextclade/defaults/auspice_config.json Outdated Show resolved Hide resolved
nextclade/defaults/auspice_config.json Outdated Show resolved Hide resolved
nextclade/rules/prepare_sequences.smk Outdated Show resolved Hide resolved
nextclade/defaults/config.yaml Outdated Show resolved Hide resolved
nextclade/rules/construct_phylogeny.smk Outdated Show resolved Hide resolved

rule filter:
message: """
Filtering to defined set of strains with known genotypes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

threading conversation

Haven't completely finished reviewing, but wanted to flag the reference tree only included 122 genomes when I ran this workflow locally. Not sure if that's intentional, but that seems low to me.

It's intentional — the strains in the include file are extracted from the 2 papers that defined the genotypes that are being assigned. The point of this dataset is to bootstrap a genotype-defining tree. The NCBI dataset (which is ~2000 sequences) doesn't have any systematic genotype annotation (at least, that I've been able to find), so these 122 are the only sequences with definitive genotypes.

Ah gotcha, that's good to know. Looking at the reference tree docs:

The tree should be sufficiently large and diverse to meet clade assignment expectations of a particular use-case, study or experiment.

It's not clear to me if 122 sequences is enough for a reference tree in Nextclade, but maybe we start here and handpick other sequences to include later 🤷‍♀️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me if 122 sequences is enough for a reference tree in Nextclade, but maybe we start here and handpick other sequences to include later 🤷‍♀️

nod I know I don't know enough to have an opinion here. I opened a PR to add this to nextclade_data, so perhaps this conversation will happen in that context.

Addressed all your other comments; will plan on merging this early next week sans other feedback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to flag that the group usually has an aversion to neon colors 🙈

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a set of "acceptable" colors? When I was using the default ones, because of how they were assigned, it was super hard to distinguish between (e.g.) West Africa I and West Africa II, because they were both shades of dark blue.

I picked these semi-randomly by just shuffling around hex values, but I like that they're all clearly distinguishable to me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a set of "acceptable" colors?

I usually just pick colors from Auspice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-08-02 at 15 53 21

.markdownlintrc Outdated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH, markdownlintrc is new to me. Is this going be used in pre-commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't hooked it up there yet, but it could be!

@genehack genehack force-pushed the add-nextclade-build-2 branch 2 times, most recently from 0431a05 to d57d611 Compare August 2, 2024 22:53
genehack added a commit to nextstrain/nextclade_data that referenced this pull request Aug 2, 2024
Built from [nextclade workflow in yellow fever repo](nextstrain/yellow-fever#10)
@genehack genehack force-pushed the add-nextclade-build-2 branch from d57d611 to 165eac6 Compare August 5, 2024 17:00
genehack added a commit to nextstrain/nextclade_data that referenced this pull request Aug 5, 2024
Built from [nextclade workflow in yellow fever repo](nextstrain/yellow-fever#10)
@genehack genehack merged commit 1c8192b into main Aug 5, 2024
1 check passed
@genehack genehack deleted the add-nextclade-build-2 branch August 5, 2024 17:05
genehack added a commit to nextstrain/nextclade_data that referenced this pull request Sep 12, 2024
Built from [nextclade workflow in yellow fever repo](nextstrain/yellow-fever#10)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants