Update ingest #7

Merged
merged 2 commits into from
Jul 11, 2024

Conversation

genehack
Contributor

No description provided.

@genehack genehack marked this pull request as ready for review July 10, 2024 00:24
@genehack genehack requested a review from a team July 10, 2024 00:25
ingest/Snakefile (outdated, resolved)
ingest/config/config.yaml (outdated, resolved)
@corneliusroemer corneliusroemer changed the title Update injest Update ingest Jul 10, 2024
@corneliusroemer
Member

@genehack did you name this PR injest in jest? 😄 I renamed to the more canonical ingest, I hope you forgive me :)

@genehack
Contributor Author

> @genehack did you name this PR injest in jest? 😄 I renamed to the more canonical ingest, I hope you forgive me :)

thanks, no, not intentionally, my brain gest 😁 really wants that soft g to be a j ...

@genehack
Contributor Author

Bringing comments from @jameshadfield over from Slack for ease of change-tracking:

Notes after briefly playing with the tree:

  • If you can track down an annotation (of the CDSs after cleavage from the polyprotein) that would really enhance the diversity panel
  • Looks like there may be seven genotypes in the literature: "West African genotypes I and II, East African genotype, East/central African genotype, Angolan genotype, and South American genotypes I and II". Assuming these fall nicely on the tree (and they should - they're actual genotypes, not serotypes) then labelling these via augur clades would be helpful.
  • The clock signal looks weak (view the tree in "clock" layout). Some of the clades look like they may have a stronger signal, so potentially we could run it for individual genotypes. There may be interesting biology here as the rates look different in different clades, but I haven't read the literature, this is only from interacting with the tree.
  • Assuming we remove the timetree, we can still include isolation date as metadata, but it doesn't work well as a colouring (Auspice shortcoming) - in the past I've encoded a continuous colouring of year which Auspice does display nicely.
  • I think it's valid to run augur traits on region. Potentially country too, but that looks like there's more sampling bias (or maybe not, I don't know the prevalence).
  • [minor] lat/longs missing for Guinea-Bissau, Côte d'Ivoire
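
A minimal sketch of the "continuous colouring of year" idea from the notes above, assuming isolation dates are ISO-style strings that may use `X` for unknown parts (e.g. `2017-03-XX`). The function name `isolation_year` is hypothetical and this is not code from the PR:

```python
from typing import Optional


def isolation_year(date: str) -> Optional[int]:
    """Return the year of an ISO-style date as an int.

    Assumption: dates look like YYYY-MM-DD, with unknown parts written
    as X (e.g. "2017-XX-XX"). Returns None when the year is unknown.
    """
    year = date.strip().split("-")[0]
    if len(year) == 4 and year.isdigit():
        return int(year)
    return None


# Example: derive a numeric "year" value for a few metadata dates.
for raw in ["2017-03-14", "1999-XX-XX", "XXXX-XX-XX"]:
    print(raw, isolation_year(raw))
```

The resulting numeric column could then be exported as a colouring of type continuous, separate from the timetree.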

@genehack
Contributor Author

genehack commented Jul 10, 2024

  • If you can track down an annotation (of the CDSs after cleavage from the polyprotein) that would really enhance the diversity panel

Yeah, I found the RefSeq for yellow fever virus and it's got a complete breakdown, so I can do this.

👍

  • The clock signal looks weak (view the tree in "clock" layout). Some of the clades look like they may have a stronger signal, so potentially we could run it for individual genotypes. There may be interesting biology here as the rates look different in different clades, but I haven't read the literature, this is only from interacting with the tree.

I'm hearing "maybe down the road but not critical path" here, push back if that's wrong.

  • Assuming we remove the timetree, we can still include isolation date as metadata, but it doesn't work well as a colouring (Auspice shortcoming) - in the past I've encoded a continuous colouring of year which Auspice does display nicely.

kk.

  • I think it's valid to run augur traits on region. Potentially country too, but that looks like there's more sampling bias (or maybe not, I don't know the prevalence).

👍

  • [minor] lat/longs missing for Guinea-Bissau, Côte d'Ivoire

yep, saw that error during build, thanks for highlight.

@jameshadfield
Member

> Some of the clades look like they may have a stronger signal, so potentially we could run it for individual genotypes.

> I'm hearing "maybe down the road but not critical path" here, push back if that's wrong.

💯 - not something for the initial build!

Looking at this again today there are signs of recombination in the tree, although the literature suggests that recombination between YFV strains is unlikely. The high number of homoplasies on the deep branches, and the clustering of them, sure looks consistent with recombination - e.g. 5389G,5437T,5443T,5485A,5497T. If this is more prevalent deeper in the tree it would result in less clock signal, or, said another way, the clock signal might still be ok for individual YF-genotypes.

It'd be useful if Auspice were able to show a colouring for unique mutation counts, homoplasic counts and the ratio. I'll look into this at some point. Could even use these as branch metrics if we get around to nextstrain/auspice#1769.
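
A rough sketch of the per-branch metrics described above (unique mutation count, homoplasic count, and their ratio), assuming mutations are already available as a mapping from branch name to mutation strings. The branch names and the helper `homoplasy_summary` are hypothetical, `100A` is a made-up mutation for illustration, and this is not how Auspice or Augur computes anything:

```python
from collections import Counter
from typing import Dict, List


def homoplasy_summary(branch_mutations: Dict[str, List[str]]) -> Dict[str, dict]:
    """Classify each branch's mutations as unique or homoplasic.

    A mutation is treated as homoplasic here if the same mutation string
    appears on more than one branch of the tree.
    """
    # Count on how many branches each mutation occurs.
    occurrences = Counter(
        mut for muts in branch_mutations.values() for mut in set(muts)
    )
    summary = {}
    for branch, muts in branch_mutations.items():
        distinct = set(muts)
        homoplasic = sum(1 for mut in distinct if occurrences[mut] > 1)
        unique = len(distinct) - homoplasic
        ratio = homoplasic / len(distinct) if distinct else 0.0
        summary[branch] = {
            "unique": unique,
            "homoplasic": homoplasic,
            "ratio": ratio,
        }
    return summary


# Illustrative input only: two of the mutations quoted above placed on
# hypothetical branches.
example = {
    "NODE_1": ["5389G", "5437T", "100A"],
    "NODE_2": ["5389G", "5437T"],
    "NODE_3": [],
}
print(homoplasy_summary(example))
```

A high ratio concentrated on deep branches would be one crude signal of the clustering pattern described above; it does not by itself distinguish recombination from other causes of homoplasy.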

* sync `ingest/README` with `seasonal-cov` version
* move from `defaults` to `config` for config files
* strip fetch-from-entrez stuff from config and rules files
* add `ncbi_taxon_id` to config
* strip guidance comments, light reformat of Snakemake and rules files
  for readability
* add benchmarks where missing
* remove unused nextclade bits
* add "clean" convenience rule
@genehack genehack requested review from joverlee521 and j23414 July 11, 2024 00:37
@genehack genehack mentioned this pull request Jul 11, 2024
30 tasks
Contributor

@j23414 j23414 left a comment

Tested ingest again, works for me!

@genehack genehack merged commit 28d3127 into main Jul 11, 2024
2 checks passed
@joverlee521 joverlee521 deleted the update-injest-2 branch July 11, 2024 17:21
@joverlee521

Just noticed this repo doesn't auto-delete merged branches

Clicked the little box for this repo
[Screenshot 2024-07-11 at 10 21 48 AM]

5 participants