[Bug?] Galago doesn't like UShER JSON (yet) #204
Comments
Ah! I somehow didn't get a notification for this issue. Thanks so much for investigating, @AngieHinrichs ! I just pushed a PR to our staging server to make all geographical data optional, and I'm mostly able to load your file via This is easy to fix on my end. I'll get this up and running on prod by early next week at the latest and let you know as soon as it's ready. Thanks again! So excited :) |
Great! Yeah, UShER JSON doesn't have all of the cool stuff that Augur JSON does, but I'm glad you can work with it anyway! Looking forward to adding a linkout. 😄 |
@AngieHinrichs -- I haven't forgotten about this! Got unexpectedly slammed with a few other things this week. Next week is looking wide open, though, and this is top of my list. Thanks for your patience. |
No worries, same here! :) (except not sure about next week) No pressure from my side. It will be easy to add a linkout whenever. |
Hey @AngieHinrichs! At long last (apologies -- covid finally found us after 3 yrs), I've got a fix for this. The issue was indeed parsing dates on our end. We now accept either My current patch of leaving these samples out might not be a great solution for datasets with more than a few missing dates, though. Do most UShER samples come in with dates, or is it common to have a significant percentage of samples without? |
Hi Sidney! So sorry to hear about the covid, but good job avoiding it for so long. Glad you're back in dev-land. The "UShER samples" are a mix of sequences from INSDC (GenBank, ENA, DDBJ) and/or GISAID (many sequences are in both and I attempt to de-duplicate). Most of them have dates, but not all, and some (by law in some locations) are year-month-only unfortunately. If it turns out to be a big problem then there are several things we could try, such as suggesting that people choose a larger subtree size in UShER to send onward to you so there's more margin for having to discard some samples. Is there an optimal range of sizes for Galago input trees? Does it depend on the number of the user's samples of interest? I imagine some users might upload a handful of sequences from an outbreak that probably fall into one or two subtrees, while others might have hundreds of sequences from a week's worth of runs in their lab (potentially resulting in many subtrees). The UShER web interface's default subtree size is 50 which is OK for finding the few most closely related sequences, but for other purposes like evaluating a possible new lineage for pangolin, 1000 is a better size. The max is 5000. |
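For illustration only, here is a minimal sketch of tolerant date handling along the lines discussed above: accept full dates, accept year-month-only dates, and set aside samples with no usable date. The accepted formats (`YYYY-MM-DD` and `YYYY-MM`), the mid-month placeholder day, and the function names are all assumptions made for this example; this is not Galago's actual parsing code.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def parse_collection_date(raw: Optional[str]) -> Optional[date]:
    """Parse 'YYYY-MM-DD' or 'YYYY-MM'; return None for anything else."""
    if not raw:
        return None
    parts = raw.strip().split("-")
    try:
        if len(parts) == 3:
            return date(int(parts[0]), int(parts[1]), int(parts[2]))
        if len(parts) == 2:
            # Year-month-only: pin to the 15th (an arbitrary, assumed choice).
            return date(int(parts[0]), int(parts[1]), 15)
    except ValueError:
        # Non-numeric fields or impossible dates such as month 13.
        return None
    return None


def split_by_date(
    samples: Dict[str, Optional[str]]
) -> Tuple[Dict[str, date], List[str]]:
    """Separate samples with a usable date from those to be left out."""
    dated: Dict[str, date] = {}
    undated: List[str] = []
    for name, raw in samples.items():
        parsed = parse_collection_date(raw)
        if parsed is None:
            undated.append(name)
        else:
            dated[name] = parsed
    return dated, undated
```

Keeping the list of undated sample names around, rather than dropping them silently, would make it possible to report how many samples were left out of a given subtree.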
Glad to be back! Although I've got a lot of foggy brain still, so lmk if any of this doesn't make sense :) We can accommodate any of those tree sizes, although performance is best at <3000-3500ish. In an ideal world, I'd recommend something along the lines of:
|
@AngieHinrichs -- another idea we could think about at some point -- Galago helps the user find which clade(s) to generate a report for based on their samples of interest. It could be useful to pass through the names of their input samples via query param, although this could very quickly get too long and cumbersome to be functional. Would need to noodle on this a bit more. |
Great about the sample size flexibility.
Yeah. Maybe in a text file alongside the JSON file that has the tree? One name per line? Or -- actually they can be extracted from the JSON itself, filter nodes for userOrOld == "uploaded sample" if there's already a convenient way to do that. |
Oooh good call. Yeah as long as there's a metadata trait in the JSON I can parse them on ingest.
|
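As a rough sketch of the idea above, the uploaded sample names could be collected by walking the tree and filtering on the `userOrOld` trait. This assumes the trait is stored the way Auspice v2 JSON usually stores node traits, i.e. under `node_attrs` as `{"value": ...}`, and that sample names are in each node's `name` field. The function name is hypothetical and this is not Galago's ingest code.

```python
from typing import Any, Dict, List


def uploaded_sample_names(
    auspice_json: Dict[str, Any],
    trait: str = "userOrOld",
    wanted: str = "uploaded sample",
) -> List[str]:
    """Return names of all nodes whose `trait` value equals `wanted`."""
    tree = auspice_json["tree"]
    # Tolerate a single root object or a list of subtree roots.
    stack = list(tree) if isinstance(tree, list) else [tree]
    names: List[str] = []
    while stack:
        node = stack.pop()
        attr = node.get("node_attrs", {}).get(trait)
        # Traits are assumed to look like {"value": ...}; accept bare values too.
        value = attr.get("value") if isinstance(attr, dict) else attr
        if value == wanted:
            names.append(node["name"])
        stack.extend(node.get("children", []))
    return names
```

The traversal is iterative rather than recursive so that deep trees (subtrees of up to 5000 samples were mentioned earlier in the thread) cannot hit Python's recursion limit. The order of the returned names is not meaningful.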
Hi @sidneymbell -- sorry I let this all get buried in my inbox for, yikes! almost a year! 🤯 But I would still like to link out to Galago. This is the linkout format that I have: but when I try that I get an error message: Javascript console says
I can view https://hgwdev.gi.ucsc.edu/~angie/test_UShER_MicrobeTrace.json in my web browser and see its response headers with curl:
? If you don't have time to work on this, no problem! Just wanted to let you know I'm still interested if you do have time. 🙂 |
Describe the bug
This may be a bug in the JSON produced by the UShER web interface, not Galago, but they're not working together yet so let's figure it out.
Expected behavior / How to reproduce
This URL contains an Auspice V2 tree produced by an UShER web interface query:
https://genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_16aa0_445360.json
[unfortunately that is a temporary file, note the "trash" in the name -- it will go away in a couple days, so I have saved a copy here: https://hgwdev.gi.ucsc.edu/~angie/XAY_XBA_XBC_2022-09-28.json ]
So I hoped this Galago Fetch URL would work:
https://galago.czgenepi.org/#/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_16aa0_445360.json
But I get an error "Woops! Error fetching tree file
We weren't able to import your tree data. Please confirm your URL is correct and publicly accessible, or upload your JSON file directly below."
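For reference, the Fetch URL above follows a simple pattern: the `https://` scheme is dropped from the JSON file's URL and the remainder is appended to `https://galago.czgenepi.org/#/fetch/`. A small sketch of that construction, inferred only from the example URLs in this issue (the function name is hypothetical, and how Galago treats query strings is not known, so they are ignored here):

```python
from urllib.parse import urlsplit

GALAGO_FETCH_PREFIX = "https://galago.czgenepi.org/#/fetch/"


def galago_fetch_url(json_url: str) -> str:
    """Build a Galago Fetch link from the public URL of a tree JSON file."""
    parts = urlsplit(json_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {json_url!r}")
    # Host plus path, with the scheme removed, as in the examples in this issue.
    return GALAGO_FETCH_PREFIX + parts.netloc + parts.path
```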
Interestingly, I do get farther with fetch if I use the backup copy on a different server:
https://galago.czgenepi.org/#/fetch/hgwdev.gi.ucsc.edu/~angie/XAY_XBA_XBC_2022-09-28.json
-- that gets me as far as the "Analyze your data in Galago" dialog, where I can choose the pathogen (SARS-CoV-2) -- but I can't choose a State/Province, probably because my JSON has only the country level. There is a drop-down for State/Province, but it has no values.
Would it be possible to use the country metadata instead if the state metadata is missing from the JSON?
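To make the request concrete, here is a sketch of the kind of fallback meant: use the most specific location level a node has, and fall back to country when state/province is absent. The trait names `division` (the usual Nextstrain name for state/province) and `country`, the `{"value": ...}` layout under `node_attrs`, and the function name are assumptions for this example, not a description of how Galago reads locations.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def node_location(
    node: Dict[str, Any],
    levels: Sequence[str] = ("division", "country"),
) -> Optional[Tuple[str, str]]:
    """Return (level, value) for the first location level present on the node."""
    attrs = node.get("node_attrs", {})
    for level in levels:
        attr = attrs.get(level)
        # Traits are assumed to look like {"value": ...}; accept bare values too.
        value = attr.get("value") if isinstance(attr, dict) else attr
        if value:
            return level, value
    return None
```

With a fallback like this, a country-only JSON would yield `("country", ...)` for every sample, and a node with no geographic data at all would yield `None`.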