Add subsection on spruce case study (DRAFT) #176
Looks great, thanks! A few quick notes (I'm afk)
Your storage is dominated by PL, so we should look at using local alleles to see whether that reduces it much. This needs the dev version of bio2zarr.
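A rough back-of-the-envelope sketch of why local alleles could help. For a diploid call with n alleles, PL holds n(n+1)/2 values, and a dense array has to be as wide as the site with the most alleles. The numbers below are purely hypothetical: 32-bit PL values, at most 4 alleles per site, and at most 2 local alleles per call (how many local alleles bio2zarr actually keeps is an assumption here). Compression is ignored, and since padded fill values compress well, the real on-disk saving will differ.

```python
import math


def n_genotypes(n_alleles: int, ploidy: int = 2) -> int:
    # Number of unordered genotypes: multiset coefficient C(n_alleles + ploidy - 1, ploidy)
    return math.comb(n_alleles + ploidy - 1, ploidy)


def pl_nbytes(num_variants: int, num_samples: int, n_alleles: int,
              itemsize: int = 4, ploidy: int = 2) -> int:
    # Uncompressed bytes of a dense (variants, samples, genotypes) likelihood array.
    # itemsize=4 assumes 32-bit integer values.
    return num_variants * num_samples * n_genotypes(n_alleles, ploidy) * itemsize


# Hypothetical dimensions for illustration only (not the spruce dataset)
num_variants, num_samples = 1_000_000, 1_000

# Global PL: width fixed by the maximum alleles at any site (assumed 4 here)
global_pl = pl_nbytes(num_variants, num_samples, n_alleles=4)
# Local PL: assumes each call keeps likelihoods for at most 2 local alleles
local_pl = pl_nbytes(num_variants, num_samples, n_alleles=2)

print(f"global: {global_pl / 1e9:.1f} GB, local: {local_pl / 1e9:.1f} GB")
print(f"reduction: {global_pl / local_pl:.2f}x")
```

With these assumed inputs this gives 40.0 GB versus 12.0 GB uncompressed; the ratio grows quickly as the maximum allele count rises, since the global width scales roughly with its square.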
Why only store PL and not other call fields? I would have thought you'd drop PL and keep the others, as it's huge and not especially useful. We'd need to explain that point a bit more.
Well, it was my colleague who did the variant calling. I think the main reason was to use the PL fields for downstream analyses with ANGSD, which was the goal at the time.
Will do! On that note, is there an option to output `vcf2zarr inspect` results as CSV? Parsing the text output with, e.g., R's `read.table(sep="\t")` does not work, or am I missing something?
Yes.
Are you referring to sgkit-dev/bio2zarr#285 here?
OK, now I see it: the inspect file is generated in the notebook!
Looks good. I'm happy to merge and iterate if you are?
Great! I was looking into the coordinate conversion issue today; hopefully I can have it done by Wednesday. I was thinking I'd add a paragraph and a notebook briefly describing the process.
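A minimal sketch of the kind of coordinate conversion under discussion, assuming a hypothetical table of scaffold offsets along a chromosome (in practice these would come from something like an AGP file) and that every scaffold lies in forward orientation. It illustrates the overflow point: once positions are shifted onto chromosome coordinates they can exceed 2^31 - 1, the largest value a signed 32-bit integer (the VCF `Integer` type) can hold, whereas a Zarr array can simply be stored as int64. The scaffold names and offsets are made up.

```python
import numpy as np

# Largest value a signed 32-bit integer can hold: 2_147_483_647
INT32_MAX = int(np.iinfo(np.int32).max)

# Hypothetical layout: chromosome offset of each scaffold's first base.
# Assumes every scaffold lies in forward orientation on its chromosome.
SCAFFOLD_OFFSETS = {
    "scaffold_1": 0,
    "scaffold_2": 1_500_000_000,
    "scaffold_3": 2_400_000_000,
}


def to_chromosome_positions(scaffold_ids, scaffold_positions):
    """Shift 1-based scaffold positions onto chromosome coordinates as int64."""
    offsets = np.array([SCAFFOLD_OFFSETS[s] for s in scaffold_ids], dtype=np.int64)
    # int64 arithmetic, so positions beyond INT32_MAX are represented exactly
    return offsets + np.asarray(scaffold_positions, dtype=np.int64)


chrom_pos = to_chromosome_positions(
    ["scaffold_1", "scaffold_2", "scaffold_3"], [100, 200, 300]
)
print(chrom_pos)
print("exceeds int32:", chrom_pos > INT32_MAX)
```

The last position (2,400,000,300) cannot be written as a VCF `POS`, but it is an ordinary value in an int64 `variant_position`-style array.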
Details are now taken from notebooks
Addresses #169.
I have added a subsection with text and a table on the spruce case study, modeled on the Genomics England text. I'm currently working on converting the coordinate system as a demonstration of some of the advantages Zarr provides over VCF (the coordinate overflow problem). I added some comments in the text about some of the presented numbers relating to file and chunk sizes, as I'm not always sure which conventions you have used (e.g., `du -hs`, or summing up the stats in the inspect files?).
@jeromekelleher is this too verbose, or is it what you had in mind?