This repository has been archived by the owner on Aug 4, 2020. It is now read-only.

Consider using a richer dataset? #46

Open
naupaka opened this issue Aug 2, 2017 · 2 comments

Comments

@naupaka
Member

naupaka commented Aug 2, 2017

Having a richer dataset to analyze might allow us to better showcase the abilities of dplyr and ggplot. One to consider might be a CSV version of the results of the VCF pipeline. We used this approach in a recent workshop at Stanford (script: parse_vcf.R, CSV: all_vcf.csv). The R script and the CSV it outputs are available here.
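To make the idea of "a CSV version of the VCF results" concrete, here is a minimal sketch in Python. It is not parse_vcf.R and does not reproduce all_vcf.csv; it assumes a plain-text VCF, skips the `#` header lines, and keeps only the eight fixed VCF columns (CHROM through INFO), dropping any per-sample genotype columns. The function name `vcf_to_csv` is hypothetical.

```python
import csv
import io

# The eight fixed columns defined by the VCF specification.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_to_csv(vcf_text: str) -> str:
    """Convert the body of a VCF file into CSV text (hypothetical helper).

    Meta-information (##) and header (#CHROM) lines are skipped, and any
    sample/genotype columns after INFO are dropped in this simplified sketch.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIXED_COLUMNS)
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")  # VCF data lines are tab-separated
        writer.writerow(fields[: len(FIXED_COLUMNS)])
    return out.getvalue()


# Usage with a single made-up record:
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t9972\t.\tT\tG\t91\t.\tDP=4;INDEL\n"
)
print(vcf_to_csv(example))
```

The `csv` module handles quoting, so INFO values that contain commas (for example multi-valued fields) stay in a single CSV cell.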

@tracykteal
Contributor

That's a great idea! I think @JasonJWilliamsNY had also tried something like this. We definitely should do this.

@naupaka
Member Author

naupaka commented Aug 2, 2017

I should also mention that the Rmd template file I mentioned in issue #47 also has some lines of code to parse some of the information out of the nasty-looking INFO column, so there's even more to work with. I tried a couple of different VCF libraries to get things into a workable, tidy form. There is a vcf2tidy function in the vcfR package, but I was unable to get it to work or do what I wanted, hence the hackish way we ended up doing it. I think vcf2tidy would be the most straightforward approach to producing this dataset, if we could make it work.
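For reference, the INFO column is a semicolon-separated list of `KEY=VALUE` entries plus bare flags, with `.` meaning "no data". The Python sketch below splits it into one column per key. It is a swapped-in illustration, not the code in the Rmd template and not how vcfR's vcf2tidy works; the helper names `parse_info` and `widen_info` are hypothetical, and values are left as strings rather than typed using the `##INFO` header definitions.

```python
from typing import Dict, List, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO string into a dict (hypothetical helper).

    'KEY=VALUE' entries map to their value string; bare flags such as
    'INDEL' map to True. A lone '.' (or an empty string) means no INFO data.
    """
    if info in ("", "."):
        return {}
    parsed: Dict[str, InfoValue] = {}
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate stray or trailing semicolons
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def widen_info(records: List[dict]) -> List[dict]:
    """Replace each record's INFO string with one column per INFO key.

    Keys missing from a given record are filled with None, so every row
    ends up with the same set of columns (a tidy, rectangular table).
    """
    parsed = [parse_info(r["INFO"]) for r in records]
    keys = sorted({k for p in parsed for k in p})
    rows = []
    for record, info in zip(records, parsed):
        row = {k: v for k, v in record.items() if k != "INFO"}
        for key in keys:
            row[key] = info.get(key)
        rows.append(row)
    return rows


# Usage with two made-up records:
print(widen_info([{"POS": "1", "INFO": "DP=4"}, {"POS": "2", "INFO": "INDEL"}]))
```

Filling absent keys with None keeps the table rectangular, which is what lets dplyr-style (or pandas-style) verbs operate on each INFO field as an ordinary column.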


3 participants