Consider using a richer dataset? #46

naupaka · 2017-08-02T18:43:56Z

Having a richer dataset to analyze might allow us to better showcase the abilities of dplyr and ggplot. One option is a CSV version of the results of the VCF pipeline. We used this approach in a recent workshop at Stanford (script: parse_vcf.R, CSV: all_vcf.csv). The R script and the CSV it outputs are available here.


tracykteal · 2017-08-02T18:52:05Z

That's a great idea! I think @JasonJWilliamsNY had also tried something like this. We definitely should do this.

naupaka · 2017-08-02T18:57:42Z

I should also mention that the Rmd template file from issue #47 has some lines of code that parse information out of the nasty-looking INFO column, so there's even more to work with. I tried a couple of different VCF libraries to get things into a workable, tidy form. There is a vcfR2tidy function in the vcfR package, but I was unable to get it to do what I wanted, hence the hackish way we ended up doing it. I think vcfR2tidy would be the most straightforward way to produce this dataset, if we could make it work.
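
For readers unfamiliar with the INFO column, here is a minimal base R sketch of the kind of parsing described above. It is not the code from the Rmd template or from parse_vcf.R. The example INFO strings and the `parse_info` helper are hypothetical, and it assumes the usual `KEY=value;KEY=value;FLAG` layout of a VCF INFO field.

```r
# Hypothetical INFO strings in the usual "KEY=value;KEY=value;FLAG" layout.
info <- c("DP=14;VDB=0.0321;MQ=60", "INDEL;DP=9;MQ=58")

# Hypothetical helper: split one INFO string into a named character vector.
# Flags that carry no "=" (e.g. INDEL) are stored as the string "TRUE".
parse_info <- function(x) {
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", fields)
  values <- ifelse(grepl("=", fields, fixed = TRUE),
                   sub("^[^=]*=", "", fields),
                   "TRUE")
  setNames(values, keys)
}

parsed <- lapply(info, parse_info)
all_keys <- unique(unlist(lapply(parsed, names)))

# One row per variant, one column per key; keys absent from a row become NA.
info_df <- as.data.frame(
  do.call(rbind, lapply(parsed, function(p) unname(p[all_keys]))),
  stringsAsFactors = FALSE
)
names(info_df) <- all_keys

# Columns start out as character; convert the ones you need.
info_df$DP <- as.integer(info_df$DP)
```

The resulting `info_df` could be bound column-wise to the rest of the variant table, which would give dplyr and ggplot more columns to work with.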

JasonJWilliamsNY mentioned this issue Aug 31, 2017

Add lesson on Bioconductor #67

Open

JasonJWilliamsNY added after-lesson-release complex enhancement future goal labels Oct 13, 2017
