Calling with Avocado using the "hive" range partitioned data #283

Open
jpdna opened this issue Jan 6, 2018 · 0 comments

jpdna commented Jan 6, 2018

Hi @fnothaft -
I'd like to demonstrate joint genotype calling with Avocado for specific genomic regions, using the binned "hive-style" partitioned data.
Input:

  1. gVCF files for tens to hundreds of samples, saved as bin range partitioned ADAM Parquet datasets
  2. BAM files saved as ADAM bin partitioned datasets.

The application I have in mind is on-the-fly recalling of specific regions: new samples are added and a set of candidate regions
needs to be examined in near real time. This would include a feature letting the user provide a BED file of regions to call, as GenotypeGVCFs allows in GATK/HaplotypeCaller.
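
To make the BED-driven restriction concrete, here is a rough sketch of how BED regions might map onto the binned layout. It is not Avocado or ADAM code. The column names (`contigName`, `positionBin`, `start`, `end`), the 1,000,000 bp bin size, and the `lookback_bins` parameter are all assumptions for illustration; the idea is only that a region query can be turned into a filter on the partition columns so that unrelated bins are never read.

```python
from typing import Iterable, List, Tuple

# (contig, start, end) with BED-style 0-based, half-open coordinates
Region = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three columns of BED lines, skipping headers and blanks."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlapping_bins(start: int, end: int, bin_size: int,
                     lookback_bins: int = 0) -> List[int]:
    """Return the bin indices touched by the half-open interval [start, end).

    lookback_bins (hypothetical) widens the range downward, for layouts where
    records are binned by start position and may extend into later bins.
    """
    if end <= start:
        return []
    first = max(0, start // bin_size - lookback_bins)
    last = (end - 1) // bin_size
    return list(range(first, last + 1))


def partition_filter(regions: Iterable[Region], bin_size: int = 1_000_000,
                     lookback_bins: int = 0) -> str:
    """Build a SQL-style predicate selecting records that overlap any region.

    Column names here are assumed, not taken from the ADAM schema.
    """
    clauses = []
    for contig, start, end in regions:
        bins = overlapping_bins(start, end, bin_size, lookback_bins)
        if not bins:
            continue
        bin_list = ", ".join(str(b) for b in bins)
        clauses.append(
            f"(contigName = '{contig}' AND positionBin IN ({bin_list}) "
            f"AND `start` < {end} AND `end` > {start})"
        )
    return " OR ".join(clauses)
```

The resulting string could be passed to something like a DataFrame `filter`, so that the partition columns prune directories before any record-level overlap check runs. Whether the lookback is needed depends on how the binned branch assigns records that span a bin boundary.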

My plan is to make Avocado able to load partitioned data from my ADAM "hive" binned dataset branch. With that in place I think it will just work, and I'll measure performance.
Let me know if you have suggestions or comments about the usefulness of this.
