Calling with Avocado using the "hive" range partitioned data #283

jpdna · 2018-01-06T20:02:39Z

Hi @fnothaft -
I'd like to demonstrate joint calling of genotypes with Avocado for specific genomic regions, using the binned "hive-style" partitioned data.
Input:

gVCF files for 10+ to 100s of samples, saved as bin range partitioned ADAM Parquet datasets
BAM files saved as ADAM bin partitioned datasets

The application I imagine here is on-the-fly recalling of a specific region, in a case where new samples are added and a set of candidate regions needs to be examined in near real time. This would include a feature allowing the user to provide a BED file of regions to call, as GenotypeGVCFs allows in GATK/HaplotypeCaller.
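
As a rough sketch of the region-selection idea (this is not Avocado or ADAM code; the 1 Mbp bin size and the partition column names `contigName` and `positionBin` are assumptions made for illustration), BED intervals could be mapped to the bins that need to be read, so that only those partitions are loaded:

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed bin width in base pairs; the real value depends on how the
# dataset was partitioned when it was saved.
BIN_SIZE = 1_000_000

Region = Tuple[str, int, int]  # (contig, start, end), 0-based half-open as in BED


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three BED columns, skipping blank and header lines."""
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def bins_for_regions(regions: Iterable[Region],
                     bin_size: int = BIN_SIZE) -> Dict[str, Set[int]]:
    """Return, per contig, the set of bin indices overlapped by any region."""
    bins: Dict[str, Set[int]] = {}
    for contig, start, end in regions:
        if end <= start:
            continue  # empty interval, nothing to read
        first = start // bin_size
        last = (end - 1) // bin_size  # end is exclusive in BED
        bins.setdefault(contig, set()).update(range(first, last + 1))
    return bins


def partition_filter(bins: Dict[str, Set[int]]) -> str:
    """Build a SQL-style predicate on the (hypothetical) partition columns."""
    clauses = []
    for contig in sorted(bins):
        bin_list = ", ".join(str(b) for b in sorted(bins[contig]))
        clauses.append(
            f"(contigName = '{contig}' AND positionBin IN ({bin_list}))"
        )
    return " OR ".join(clauses)
```

A predicate like this only narrows the read to the relevant partitions; records inside a selected bin would still need an exact overlap check against the BED intervals before calling.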

My plan is to make Avocado able to load partitioned data from my ADAM "hive" binned dataset branch. With that in place I think it will just work, and I'll measure performance.
Let me know if you have suggestions / comments about the usefulness of this.
