DeepVariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF). Nucleus is designed for painless integration with the TensorFlow machine learning framework.

How to run

We recommend using our Docker solution. The command will look like this:

# Set --model_type to exactly one of: WGS, WES, PACBIO.
# --num_shards=$(nproc) uses all your cores to run make_examples. Feel free to change it.
BIN_VERSION="0.9.0"
sudo docker run \
  -v "YOUR_INPUT_DIR":"/input" \
  -v "YOUR_OUTPUT_DIR":"/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/YOUR_REF \
  --reads=/input/YOUR_BAM \
  --output_vcf=/output/YOUR_OUTPUT_VCF \
  --output_gvcf=/output/YOUR_OUTPUT_GVCF \
  --num_shards=$(nproc)
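
Because `--model_type` accepts exactly one of three values, a small wrapper can catch a typo before Docker starts. The sketch below is illustrative only: `is_valid_model_type` and `print_deepvariant_cmd` are hypothetical helper names, not part of DeepVariant, and the second function only prints the command (with the same placeholder paths as above) instead of running it.

```shell
# Hypothetical helpers, not shipped with DeepVariant.

# Succeed only if $1 is one of the model types listed above.
is_valid_model_type() {
  case "$1" in
    WGS|WES|PACBIO) return 0 ;;
    *) return 1 ;;
  esac
}

# Print (do not run) the docker command for a model type ($1) and shard count ($2).
print_deepvariant_cmd() {
  local model_type="$1"
  local shards="$2"
  local bin_version="${BIN_VERSION:-0.9.0}"
  if ! is_valid_model_type "$model_type"; then
    echo "error: model type must be one of WGS, WES, PACBIO" >&2
    return 1
  fi
  echo "sudo docker run" \
    "-v YOUR_INPUT_DIR:/input -v YOUR_OUTPUT_DIR:/output" \
    "google/deepvariant:${bin_version}" \
    "/opt/deepvariant/bin/run_deepvariant" \
    "--model_type=${model_type}" \
    "--ref=/input/YOUR_REF --reads=/input/YOUR_BAM" \
    "--output_vcf=/output/YOUR_OUTPUT_VCF" \
    "--output_gvcf=/output/YOUR_OUTPUT_GVCF" \
    "--num_shards=${shards}"
}
```

For example, `print_deepvariant_cmd WES 8` prints a command with `--model_type=WES` and `--num_shards=8`, while `print_deepvariant_cmd EXOME 8` fails with an error message. Model types are matched case-sensitively, so `wgs` is rejected.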

For more information, see:

Quick Start
Full documentation list
Best practices for multi-sample variant calling with DeepVariant

How to cite

If you're using DeepVariant in your work, please cite:

A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36, 983–987 (2018).
Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, Mark A. DePristo.
doi: https://doi.org/10.1038/nbt.4235

Why Use DeepVariant?

High accuracy - In 2016, DeepVariant won the PrecisionFDA Truth Challenge award for best SNP performance. DeepVariant maintains high accuracy across data from different sequencing technologies, prep methods, and species.
Flexibility - Out-of-the-box use for PCR-positive samples and low-quality sequencing runs, and easy adjustments for different sequencing technologies and non-human species.
Ease of use - No filtering is needed beyond setting your preferred minimum quality threshold.
Cost effectiveness - With a single non-preemptible n1-standard-16 machine on Google Cloud, it costs ~$9.11 to call a 30x whole genome and ~$0.39 to call an exome. With preemptible pricing, the cost is $2.19 for a 30x whole genome and $0.09 for an exome (not considering preemption).
Speed - On a 64-core CPU-only machine, DeepVariant completes a 50x WGS in 5 hours and an exome in 16 minutes (1). Multiple options for acceleration exist, taking the WGS pipeline to as fast as 40 minutes (see external solutions).
Usage options - DeepVariant can be run via Docker or binaries, either on on-premises hardware or in the cloud, with support for hardware accelerators like GPUs and TPUs.

(1): Time estimates do not include mapping.
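
The minimum quality threshold mentioned under "Ease of use" can be applied with standard tools. This is a minimal sketch, assuming an uncompressed VCF and assuming the threshold is applied to the QUAL column (column 6 in the VCF format). The function name `filter_vcf_by_qual` is hypothetical, and the threshold in the usage note is an arbitrary illustration, not a DeepVariant recommendation.

```shell
# Hypothetical helper: keep header lines plus records whose QUAL (column 6)
# is present and at least the threshold given as $1.
# Usage: filter_vcf_by_qual MIN_QUAL < input.vcf > filtered.vcf
filter_vcf_by_qual() {
  awk -F '\t' -v min_qual="$1" '
    # Pass header lines through unchanged.
    /^#/ { print; next }
    # Keep records with a numeric QUAL at or above the threshold;
    # records with a missing QUAL (".") are dropped.
    $6 != "." && ($6 + 0) >= (min_qual + 0) { print }
  '
}
```

For example, `filter_vcf_by_qual 20 < YOUR_OUTPUT_VCF > filtered.vcf` keeps only records with QUAL of 20 or higher. For compressed or indexed VCFs, a dedicated VCF tool is likely a better fit than this text-based filter.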

DeepVariant Setup

Prerequisites

Unix-like operating system (cannot run on Windows)
Python 2.7

Official Solutions

Below are the official solutions provided by the Genomics team in Google Brain.

Name	Description
Docker	This is the recommended method.
Build from source	DeepVariant comes with scripts to build it on Ubuntu 14 and 16, with Ubuntu 16 recommended. To build and run on other Unix-based systems, you will need to modify these scripts.
Prebuilt Binaries	Available at `gs://deepvariant/`. These are compiled to use SSE4 and AVX instructions, so you will need a CPU (such as Intel Sandy Bridge) that supports them. You can check the `/proc/cpuinfo` file on your computer, which lists these features under "flags".
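
The `/proc/cpuinfo` check described for the prebuilt binaries can be scripted. This is a minimal sketch for Linux, assuming the flag names `sse4_1`, `sse4_2`, and `avx` as they commonly appear on the "flags" line. The helper name `has_cpu_flag` is hypothetical, and because the text above does not say which SSE4 variant is required, both are reported.

```shell
# Hypothetical helper: succeed if flag $1 appears as a whole word on the first
# "flags" line of the cpuinfo-style file $2 (default: /proc/cpuinfo).
has_cpu_flag() {
  grep -m1 '^flags' "${2:-/proc/cpuinfo}" | tr ' \t' '\n\n' | grep -qx "$1"
}

# Report each instruction set of interest when /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
  for flag in sse4_1 sse4_2 avx; do
    if has_cpu_flag "$flag"; then
      echo "$flag: supported"
    else
      echo "$flag: missing"
    fi
  done
fi
```

Matching whole words matters here: a plain substring search for `avx` would also match `avx2`, so the flags line is split into one flag per line before comparison.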

External Solutions

The following pipelines are not created or maintained by the Genomics team in Google Brain. Please contact the relevant teams if you have any questions or concerns.

Name	Description
Running DeepVariant on Google Cloud Platform	Docker-based pipelines optimized for cost and speed. Code can be found here.
DeepVariant-on-spark from ATGENOMIX	A germline short variant calling pipeline that runs DeepVariant on Apache Spark at scale with support for multi-GPU clusters (e.g. NVIDIA DGX-1).
Parabricks	An accelerated DeepVariant pipeline with multi-GPU support that runs our WGS pipeline in just 40 minutes, at a cost of $2-$3 per sample. This provides a 7.5x speedup over a 64-core CPU-only machine at lower cost.
DNAnexus DeepVariant App	Offers parallelized execution with a graphical interface (requires platform account).
Nextflow Pipeline	Offers parallel processing of multiple BAMs and Docker support.
DNAstack Pipeline	Cost-optimized DeepVariant pipeline (requires platform account).

Additional References

DeepVariant Blog
DeepVariant release notes

Contribution Guidelines

Please open a pull request if you wish to contribute to DeepVariant. Note that we have not set up the infrastructure to merge pull requests externally. If you agree, we will test and submit the changes internally and mention your contributions in our release notes. We apologize for any inconvenience.

If you have any difficulty using DeepVariant, feel free to open an issue. If you have general questions not specific to DeepVariant, we recommend that you post on a community discussion forum such as BioStars.

License

BSD-3-Clause license

Acknowledgements

DeepVariant happily makes use of many open source packages. We would like to specifically call out a few key ones:

Boost Graph Library
abseil-cpp and abseil-py
CLIF
GNU Parallel
htslib & samtools
Nucleus
numpy
SSW Library
TensorFlow and Slim

We thank all of the developers and contributors to these packages for their work.

Disclaimer

This is not an official Google product.
