Using Google deployments to run Supernova assemblies at scale using Cromwell workflows.
Supernova is a software package for de novo assembly from Chromium Linked-Reads. These reads have an additional encoding that helps with creating diploid assemblies, representing maternal and paternal haplotypes.
- compute/storage resources
- minimum compute resources are 32 cores and 256 GB
- intermediate storage required is > 1TB
- scale to many assemblies
- MGI produces 8+ at a time
- automate the assembly process
- in-house code and a workflow system
Scripts and CLI to manage 10X resources and run the assembly pipeline. Also included Slack notification when pipelines start and finish.
Cromwell is a Workflow Management System geared towards scientific workflows.
- download read fastqs
- assemble
- create output fastqs
- upload assembly to cloud storage
Create and manage cloud resources with simple templates.
- Jinja template and schema
- Startup script
- YAML config
Jinja is a modern and designer-friendly templating language for Python, modelled after Django’s templates.
- Jinja file describes cloud resources
- Schema details the config properties
Installs...
- supernova (from cloud location)
- cromwell
- in-house code (from github)
- additional Linux tools
Includes the necessary properties to configure the deployment. Included are project specific items, plus data and supernova cloud locations.
Fingers crossed :)