Bystro

Using Bystro

For most users, we recommend using the web app at https://bystro.io.

The web app gives full access to all of Bystro's capabilities, provides a convenient search/filtering interface, supports large data sets (tested up to 890GB uncompressed/129GB compressed), and has excellent performance.

Installing Bystro

Follow the instructions in INSTALL.md.

Bystro relies on pre-processors, pluggable via Bystro's YAML config, to normalize variant inputs (handling VCF issues such as padding), determine whether a site is a transition or transversion, calculate the sample minor allele frequency (MAF), identify heterozygous, homozygous, and missing samples, and calculate heterozygosity, homozygosity, missingness, and more.

- VCF format: Bystro-Vcf
- SNP format: Bystro-SNP
- Create your own to support other formats!
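The steps these pre-processors perform can be sketched in Python under several assumptions: the function names (classify_snv, trim_padding, site_stats) and output keys are hypothetical, genotypes are diploid and biallelic, the padding rule (trim shared trailing, then leading, bases while keeping one base per allele) is one common convention rather than necessarily Bystro's, and heterozygosity, homozygosity, and missingness are divided by all samples. Bystro's actual output fields are described in FIELDS.md.

```python
# Hypothetical sketch of pre-processing steps like those described above.
# NOT the Bystro-Vcf or Bystro-SNP implementation; names and rules are assumptions.
from typing import Dict, List, Tuple, Union

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref: str, alt: str) -> str:
    """Return 'transition', 'transversion', or 'not_snv' for a REF/ALT pair."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if len(ref) != 1 or len(alt) != 1 or ref == alt or not {ref, alt} <= bases:
        return "not_snv"
    # Transitions stay within a class: A<->G (purines) or C<->T (pyrimidines)
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def trim_padding(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Drop bases shared by REF and ALT, keeping at least one base in each allele."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]  # trailing trims leave the position unchanged
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1  # each leading trim moves the variant one base to the right
    return pos, ref, alt


def site_stats(genotypes: Dict[str, str]) -> Dict[str, Union[float, List[str]]]:
    """Per-site sample statistics from diploid, biallelic GT strings ("./." = missing)."""
    hets: List[str] = []
    homozygotes: List[str] = []  # samples homozygous for the ALT allele
    missing: List[str] = []
    alt_alleles = called_alleles = 0
    for sample, gt in genotypes.items():
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            missing.append(sample)
            continue
        n_alt = sum(1 for allele in alleles if allele != "0")
        alt_alleles += n_alt
        called_alleles += len(alleles)
        if n_alt == len(alleles):
            homozygotes.append(sample)
        elif n_alt > 0:
            hets.append(sample)
    n = len(genotypes) or 1  # avoid dividing by zero at an empty site
    alt_freq = alt_alleles / called_alleles if called_alleles else 0.0
    return {
        "heterozygotes": hets,
        "homozygotes": homozygotes,
        "missing": missing,
        "heterozygosity": len(hets) / n,
        "homozygosity": len(homozygotes) / n,
        "missingness": len(missing) / n,
        "sampleMaf": min(alt_freq, 1.0 - alt_freq),  # computed over called alleles only
    }
```

For example, trim_padding(100, "CAG", "CTG") reduces a padded record to the SNV (101, "A", "T"), which classify_snv labels a transversion.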

Annotation (Output) Field Descriptions

Please read FIELDS.md.

The Bystro configuration file

The config file describes the state of both the database and the annotation. It is required for both building a database and annotating.
It has several keys:
- tracks: The highest-level organization for database values. Each track has a name property, which must be unique, and a type, which must be one of:
  - sparse: Any bed file, or any file that can be mapped to chrom, chromStart, and chromEnd columns.
    - This is used for dbSNP and Clinvar records, but many other files fit this format.
    - Source columns are mapped to these fields with the fieldMap key.
  - score: Accepts any wigFix file.
    - Used for phastCons, phyloP
  - cadd:
    - Accepts any CADD file, or Bystro's custom "bed-like" CADD file, which has 2 header lines and whose rows contain chrom, chromStart, and chromEnd columns followed by the standard CADD fields.
    - CADD format: http://cadd.gs.washington.edu
  - gene: A UCSC gene track (ex: knownGene, refGene, sgdGene).
    - The local_files for this track are created using an sql_statement.
    - Ex: SELECT * FROM hg38.refGene LEFT JOIN hg38.kgXref ON hg38.kgXref.refseq = hg38.refGene.name
- chromosomes: The allowable chromosomes.
  - Each row of every track must be identified by one of these chromosomes (during building).
  - Each row of any input file submitted for annotation must also be identified by one of these chromosomes (during annotation).
  - However, Bystro is flexible about the chr prefix.
  Ex: For the following config
```
chromosomes:
- chr1
- chr2
- chr3
```
  Only chr1, chr2, and chr3 will be accepted. However, Bystro tries to make your life easy:
  1. We currently follow UCSC conventions for chromosomes, meaning chromosome names should be prefixed with chr.
  2. During annotation, Bystro will automatically prepend chr to chromosomes read from an input file when the prefix is missing.
  3. Bystro allows any field to be transformed during building, configurable in the YAML config file for that assembly, which makes it easy to prepend chr to the source file's chromosome field.
  Ex: Clinvar doesn't use the chr prefix, so during building we specify:
```
tracks:
  - name: clinvar
    build_field_transformations:
      chrom: chr .
    fieldMap:
      Chromosome: chrom
```
  Here fieldMap renames the source header field Chromosome to chrom, and build_field_transformations defines a prepend operation (chr . can be read as the Perl expression "chr" . $chrom).

  So input files do not need their chromosome names prefixed with chr; Bystro will normalize the name.

  In this example, chromosomes 1 and chr1 will be built/annotated, but 1_rand will not.
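The chromosome handling above can be approximated with a short Python sketch. It is a hypothetical illustration, not Bystro's Perl code: apply_field_map renames columns the way fieldMap does, apply_prepend handles only the "chr ." prepend form of build_field_transformations, and normalize_chrom adds a missing chr prefix before checking the chromosomes list. The input row is made up.

```python
# Hypothetical sketch of the chromosome handling described above; NOT Bystro's code.
from typing import Dict, List, Optional


def apply_field_map(row: Dict[str, str], field_map: Dict[str, str]) -> Dict[str, str]:
    """Rename source columns (e.g. Chromosome -> chrom); unmapped columns keep their names."""
    return {field_map.get(name, name): value for name, value in row.items()}


def apply_prepend(spec: str, value: str) -> str:
    """Interpret a 'chr .'-style spec as concatenation, like Perl's "chr" . $value."""
    parts = spec.split()
    if len(parts) != 2 or parts[1] != ".":
        raise ValueError(f"this sketch only handles '<prefix> .' specs, got {spec!r}")
    return parts[0] + value


def normalize_chrom(chrom: str, allowed: List[str]) -> Optional[str]:
    """Add chr when missing, then accept the name only if it is in the chromosomes list."""
    name = chrom if chrom.startswith("chr") else "chr" + chrom
    return name if name in allowed else None


if __name__ == "__main__":
    allowed = ["chr1", "chr2", "chr3"]  # the chromosomes: list from the example config
    row = apply_field_map({"Chromosome": "1", "Start": "12345"}, {"Chromosome": "chrom"})
    row["chrom"] = apply_prepend("chr .", row["chrom"])
    print(row)  # {'chrom': 'chr1', 'Start': '12345'}
    for raw in ["1", "chr1", "1_rand"]:
        print(raw, "->", normalize_chrom(raw, allowed))
```

Running it prints the renamed row with chrom set to chr1, then maps 1 and chr1 to chr1 and 1_rand to None, matching the accepted and rejected names in the example.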

Directories and Files

These describe where the Bystro database and any source files are located.

files_dir: The parent folder containing each track's local_files.

Bystro automatically checks for local_files at files_dir/trackName/file.

Ex: For the config file containing
```
files_dir: /path/to/files/
tracks:
  - name: refSeq
    local_files:
      - hg19.refGene.chr1.gz
      # and more files
```
Bystro will expect files in /path/to/files/refSeq/hg19.refGene.chr1.gz
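A minimal sketch of this lookup in Python, assuming the config has already been parsed into a dict (no YAML parsing is shown) and using hypothetical helper names, lists where each local file is expected and which of those paths are missing:

```python
# Hypothetical helpers; the config is a plain dict standing in for the parsed YAML.
import os
from typing import Any, Dict, List


def expected_local_files(config: Dict[str, Any]) -> List[str]:
    """List files_dir/trackName/file for every local file of every track."""
    paths = []
    for track in config.get("tracks", []):
        for file_name in track.get("local_files", []):
            paths.append(os.path.join(config["files_dir"], track["name"], file_name))
    return paths


def missing_local_files(config: Dict[str, Any]) -> List[str]:
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_local_files(config) if not os.path.isfile(path)]


config = {
    "files_dir": "/path/to/files/",
    "tracks": [{"name": "refSeq", "local_files": ["hg19.refGene.chr1.gz"]}],
}
print(expected_local_files(config))
```

On a POSIX system this prints ['/path/to/files/refSeq/hg19.refGene.chr1.gz']; os.path.join handles the trailing slash on files_dir.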

database_dir: Each database is held within database_dir, in a folder named after the assembly.

Ex: For the config file containing
```
assembly: hg19
database_dir: /path/to/databases/
```
Bystro will look for the database /path/to/databases/hg19
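Similarly, a hypothetical Python helper can resolve the database folder from database_dir and assembly and fail early when it is absent. The error handling is an illustrative choice for this sketch, not documented Bystro behavior.

```python
# Hypothetical helpers; not part of Bystro.
import os
from typing import Dict


def database_path(config: Dict[str, str]) -> str:
    """Return database_dir/assembly, where the database for this assembly is expected."""
    return os.path.join(config["database_dir"], config["assembly"])


def require_database(config: Dict[str, str]) -> str:
    """Return the database path, or raise if that folder does not exist yet."""
    path = database_path(config)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"no database for {config['assembly']} at {path}")
    return path
```

For the example config above, database_path returns /path/to/databases/hg19.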
