Commit

small clarification on install.md
akotlar committed Sep 25, 2024
1 parent f132917 commit cef3d9d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion perl/INSTALL.md
@@ -290,7 +290,7 @@ Explanation of the output:
- `my_annotation.annotation.header.json`: The header of the annotated dataset
- `my_annotation.sample_list`: The list of samples in the annotated dataset
-- `my_annotation.annotation.tsv.gz`: A gzipped TSV file with one row per variant and one column per annotation
+- `my_annotation.annotation.tsv.gz`: A block gzipped TSV file with one row per variant and one column per annotation. Can be decompressed with `bgzip` or any program compatible with the gzip format, like `gzip` and `pigz`.
- `my_annotation.dosage.feather`: The dosage matrix file, where the first column is the `locus` column in the format "chr:pos:ref:alt", and columns following that are sample columns, with the dosage of the variant for that sample (0 for homozygous reference, 1 for 1 copy of the alternate allele, 2 for 2, and so on). -1 indicates missing genotypes. The dosage is the expected number of alternate alleles, given the genotype. This is useful for downstream analyses like imputation, or for calculating polygenic risk scores
- This file is in the [Arrow feather format](https://arrow.apache.org/docs/python/feather.html), also known as the "IPC" format. This is an ultra-efficient format for machine learning, and is widely supported, in Python libraries like [Pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_feather.html), [Polars](https://docs.pola.rs/api/python/stable/reference/api/polars.read_ipc.html), [PyArrow](https://arrow.apache.org/docs/python/generated/pyarrow.feather.read_feather.html), as well as languages like [R](https://arrow.apache.org/docs/r/reference/read_feather.html) and [Julia](https://github.com/apache/arrow-julia)
- `hg38.yml`: The configuration file used for the annotation. You can use this to either re-build the Bystro database from scratch, or to re-run the annotation with the same configuration
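
Because the annotation file is gzip-compatible, it can also be streamed without decompressing it to disk first. The following is a minimal sketch using only Python's standard library, not something taken from the Bystro documentation. The helper name `read_annotation_rows`, the demo file, and its column names are hypothetical. The demo file is written as two concatenated gzip members to mimic the multi-block layout of block gzipped output.

```python
import csv
import gzip
import os
import tempfile


def read_annotation_rows(path, limit=None):
    """Yield each variant row of a (block) gzipped TSV as a dict keyed by column name."""
    # gzip.open reads every gzip member in the file, so multi-block output
    # is decompressed the same way as a single-member gzip file.
    with gzip.open(path, mode="rt", newline="") as handle:
        # QUOTE_NONE keeps any literal quote characters in annotation values intact.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            yield row


# Build a tiny two-member gzip file to stand in for real annotator output.
# The column names and values are made up for illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.annotation.tsv.gz")
with open(demo_path, "wb") as out:
    out.write(gzip.compress(b"chrom\tpos\tref\talt\nchr1\t100\tA\tG\n"))
    out.write(gzip.compress(b"chr1\t200\tC\tT\n"))

demo_rows = list(read_annotation_rows(demo_path))
print(demo_rows)
```

For a real run, pass the path to `my_annotation.annotation.tsv.gz` instead of `demo_path`, and use `limit` to preview the first few variants of a large file.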
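
The dosage encoding described above (0, 1, 2, and so on for alternate allele copies, -1 for a missing genotype) can be illustrated with a small sketch. It works on one row's sample values as a plain Python list, however they were loaded. The function name `summarize_dosages` and the example values are hypothetical, and the default `ploidy=2` is an assumption that holds only for diploid calls.

```python
MISSING = -1  # the dosage file uses -1 for a missing genotype


def summarize_dosages(dosages, ploidy=2):
    """Return (called_samples, alt_allele_count, alt_allele_frequency) for one variant."""
    # Drop missing genotypes so they do not count as homozygous reference.
    called = [d for d in dosages if d != MISSING]
    alt_count = sum(called)
    # ploidy=2 is an assumption; the frequency is undefined when nothing was called.
    total_alleles = ploidy * len(called)
    frequency = alt_count / total_alleles if total_alleles else None
    return len(called), alt_count, frequency


# Hypothetical row for locus "chr1:100:A:G" with dosages for five samples.
example = summarize_dosages([0, 1, 2, -1, 1])
print(example)
```

Excluding the -1 values before summing matters: treating them as numbers would lower the alternate allele count instead of marking the sample as uncalled.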
