Merge pull request #6 from ENCODE-DCC/dev_normalization_robust_min_max
leaderboard
leepc12 authored Jul 23, 2019
2 parents 90d1d4f + 73ee068 commit a1a8943
Showing 14 changed files with 1,462 additions and 925 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -12,6 +12,7 @@ join.log
*.npy
*.npz
*.db
*.tsv
__pycache__
round1
*.pyc

112 changes: 64 additions & 48 deletions README.md
# ENCODE Imputation Challenge Scoring/Ranking and Validation Scripts

## Installation

1) [Install Conda 4.6.14](https://docs.conda.io/en/latest/miniconda.html) first. Answer `yes` to all Y/N questions. Use default installation paths. Re-login after installation.
```bash
$ wget https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh
$ bash Miniconda3-4.6.14-Linux-x86_64.sh
```

2) Install dependencies: `numpy`, `scikit-learn`, `pyBigWig`, `sqlite` and `scipy`.
```bash
$ conda install -y -c bioconda numpy scikit-learn pyBigWig sqlite scipy
```

## Validating a submission

```bash
$ python validate.py [YOUR_SUBMISSION_BIGWIG]
```
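
What exactly `validate.py` checks is not documented in this README. Purely as an illustration of the kind of check implied elsewhere in this document (submissions are bigwigs binned at `25`), a rough sketch could look like the following; the function name and the specific checks are assumptions, not the script's actual behavior.

```python
# Hypothetical sketch only -- not the project's validate.py.
# Verifies that the file opens as a bigwig and that its intervals sit on a
# fixed 25-bp grid (the bin size this README assumes for submissions).
import sys
import pyBigWig

def rough_validate(bw_path, bin_size=25):
    bw = pyBigWig.open(bw_path)
    try:
        if not bw.isBigWig():
            raise ValueError(f"{bw_path} is not a bigWig file")
        for chrom, length in bw.chroms().items():
            for start, end, _value in bw.intervals(chrom) or ():
                # intervals should start on the grid; the last interval of a
                # chromosome is allowed to end at the chromosome end
                if start % bin_size or (end % bin_size and end != length):
                    raise ValueError(
                        f"{chrom}:{start}-{end} is not on a {bin_size}-bp grid")
    finally:
        bw.close()

if __name__ == "__main__":
    rough_validate(sys.argv[1])
    print("rough check passed")
```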

## Scoring a submission

1) Download ENCFF622DXZ and ENCFF074VQD from the ENCODE portal.
```bash
$ wget https://www.encodeproject.org/files/ENCFF622DXZ/@@download/ENCFF622DXZ.bigWig
$ wget https://www.encodeproject.org/files/ENCFF074VQD/@@download/ENCFF074VQD.bigWig
```

2) Convert them to numpy arrays. This speeds up scoring multiple submissions; `score.py` can also take bigwigs directly, so you can skip this step.
```bash
$ python bw_to_npy.py test/hg38/ENCFF622DXZ.bigWig
$ python bw_to_npy.py test/hg38/ENCFF074VQD.bigWig
```
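
For reference, the conversion that `bw_to_npy.py` performs can be approximated with `pyBigWig` as in the sketch below: signal is aggregated into 25-bp bins per chromosome (the exact aggregation used by the script is an assumption here). This is an illustration of the idea, not the actual script; the function name and the saved dict layout are assumptions.

```python
# Illustrative sketch, not the project's bw_to_npy.py: average bigwig signal
# into fixed 25-bp bins and collect one array per chromosome.
import numpy as np
import pyBigWig

def bin_bigwig(bw_path, bin_size=25):
    """Return {chrom: float32 array of mean signal per bin}."""
    bw = pyBigWig.open(bw_path)
    binned = {}
    for chrom, length in bw.chroms().items():
        n_bins = (length + bin_size - 1) // bin_size
        # exact=True averages raw base-pair values instead of zoom levels
        means = bw.stats(chrom, 0, length, type="mean", nBins=n_bins, exact=True)
        binned[chrom] = np.array(
            [m if m is not None else 0.0 for m in means], dtype=np.float32)
    bw.close()
    return binned

# e.g. np.save("test/hg38/ENCFF622DXZ.npy", bin_bigwig("test/hg38/ENCFF622DXZ.bigWig"))
```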

3) Run it. If you score without a variance `.npy` file specified as `--var-npy`, the `msevar` metric will be `0.0`.
```bash
$ python score.py test/hg38/ENCFF622DXZ.npy test/hg38/ENCFF074VQD.npy --chrom chr20
```

4) Output looks like this (columns: bootstrap_index, mse, mse1obs, mse1imp, gwcorr, match1, catch1obs, catch1imp, aucobs1, aucimp1, mseprom, msegene, mseenh).
```bash
bootstrap_-1 20.45688606636623 1730.3503548526915 195.52252657980728 0.01705378703206674 848 3462 2976 0.5852748736100822 0.590682173511888 376.1018309950674 31.24613030186926 94.01719916101615
```
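
The scoring metrics themselves are not defined in this README. As a heavily hedged illustration, the two simplest columns could be computed from a pair of binned arrays roughly as below; reading `mse` as plain mean squared error and `gwcorr` as genome-wide Pearson correlation is an assumption, and the remaining columns are challenge-specific and not sketched here.

```python
# Rough illustration only -- not the project's score.py. `imp` and `obs` are
# 1-D numpy arrays of 25-bp-binned signal (imputed and observed).
import numpy as np
from scipy.stats import pearsonr

def mse(imp, obs):
    # mean squared error over all bins
    return float(np.mean((imp - obs) ** 2))

def gwcorr(imp, obs):
    # genome-wide Pearson correlation over all bins (assumed meaning)
    return float(pearsonr(imp, obs)[0])
```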



## Ranking submissions


1) Create a score database.
```bash
$ python db.py [NEW_SCORE_DB_FILE]
```
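
The score database is a SQLite file (note that `sqlite` is pulled in during installation). A minimal sketch of what creating one could look like is below; the table and column names are assumptions for illustration, not the real schema used by `db.py`.

```python
# Minimal sketch with an assumed schema -- not the project's db.py.
import sqlite3

def create_score_db(db_file):
    conn = sqlite3.connect(db_file)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS score (
            team_id         INTEGER,
            submission_id   INTEGER,
            cell            TEXT,
            assay           TEXT,
            bootstrap_index INTEGER,
            mse             REAL,
            gwcorr          REAL
            -- ... one column per remaining metric in the real schema
        )""")
    conn.commit()
    conn.close()

# create_score_db("scores.db")
```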

2) To speed up scoring, convert each `TRUTH_BIGWIG` into a numpy array/object (binned at `25`). `--out-npy-prefix [TRUTH_NPY_PREFIX]` is optional. Repeat this for every truth bigwig (one per pair of cell type and assay).
```bash
$ python bw_to_npy.py [TRUTH_BIGWIG] --out-npy-prefix [TRUTH_NPY_PREFIX]
```

3) For each assay type, build a variance `.npy` file, which stores the per-bin variance of the truth signal for each chromosome, computed across all cell types. Without this variance file, `msevar` will be `0.0`.
```bash
$ python build_var_npy.py [TRUTH_NPY_CELL1] [TRUTH_NPY_CELL2] ... --out-npy-prefix var_[ASSAY_OR_MARK_ID]
```
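
A hedged sketch of the variance computation described in this step is shown below, assuming each truth `.npy` stores a per-chromosome dict of binned arrays (as in the earlier conversion sketch); this is not the actual `build_var_npy.py`.

```python
# Illustrative only -- not build_var_npy.py. For one assay, compute the
# variance of the binned truth signal at every position across cell types.
import numpy as np

def per_bin_variance(truth_npy_paths):
    """Return {chrom: per-bin variance across the given cell types}."""
    cells = [np.load(p, allow_pickle=True).item() for p in truth_npy_paths]
    return {
        chrom: np.var(np.stack([c[chrom] for c in cells]), axis=0)
        for chrom in cells[0]
    }

# Hypothetical file names following the CXXMYY convention used later on:
# np.save("var_M02.npy", per_bin_variance(["C05M02.npy", "C06M02.npy"]))
```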

4) Score each submission. `--validated` is only for a validated bigwig submission binned at `25`; with this flag turned on, `score.py` skips interpolation of intervals in the bigwig. For ranking, you also need to define metadata for the submission with `-t [TEAM_ID_INT] -s [SUBMISSION_ID_INT]`. These values are written to the database file together with the bootstrap scores. Repeat this for each submission (one submission per team for each pair of cell type and assay).
```bash
$ python score.py [YOUR_VALIDATED_SUBMISSION_BIGWIG_OR_NPY] [TRUTH_NPY] \
--var-npy var_[ASSAY_OR_MARK_ID].npy \
--db-file [SCORE_DB_FILE] \
--validated \
-t [TEAM_ID_INT] -s [SUBMISSION_ID_INT]
```
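
How `score.py` records the metadata and scores is not shown here; purely as an illustration, and reusing the assumed schema from the earlier database sketch, writing one bootstrap score row could look like this.

```python
# Assumed schema from the earlier sketch -- not the real score.py logic.
import sqlite3

def write_score_row(db_file, team_id, submission_id, cell, assay,
                    bootstrap_index, mse, gwcorr):
    conn = sqlite3.connect(db_file)
    conn.execute(
        "INSERT INTO score VALUES (?, ?, ?, ?, ?, ?, ?)",
        (team_id, submission_id, cell, assay, bootstrap_index, mse, gwcorr))
    conn.commit()
    conn.close()
```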

5) Calculate ranks based on the score DB file.
```bash
$ python rank.py [SCORE_DB_FILE]
```
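
The ranking rule itself is not described in this README. One common scheme, assumed here purely for illustration, is to rank teams within each bootstrap index for a metric and then average those ranks; the sketch below is not the real `rank.py`.

```python
# One plausible ranking scheme, assumed for illustration -- not rank.py.
# Average each team's rank for a single metric over all bootstrap indices.
import sqlite3
from collections import defaultdict

def average_ranks(db_file, metric="mse", lower_is_better=True):
    conn = sqlite3.connect(db_file)
    rows = conn.execute(
        f"SELECT team_id, bootstrap_index, {metric} FROM score").fetchall()
    conn.close()

    by_bootstrap = defaultdict(list)          # bootstrap_index -> [(team, value)]
    for team, boot, value in rows:
        by_bootstrap[boot].append((team, value))

    rank_sum, n = defaultdict(float), defaultdict(int)
    for scores in by_bootstrap.values():
        scores.sort(key=lambda tv: tv[1], reverse=not lower_is_better)
        for rank, (team, _value) in enumerate(scores, start=1):
            rank_sum[team] += rank
            n[team] += 1
    return sorted((rank_sum[t] / n[t], t) for t in rank_sum)

# average_ranks("[SCORE_DB_FILE]", metric="mse")
```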

## Setting up a leaderboard server (admins only)

1) Create a server instance on AWS.

2) Install the Synapse client.
```bash
$ pip install synapseclient
```

3) Authenticate yourself on the server.
```bash
$ synapse login --remember-me -u [USERNAME] -p [PASSWORD]
```
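
For context, `score_leaderboard.py` interacts with Synapse through `synapseclient`. A hedged sketch of pulling submissions from an evaluation queue is shown below; the queue ID and download directory are placeholders, and this is not the script's actual logic.

```python
# Illustration of fetching submissions from a Synapse evaluation queue with
# synapseclient -- not the actual score_leaderboard.py.
import synapseclient

syn = synapseclient.Synapse()
syn.login()  # reuses the credentials cached by `synapse login --remember-me`

EVALUATION_QUEUE_ID = "1234567"        # hypothetical [EVALUATION_QUEUE_ID]
DOWNLOAD_DIR = "submissions/round2"    # hypothetical [SUBMISSION_DOWNLOAD_DIR]

for submission in syn.getSubmissions(EVALUATION_QUEUE_ID):
    # download the submitted file along with its metadata
    sub = syn.getSubmission(submission["id"], downloadLocation=DOWNLOAD_DIR)
    print(sub["id"], sub["filePath"])
```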

4) Create a score database.
```bash
$ python db.py [NEW_SCORE_DB_FILE]
```

5) Run `score_leaderboard.py`. Files in `TRUTH_NPY_DIR` should be named like `CXXMYY.npy`, and files in `VAR_NPY_DIR` like `var_MYY.npy`. Submissions will be downloaded to `SUBMISSION_DOWNLOAD_DIR`.
```bash
$ NTH=3 # number of threads to parallelize bootstrap scoring
$ python score_leaderboard.py [EVALUATION_QUEUE_ID] [TRUTH_NPY_DIR] \
--var-npy-dir [VAR_NPY_DIR] \
--submission-dir [SUBMISSION_DOWNLOAD_DIR] \
--send-msg-to-admin \
--send-msg-to-user \
--db-file [SCORE_DB_FILE] \
--nth $NTH \
--project-id [SYNAPSE_PROJECT_ID] \
--leaderboard-wiki-id [LEADERBOARD_WIKI_ID] \
--bootstrap-chrom chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr6,chr7,chr8,chr9,chrX chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr16,chr17,chr18,chr19,chr2,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chrX chr1,chr10,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr19,chr2,chr20,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX
```

Example:
```bash
$ python score_leaderboard.py $EVAL_Q_ID /mnt/imputation-challenge/output/score_robust_min_max/validation_data_npys --var-npy-dir /mnt/imputation-challenge/output/score_robust_min_max/var_npys --submission-dir /mnt/imputation-challenge/data/submissions/round2 --db-file $DB --nth $NTH --bootstrap-chrom chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr6,chr7,chr8,chr9,chrX chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr16,chr17,chr18,chr19,chr2,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chrX chr1,chr10,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr19,chr2,chr20,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX --send-msg-to-admin --send-msg-to-user --team-name-tsv data/team_name_round1.tsv
```
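
The `CXXMYY.npy` / `var_MYY.npy` naming convention in step 5 encodes the cell type and assay/mark IDs in the file names. A small hypothetical helper (not part of these scripts, and assuming two-digit IDs) illustrates how such names can be parsed.

```python
# Hypothetical helpers illustrating the CXXMYY / var_MYY naming convention
# from step 5 -- not code from score_leaderboard.py. Two-digit IDs assumed.
import os
import re

def parse_truth_npy(path):
    """'C05M17.npy' -> ('C05', 'M17')."""
    m = re.fullmatch(r"(C\d{2})(M\d{2})\.npy", os.path.basename(path))
    if m is None:
        raise ValueError(f"unexpected truth npy name: {path}")
    return m.group(1), m.group(2)

def parse_var_npy(path):
    """'var_M17.npy' -> 'M17'."""
    m = re.fullmatch(r"var_(M\d{2})\.npy", os.path.basename(path))
    if m is None:
        raise ValueError(f"unexpected var npy name: {path}")
    return m.group(1)
```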

210 changes: 0 additions & 210 deletions build_npy_from_bigwig.py

This file was deleted.

