Commit

Move phylogenetic workflow from top-level to folder phylogenetic (#198)

* Move phylogenetic workflow from top-level to folder `phylogenetic`

* wip: use the experimental workflow from nextstrain/.github#57
corneliusroemer authored Sep 26, 2023
1 parent dddf628 commit 03f9a25
Showing 51 changed files with 125 additions and 86 deletions.
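
The relocation described in the commit message can be sketched in shell. This is a minimal illustration under stated assumptions, not the commands actually used for this commit: the directory names `config`, `workflow`, `scripts`, and `example_data` are taken from paths mentioned in the README, the scratch directory is hypothetical, and plain `mv` stands in for `git mv` so the sketch runs outside a Git repository.

```bash
# Hypothetical scratch directory so the sketch is self-contained.
repo="$(mktemp -d)"
cd "$repo"

# Assumed top-level workflow directories (names taken from README paths).
mkdir -p config workflow scripts example_data

# Move each directory under phylogenetic/. In a real repository,
# `git mv` would be used instead so that Git records the renames.
mkdir -p phylogenetic
for dir in config workflow scripts example_data; do
  mv "$dir" "phylogenetic/$dir"
done

# List what now lives under phylogenetic/.
ls phylogenetic
```

Because the contents are moved unchanged, a diff viewer reports such files as renamed without changes.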
5 changes: 4 additions & 1 deletion .github/workflows/ci.yaml
@@ -9,7 +9,10 @@ on:

 jobs:
   pathogen-ci:
-    uses: nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@master
+    uses: nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@dec0880059017dac7facf100435c5737bf1386c8
+    with:
+      workflow-root: phylogenetic
+

   lint:
     runs-on: ubuntu-latest
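
The `uses:` line in this hunk switches the reusable workflow reference from a branch name (`master`) to a full commit SHA. As a rough sketch, the two kinds of reference can be told apart with a small shell check. The function name `is_full_sha` is hypothetical and not part of any Nextstrain or GitHub tooling.

```bash
# Succeed if the ref after the last "@" is a 40-character lowercase hex SHA.
is_full_sha() {
  printf '%s' "${1##*@}" | grep -Eq '^[0-9a-f]{40}$'
}

old='nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@master'
new='nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@dec0880059017dac7facf100435c5737bf1386c8'

for ref in "$old" "$new"; do
  if is_full_sha "$ref"; then
    echo "pinned to commit: ${ref##*@}"
  else
    echo "tracks a branch or tag: ${ref##*@}"
  fi
done
```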
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-hmpxv1-big.yaml
@@ -43,7 +43,7 @@ jobs:
 --env GITHUB_RUN_ID \
 --env SLACK_TOKEN \
 --env SLACK_CHANNELS \
-. \
+phylogenetic \
 notify_on_deploy \
 --configfiles config/hmpxv1_big/config.yaml config/nextstrain_automation.yaml \
 --config auspice_prefix=$TRIAL_NAME
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-hmpxv1.yaml
@@ -43,7 +43,7 @@ jobs:
 --env GITHUB_RUN_ID \
 --env SLACK_TOKEN \
 --env SLACK_CHANNELS \
-. \
+phylogenetic \
 notify_on_deploy \
 --configfiles config/hmpxv1/config.yaml config/nextstrain_automation.yaml \
 --config auspice_prefix=$TRIAL_NAME
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-mpxv.yaml
@@ -43,7 +43,7 @@ jobs:
 --env GITHUB_RUN_ID \
 --env SLACK_TOKEN \
 --env SLACK_CHANNELS \
-. \
+phylogenetic \
 notify_on_deploy \
 --configfiles config/mpxv/config.yaml config/nextstrain_automation.yaml \
 --config auspice_prefix=$TRIAL_NAME
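
All three rebuild workflows make the same one-line change: the build directory argument goes from `.` to `phylogenetic`, while the `config/...` paths stay as they were. A local equivalent, based on the `nextstrain build . --configfile ...` commands in the README, might look like the dry-run sketch below. It assumes the commands are run from the repository root and that config paths are resolved relative to the build directory. The helper `build_cmd` is hypothetical and only prints the command rather than running it.

```bash
# Print (do not run) the build command for one build name, as it might be
# invoked from the repository root after the move.
build_cmd() {
  printf 'nextstrain build phylogenetic --configfile config/%s/config.yaml\n' "$1"
}

# Build names taken from the config paths in the rebuild workflows.
for name in mpxv hmpxv1 hmpxv1_big; do
  build_cmd "$name"
done
```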
98 changes: 16 additions & 82 deletions README.md
@@ -1,106 +1,40 @@
-# nextstrain.org/monkeypox
+# Nextstrain repository for mpox virus

-This is the [Nextstrain](https://nextstrain.org) build for MPXV (mpox virus). Output from this build is visible at [nextstrain.org/monkeypox](https://nextstrain.org/monkeypox).
-The lineages within the recent mpox outbreaks in humans are defined in a separate [lineage-designation repository](https://github.com/mpxv-lineages/lineage-designation).
+This repository contains two workflows for the analysis of mpox virus (MPXV) data:

-## Software requirements
+- `ingest/` - Download data from GenBank, clean and curate it and upload it to S3
+- `phylogenetic/` - Make phylogenetic trees for nextstrain.org

-Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.
+Each folder contains a README.md with more information.

-## Usage
+## CI

-### Provision input data
+This repository uses GitHub Actions for CI. The workflows are defined in `.github/workflows/`.

-Input sequences and metadata can be retrieved from data.nextstrain.org
-
-* [sequences.fasta.xz](https://data.nextstrain.org/files/workflows/monkeypox/sequences.fasta.xz)
-* [metadata.tsv.gz](https://data.nextstrain.org/files/workflows/monkeypox/metadata.tsv.gz)
-
-Note that these data are generously shared by many labs around the world.
-If you analyze and plan to publish using these data, please contact these labs first.
-
-Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to `data/` with:
-
-```bash
-nextstrain build . data/sequences.fasta data/metadata.tsv
-```
-
-### Run analysis pipeline
-
-Run pipeline to produce the "overview" tree for `/mpox/all-clades` with:
-
-```bash
-nextstrain build . --configfile config/mpxv/config.yaml
-```
+## Development

-Run pipeline to produce the "clade IIb" tree for `/mpox/clade-IIb` with:
+### Pre-commit

-```bash
-nextstrain build . --configfile config/hmpxv1/config.yaml
-```
+This repository uses [pre-commit](https://pre-commit.com/) to run checks on the code before committing.

-Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:
+To install pre-commit on macOS, run:

 ```bash
-nextstrain build . --configfile config/hmpxv1_big/config.yaml
+brew install pre-commit
 ```

-### Deploy
-
-⚠️ The below is outdated and needs to be adjusted for the new build names (mpxv instead of monkeypox, etc.)
-
-<details>
-
-Run the python script [`scripts/deploy.py`](scripts/deploy.py) to deploy the staging build to production.
-
-This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.
+To install pre-commit on Ubuntu, run:

 ```bash
-python scripts/deploy.py --build-names hmpxv1 mpxv
+sudo apt install pre-commit
 ```

-If a dated build already exists it is not overwritten by default. To overwrite, pass `-f`.
-
-To deploy a locally built build to staging, use the `--staging` flag.
-
-To not deploy a dated build to production, add the `--no-dated` flag.
-
-</details>
-
-### Visualize results
-
-View results with:
+To activate pre-commit, run:

 ```bash
-nextstrain view .
-```
-
-## Configuration
-
-Configuration takes place in `config/*/config.yaml` files for each build.
-The analysis pipeline is contained in `workflow/snakemake_rule/core.smk`.
-This can be read top-to-bottom, each rule specifies its file inputs and output and pulls its parameters from `config`.
-There is little redirection and each rule should be able to be reasoned with on its own.
-
-## Update example data
-
-[Example data](./example_data/) is used by [CI](https://github.com/nextstrain/monkeypox/actions/workflows/ci.yaml). It can also be used as a small subset of real-world data.
-
-Example data should be updated every time metadata schema is changed or a new clade/lineage emerges. To update, run:
-
-```sh
-nextstrain build . update_example_data -F
+pre-commit install
 ```
-
-## Data use
-
-We gratefully acknowledge the authors, originating and submitting laboratories of the genetic
-sequences and metadata for sharing their work. Please note that although data generators have
-generously shared data in an open fashion, that does not mean there should be free license to
-publish on this data. Data generators should be cited where possible and collaborations should be
-sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if
-uncertain.

 ## Development

 [![pre-commit.ci status](https://results.pre-commit.ci/badge/github/nextstrain/monkeypox/master.svg)](https://results.pre-commit.ci/latest/github/nextstrain/monkeypox/master)
102 changes: 102 additions & 0 deletions phylogenetic/README.md
@@ -0,0 +1,102 @@
+# nextstrain.org/monkeypox
+
+This is the [Nextstrain](https://nextstrain.org) build for MPXV (mpox virus). Output from this build is visible at [nextstrain.org/monkeypox](https://nextstrain.org/monkeypox).
+The lineages within the recent mpox outbreaks in humans are defined in a separate [lineage-designation repository](https://github.com/mpxv-lineages/lineage-designation).
+
+## Software requirements
+
+Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.
+
+## Usage
+
+### Provision input data
+
+Input sequences and metadata can be retrieved from data.nextstrain.org
+
+* [sequences.fasta.xz](https://data.nextstrain.org/files/workflows/monkeypox/sequences.fasta.xz)
+* [metadata.tsv.gz](https://data.nextstrain.org/files/workflows/monkeypox/metadata.tsv.gz)
+
+Note that these data are generously shared by many labs around the world.
+If you analyze and plan to publish using these data, please contact these labs first.
+
+Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to `data/` with:
+
+```bash
+nextstrain build . data/sequences.fasta data/metadata.tsv
+```
+
+### Run analysis pipeline
+
+Run pipeline to produce the "overview" tree for `/mpox/all-clades` with:
+
+```bash
+nextstrain build . --configfile config/mpxv/config.yaml
+```
+
+Run pipeline to produce the "clade IIb" tree for `/mpox/clade-IIb` with:
+
+```bash
+nextstrain build . --configfile config/hmpxv1/config.yaml
+```
+
+Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:
+
+```bash
+nextstrain build . --configfile config/hmpxv1_big/config.yaml
+```
+
+### Deploy
+
+⚠️ The below is outdated and needs to be adjusted for the new build names (mpxv instead of monkeypox, etc.)
+
+<details>
+
+Run the python script [`scripts/deploy.py`](scripts/deploy.py) to deploy the staging build to production.
+
+This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.
+
+```bash
+python scripts/deploy.py --build-names hmpxv1 mpxv
+```
+
+If a dated build already exists it is not overwritten by default. To overwrite, pass `-f`.
+
+To deploy a locally built build to staging, use the `--staging` flag.
+
+To not deploy a dated build to production, add the `--no-dated` flag.
+
+</details>
+
+### Visualize results
+
+View results with:
+
+```bash
+nextstrain view .
+```
+
+## Configuration
+
+Configuration takes place in `config/*/config.yaml` files for each build.
+The analysis pipeline is contained in `workflow/snakemake_rule/core.smk`.
+This can be read top-to-bottom, each rule specifies its file inputs and output and pulls its parameters from `config`.
+There is little redirection and each rule should be able to be reasoned with on its own.
+
+## Update example data
+
+[Example data](./example_data/) is used by [CI](https://github.com/nextstrain/monkeypox/actions/workflows/ci.yaml). It can also be used as a small subset of real-world data.
+
+Example data should be updated every time metadata schema is changed or a new clade/lineage emerges. To update, run:
+
+```sh
+nextstrain build . update_example_data -F
+```
+
+## Data use
+
+We gratefully acknowledge the authors, originating and submitting laboratories of the genetic
+sequences and metadata for sharing their work. Please note that although data generators have
+generously shared data in an open fashion, that does not mean there should be free license to
+publish on this data. Data generators should be cited where possible and collaborations should be
+sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if
+uncertain.
File renamed without changes. (repeated for 42 files)
