Merge pull request #57: Configure pre-commit checks
victorlin authored Nov 1, 2024
2 parents deb1c34 + 3d5e0c3 commit 1c4d669
Showing 11 changed files with 96 additions and 21 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ingest-to-phylogenetic.yaml
@@ -66,7 +66,7 @@ jobs:
ingest/benchmarks/
ingest/logs/
ingest/.snakemake/log/
# Check if ingest results include new data by checking for the cache
# of the file with the results' Metadata.sh256sum (which should have been added within upload-to-s3)
# GitHub will remove any cache entries that have not been accessed in over 7 days,
@@ -89,8 +89,8 @@ jobs:
# Code below is modified from ingest/upload-to-s3
# https://github.com/nextstrain/ingest/blob/c0b4c6bb5e6ccbba86374d2c09b42077768aac23/upload-to-s3#L23-L29
no_hash=0000000000000000000000000000000000000000000000000000000000000000
for s3_url in "${s3_urls[@]}"; do
@@ -109,7 +109,7 @@ jobs:
path: ingest-output-sha256sum
key: ingest-output-sha256sum-${{ hashFiles('ingest-output-sha256sum') }}
lookup-only: true

phylogenetic:
needs: [check-new-data]
if: ${{ needs.check-new-data.outputs.cache-hit != 'true' }}
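The cache check above keys on a file of per-object SHA-256 digests, with an all-zero placeholder (`no_hash`) standing in for objects that have no recorded digest. Below is a minimal Python sketch of that idea, assuming the digests have already been fetched from S3. The helper name `cache_key` is hypothetical, and GitHub's `hashFiles()` combines file hashes differently, so this key will not match the workflow's key byte-for-byte.

```python
import hashlib
from typing import Optional, Sequence

# All-zero placeholder, mirroring `no_hash` in the workflow script.
NO_HASH = "0" * 64


def cache_key(digests: Sequence[Optional[str]]) -> str:
    """Build a cache key from per-object SHA-256 digests (hypothetical helper).

    Missing digests are replaced by NO_HASH so the key is always defined.
    """
    lines = [d if d else NO_HASH for d in digests]
    content = "\n".join(lines) + "\n"
    combined = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"ingest-output-sha256sum-{combined}"


if __name__ == "__main__":
    previous = cache_key(["a" * 64, None])
    current = cache_key(["a" * 64, "b" * 64])
    # A changed digest yields a different key, i.e. a cache miss.
    print(previous == current)
```

A matching key (cache hit) means the ingest output has not changed, so the `phylogenetic` job's condition `cache-hit != 'true'` skips it.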
14 changes: 14 additions & 0 deletions .github/workflows/pre-commit.yaml
@@ -0,0 +1,14 @@
name: pre-commit

on:
  - push

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: pre-commit/[email protected]
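The workflow delegates to the `pre-commit/action` GitHub Action. As we understand the action's defaults, it runs `pre-commit run --show-diff-on-failure --color=always` with `extra_args` defaulting to `--all-files`. To reproduce that CI check locally before pushing, a small wrapper like the following could be used. The function names are hypothetical, and the flags are an assumption based on those defaults.

```python
import shlex
import subprocess
from typing import List


def precommit_command(extra_args: str = "--all-files") -> List[str]:
    """Assemble the `pre-commit run` invocation assumed to be used in CI."""
    return [
        "pre-commit",
        "run",
        "--show-diff-on-failure",
        "--color=always",
        *shlex.split(extra_args),
    ]


def run_checks(extra_args: str = "--all-files") -> int:
    """Run the hooks from the repo root.

    A non-zero return code means a hook failed or modified files.
    """
    completed = subprocess.run(precommit_command(extra_args), check=False)
    return completed.returncode


if __name__ == "__main__":
    print(" ".join(precommit_command()))
```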
41 changes: 41 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,41 @@
default_language_version:
  python: python3
exclude: '\.(tsv|fasta|gb)$|^ingest/vendored/'
repos:
  - repo: https://github.com/pre-commit/sync-pre-commit-deps
    rev: v0.0.1
    hooks:
      - id: sync-pre-commit-deps
  - repo: https://github.com/shellcheck-py/shellcheck-py
    rev: v0.10.0.1
    hooks:
      - id: shellcheck
  - repo: https://github.com/rhysd/actionlint
    rev: v1.6.27
    hooks:
      - id: actionlint
        entry: env SHELLCHECK_OPTS='--exclude=SC2027' actionlint
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: check-ast
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-json
      - id: check-executables-have-shebangs
      - id: check-merge-conflict
      - id: check-shebang-scripts-are-executable
      - id: check-symlinks
      - id: check-toml
      - id: check-yaml
      - id: destroyed-symlinks
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: fix-byte-order-marker
  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.4.6
    hooks:
      # Run the linter.
      - id: ruff
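The top-level `exclude` key is a Python regular expression. pre-commit applies it as a search (not a full match) against each repo-relative path, so TSV, FASTA, and GenBank files, along with anything under `ingest/vendored/`, are skipped by every hook. A quick way to check which paths the pattern matches is shown below. The helper name and the example paths are illustrative.

```python
import re

# Copied verbatim from the `exclude` key in .pre-commit-config.yaml.
EXCLUDE = re.compile(r"\.(tsv|fasta|gb)$|^ingest/vendored/")


def is_excluded(path: str) -> bool:
    """Return True if pre-commit would skip `path` (repo-relative, '/'-separated)."""
    return EXCLUDE.search(path) is not None


if __name__ == "__main__":
    for p in [
        "ingest/results/metadata.tsv",
        "ingest/vendored/upload-to-s3",
        "ingest/bin/parse-measles-genotype-names.py",
        "data/sequences.fasta.zst",
    ]:
        print(f"{p}: {'excluded' if is_excluded(p) else 'checked'}")
```

Because `$` anchors the extensions to the end of the path, a compressed file such as `sequences.fasta.zst` would still be checked.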
21 changes: 21 additions & 0 deletions README.md
@@ -24,3 +24,24 @@ nextstrain view .
## Documentation

- [Running a pathogen workflow](https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html)

## Working on this repo

This repo is configured to use [pre-commit](https://pre-commit.com)
to automatically catch common coding errors and syntax issues
in changes before they are committed to the repo.

If you will be writing new code or otherwise working within this repo,
please do the following to get started:

1. Install `pre-commit` by running either `python -m pip install pre-commit`
   or `brew install pre-commit`, depending on your preferred package
   manager.
2. Install the local git hooks by running `pre-commit install` from
   the root of the repo.
3. When problems are detected, correct them in your local working tree
   before committing.

Note that these pre-commit checks also run in a GitHub Action when
changes are pushed to GitHub, so correcting issues locally saves
extra rounds of fixes after pushing.
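Step 2 installs a git hook script into the repository. The following is a hedged sketch for confirming that a hook exists. It assumes git's default hooks directory (`.git/hooks/pre-commit`) and does not handle a custom `core.hooksPath` or worktrees, where `.git` is a file. It also cannot distinguish a pre-commit-generated hook from any other hook script. The function name `hooks_installed` is hypothetical.

```python
from pathlib import Path


def hooks_installed(repo_root: Path) -> bool:
    """Return True if a pre-commit hook script exists at git's default location.

    git ships a `pre-commit.sample` file by default; only the exact name
    `pre-commit` counts as an installed hook here.
    """
    hook = repo_root / ".git" / "hooks" / "pre-commit"
    return hook.is_file()


if __name__ == "__main__":
    if hooks_installed(Path.cwd()):
        print("pre-commit hook found")
    else:
        print("hook missing: run `pre-commit install` from the repo root")
```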
16 changes: 8 additions & 8 deletions ingest/bin/parse-measles-genotype-names.py
@@ -24,22 +24,22 @@ def parse_args():

def _set_genotype_name(record):
    genotype_name = record["genotype_ncbi"]

    genotype_name = genotype_name.replace('Measles virus genotype ', '')
    genotype_name = re.sub(r'Measles morbillivirus.*$', r'', genotype_name)
    genotype_name = re.sub(r'.*?\[(.*)\]$', r'\1', genotype_name) # If square brackets present at end of string, keep only the text inside the brackets
    genotype_name = re.sub(r'Measles virus MVs.*$', r'', genotype_name)
    genotype_name = re.sub(r'Measles virus MVi.*$', r'', genotype_name)
    genotype_name = re.sub(r'Measles virus strain MVi.*$', r'', genotype_name)
    genotype_name = genotype_name.replace('Measles virus strain ', '')
    genotype_name = re.sub(r'Measles virus.*$', r'', genotype_name)
    genotype_name = re.sub(r'A-vaccine.*$', r'A', genotype_name)
    genotype_name = re.sub(r'B3.1', r'B3', genotype_name)
    genotype_name = re.sub(r'B3.2', r'B3', genotype_name)
    genotype_name = re.sub(r'D4a', r'D4', genotype_name)
    genotype_name = re.sub(r'D4b', r'D4', genotype_name)
    genotype_name = re.sub(r'H1a', r'H1', genotype_name)
    genotype_name = re.sub(r'H1b', r'H1', genotype_name)

    return (
        genotype_name)
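The chain of substitutions in `_set_genotype_name` is an ordered list of rewrites, and the order matters. For example, the bracket rule has to run before the generic `Measles virus.*$` rule, which would otherwise erase a name like `Measles virus strain X [D8]`. The same logic can be written in a data-driven style. `GENOTYPE_RULES` and `normalize_genotype` are illustrative names, not part of the repo. The `.` in patterns such as `B3.1` is an unescaped regex wildcard (so it would also match `B3x1`). This sketch keeps that behaviour so that it stays equivalent to the original.

```python
import re
from typing import Dict, List, Tuple

# Ordered (pattern, replacement) pairs mirroring _set_genotype_name.
# The two str.replace() calls are expressed with re.escape(); like
# str.replace(), re.sub() then replaces every literal occurrence.
GENOTYPE_RULES: List[Tuple[str, str]] = [
    (re.escape("Measles virus genotype "), ""),
    (r"Measles morbillivirus.*$", ""),
    (r".*?\[(.*)\]$", r"\1"),  # keep only the text inside trailing brackets
    (r"Measles virus MVs.*$", ""),
    (r"Measles virus MVi.*$", ""),
    (r"Measles virus strain MVi.*$", ""),
    (re.escape("Measles virus strain "), ""),
    (r"Measles virus.*$", ""),
    (r"A-vaccine.*$", "A"),
    (r"B3.1", "B3"),  # unescaped '.', as in the original
    (r"B3.2", "B3"),
    (r"D4a", "D4"),
    (r"D4b", "D4"),
    (r"H1a", "H1"),
    (r"H1b", "H1"),
]


def normalize_genotype(record: Dict[str, str]) -> str:
    """Apply each rewrite, in order, to record['genotype_ncbi']."""
    name = record["genotype_ncbi"]
    for pattern, replacement in GENOTYPE_RULES:
        name = re.sub(pattern, replacement, name)
    return name


if __name__ == "__main__":
    for raw in ["Measles virus genotype B3.1", "Measles virus genotype A-vaccine"]:
        print(raw, "->", normalize_genotype({"genotype_ncbi": raw}))
```

Keeping the rules in one list makes the ordering explicit and lets a new rewrite be added as a single entry.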
6 changes: 3 additions & 3 deletions nextclade/README.md
@@ -1,11 +1,11 @@

# Measles Nextclade Dataset Tree

This workflow creates a phylogenetic tree that can be used as part of a Nextclade dataset to assign genotypes to measles samples based on [criteria outlined by the WHO](https://www.who.int/publications/i/item/WER8709).

The WHO has defined 24 measles genotypes based on N gene and H gene sequences from 28 reference strains. For new measles samples, genotypes can be assigned based on genetic similarity to the reference strains at the "N450" region (a 450 bp region of the N gene).

The tree created here includes N450 sequences for the 28 reference strains, along with other representative strains for each genotype.
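To make "genetic similarity to the reference strains" concrete, here is a deliberately simplified sketch. It assigns the genotype of the closest reference by Hamming distance over aligned N450 sequences, ignoring ambiguous bases (`N`) and gaps. This is only an illustration: Nextclade assigns clades by placing each sequence on a reference tree like the one built here, not by this nearest-neighbour rule. The function names and toy sequences are hypothetical, and the sketch assumes one reference sequence per genotype.

```python
from typing import Dict

SKIP = set("N-")  # ambiguous bases and alignment gaps are ignored


def distance(a: str, b: str) -> int:
    """Count mismatches between two aligned sequences, ignoring N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x not in SKIP and y not in SKIP and x != y
    )


def assign_genotype(query: str, references: Dict[str, str]) -> str:
    """Return the genotype whose reference sequence is closest to `query`.

    `references` maps genotype -> aligned N450 sequence; ties go to the
    first genotype in iteration order.
    """
    return min(references, key=lambda g: distance(query, references[g]))


if __name__ == "__main__":
    refs = {"B3": "ACGTACGT", "D8": "ACGTTCGA"}  # toy sequences, not real N450 data
    print(assign_genotype("ACGTTCGN", refs))
```

With several reference strains per genotype, the mapping would instead be keyed by strain, with a separate strain-to-genotype lookup.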

The workflow includes the following steps:
* Build a tree using samples from the `ingest` output, with the following sampling criteria:
2 changes: 1 addition & 1 deletion nextclade/Snakefile
@@ -1,4 +1,4 @@
configfile: "defaults/config.yaml"

rule all:
input:
2 changes: 1 addition & 1 deletion nextclade/rules/annotate_phylogeny.smk
@@ -65,7 +65,7 @@ rule timeout:
    run:
        import json
        with open(input[0], 'r') as fh:
            data = json.load(fh)
        new_nodes = {}
        for name, attrs in data['nodes'].items():
            new_nodes[name] = {'mutation_length': attrs.get('mutation_length')}
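The `run:` block above reads a node-data JSON file and keeps only each node's `mutation_length`. The hunk ends before the result is used, so the sketch below covers only the visible step. It assumes input shaped like `{"nodes": {name: {...}}}`, and the function name is hypothetical.

```python
import json
from typing import Any, Dict


def extract_mutation_lengths(node_data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Keep only `mutation_length` for every node, mirroring the run: block.

    Nodes without the attribute map to None, as attrs.get() does in the rule.
    """
    return {
        name: {"mutation_length": attrs.get("mutation_length")}
        for name, attrs in node_data["nodes"].items()
    }


if __name__ == "__main__":
    example = json.loads(
        '{"nodes": {"NODE_1": {"mutation_length": 2.0, "branch_length": 0.01}, "tip_A": {}}}'
    )
    print(extract_mutation_lengths(example))
```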
2 changes: 1 addition & 1 deletion phylogenetic/defaults/auspice_config.json
@@ -10,7 +10,7 @@
"url": "https://www.ncbi.nlm.nih.gov/genbank/"
}
],
"build_url": "https://github.com/nextstrain/measles",
"colorings": [
{
"key": "gt",
2 changes: 1 addition & 1 deletion phylogenetic/defaults/auspice_config_N450.json
@@ -67,7 +67,7 @@
"map",
"entropy",
"frequencies"
],
"metadata_columns": [
"author"
]
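Several hunks in this commit change only whitespace, which is consistent with the `trailing-whitespace` hook configured above having been applied. Below is a minimal sketch of what such a fixer does: it strips spaces and tabs before each line ending and keeps the line endings themselves. The real hook has extra options (for example, for Markdown hard line breaks) that this sketch ignores, and the function name is hypothetical.

```python
import re

# Spaces/tabs immediately before a line ending or the end of the text.
_TRAILING = re.compile(r"[ \t]+(?=\r?\n|$)")


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, preserving line endings."""
    return _TRAILING.sub("", text)


if __name__ == "__main__":
    print(repr(strip_trailing_whitespace('"metadata_columns": [ \n  "author"\t\n')))
```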
3 changes: 1 addition & 2 deletions phylogenetic/rules/prepare_sequences.smk
@@ -64,7 +64,7 @@ rule filter:
--group-by {params.group_by} \
--sequences-per-group {params.sequences_per_group} \
--min-date {params.min_date} \
--min-length {params.min_length}
"""

rule align:
@@ -86,4 +86,3 @@ rule align:
--fill-gaps \
--remove-reference
"""

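The final hunk (`-86,4 +86,3`) appears to drop an extra blank line at the end of `prepare_sequences.smk`. That is the job of the `end-of-file-fixer` hook: a non-empty file should end in exactly one newline. Below is a minimal sketch of that rule. It works on text and assumes `\n` line endings, so the actual hook differs in details, and the function name is hypothetical.

```python
def fix_end_of_file(text: str) -> str:
    """Ensure text ends with exactly one newline.

    Text made up only of newlines (or empty text) becomes ''.
    """
    stripped = text.rstrip("\n")
    return stripped + "\n" if stripped else ""


if __name__ == "__main__":
    print(repr(fix_end_of_file('    --remove-reference\n"""\n\n')))
```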