Parallelize ci (#29)
* fixed assembly hashsum

* added a 'three tries' for nucleotide download

* parallelize dataset CI better

* chunks of 25

* fix addition oopsie

* add quotes for int

* fix 50 to 25 per chunk

* debug yaml int

* are dashes the new underscore?

* stash num per chunk in 'include'

* m

* back up to put num-per-chunk into matrix strategy

* back to underscore

* debugging this oddity

* commented more lines

* I found the syntax error for mathing

* exit with pass when zero samples in chunk

* what is github run number

* removed github run number

* can I bump up the parallel jobs to 20?

* what about to 50?
lskatz authored Jun 10, 2022
1 parent 3191269 commit ee93027
Showing 1 changed file with 31 additions and 17 deletions.
48 changes: 31 additions & 17 deletions .github/workflows/unit-testing.yml
@@ -7,10 +7,10 @@ on: [push, create]
jobs:
build:
runs-on: ubuntu-18.04
name: ${{ matrix.DATASET }}
name: ${{ matrix.DATASET }} (chunk${{ matrix.CHUNK }}, chunk size ${{ matrix.NUM_PER_CHUNK }})
strategy:
fail-fast: false
max-parallel: 3
max-parallel: 50
matrix:
DATASET:
- datasets/sars-cov-2-voivoc.tsv
@@ -19,6 +19,10 @@ jobs:
- datasets/sars-cov-2-coronahit-routine.tsv
- datasets/sars-cov-2-SNF-A.tsv
- datasets/sars-cov-2-failedQC.tsv
NUM_PER_CHUNK:
- 25
# TODO is there a $SGE_TASK_ID equivalent instead of listing each chunk???
CHUNK: [25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400]
steps:
- name: Check out the repo
uses: actions/checkout@v2
@@ -42,23 +46,33 @@ jobs:
- name: unit testing - just env
run: |
bats t/00_env.bats
- name: abbreviated unit testing with ${{ matrix.DATASET }}
if: ${{ github.event_name != 'create' }}
- name: unit test chunk of ${{ matrix.DATASET }}
run: |
export NCBI_API_KEY=${{ secrets.NCBI_API_KEY }}
if [[ -z "$NCBI_API_KEY" ]]; then echo "NCBI_API_KEY not found in github secrets!"; fi;
# Get the header and just two samples for the abbreviated test
grep -B 999 -A 3 biosample_acc ${{ matrix.DATASET }} > ${{ matrix.DATASET }}.short
export DATASET=$(realpath ${{ matrix.DATASET }}).short
echo "Abbreviated dataset: $DATASET"
bats t/*
- name: full unit testing with ${{ matrix.DATASET }}
if: ${{ github.event_name == 'create' }}
run: |
export NCBI_API_KEY=${{ secrets.NCBI_API_KEY }}
if [[ -z "$NCBI_API_KEY" ]]; then echo "# NCBI_API_KEY not found in github secrets!"; fi;
echo "Full dataset: ${{ matrix.DATASET }}"
export DATASET=$(realpath ${{ matrix.DATASET }})
echo "DEBUG: allowing for error exit code in TAP"
export DATASET=$(pwd -P)/${{ matrix.DATASET }}.${{ matrix.CHUNK }}.short
CHUNK=${{ matrix.CHUNK }}
NUM_PER_CHUNK=${{ matrix.NUM_PER_CHUNK }}
# Get the header of the dataset
grep -B 999 biosample_acc ${{ matrix.DATASET }} > $DATASET
# Get the samples of the dataset (everything past the header)
# and then get the number of lines dictated by CHUNK (e.g., 50, 100, 150,...)
# with sed -n Xp
FIRST_LINE=$(($CHUNK - $NUM_PER_CHUNK + 1))
LAST_LINE=${{ matrix.CHUNK }}
grep -A 99999 biosample_acc ${{ matrix.DATASET }} | tail -n +2 | sed -n ${FIRST_LINE},${LAST_LINE}p >> $DATASET.body
cat $DATASET.body >> $DATASET
# If we have zero samples, just exit with pass
NUM_SAMPLES=$(wc -l < $DATASET.body)
if [[ $NUM_SAMPLES -lt 1 ]]; then
echo "Number of samples is zero; exiting with pass"
exit 0
fi
# Run the TAP compliant unit test which reads env variable $DATASET
echo "DATASET CHUNK $DATASET"
bats t/*
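
The chunk arithmetic in the new step can be tried outside of CI. The following is a minimal sketch, not the workflow itself: the function name `chunk_body` and the synthetic 30-sample dataset are assumptions made for illustration, while the `FIRST_LINE`/`LAST_LINE` calculation and the `grep | tail | sed` pipeline mirror the commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the commit): print the sample rows
# belonging to one chunk.
#   $1 = dataset TSV, $2 = CHUNK (number of the last sample in the chunk),
#   $3 = NUM_PER_CHUNK
chunk_body() {
  local dataset=$1 chunk=$2 num_per_chunk=$3
  local first_line=$(( chunk - num_per_chunk + 1 ))
  local last_line=$chunk
  # Keep the biosample_acc header line and everything after it, drop the
  # header line itself, then select the rows for this chunk.
  grep -A 99999 biosample_acc "$dataset" \
    | tail -n +2 \
    | sed -n "${first_line},${last_line}p"
}

# Synthetic dataset (an assumption for this example): two metadata lines,
# the column header, then 30 samples.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
{
  printf 'Organism\tSARS-CoV-2\n'
  printf 'Source\texample\n'
  printf 'biosample_acc\tstrain\n'
  for i in $(seq 1 30); do
    printf 'SAMN%03d\tstrain%d\n' "$i" "$i"
  done
} > "$tmp"

# Count the samples that land in each of the first three chunks.
for chunk in 25 50 75; do
  n=$(chunk_body "$tmp" "$chunk" 25 | wc -l | tr -d ' ')
  echo "chunk $chunk: $n samples"
done
```

With 30 samples and 25 per chunk, the first chunk is full, the second is partial, and the third is empty. The empty case is the one the workflow handles with `exit 0`, since the `CHUNK` list is fixed at 16 entries regardless of dataset size.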