Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest updates #222

Merged
merged 6 commits into from
Nov 6, 2023
Merged

Ingest updates #222

merged 6 commits into from
Nov 6, 2023

Conversation

joverlee521
Copy link
Contributor

Description of proposed changes

Minor refactoring and updates to the ingest workflow that came from review of nextstrain/dengue#13, mainly:

  • change upload config to a files_to_upload mapping
  • move final output files to the results/ directory

Checklist

  • Checks pass

Simplify the `files_to_upload` config as a single mapping where the key
is the remote file name and the value is the local file instead of
maintaining a two lists of files. This ensures that we know exactly
which local file is uploaded to the remote file without worrying about
order or duplicates.
Change the expectation that the local file paths for `file_to_upload`
must be relative to the ingest directory instead of the ingest/data
directory. This is done in preparation for moving the final outputs
fo the ingest workflow to an ingest/results directory.
Instead of mixing the final results with the intermediate files produced
during the workflow run, output the final files to the result directory.
@joverlee521 joverlee521 requested a review from a team November 3, 2023 00:51
@joverlee521 joverlee521 requested a review from j23414 November 3, 2023 00:51
ingest/Snakefile Outdated Show resolved Hide resolved
Copy link
Contributor

@j23414 j23414 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I particularly like the switch to the key:value style of listing remote and local files, thanks for the changes!

As far as I can tell, the test ran to completion, and I checked s3:

nextstrain remote ls s3://nextstrain-data/files/workflows/mpox/branch/ingest-updates

files/workflows/mpox/branch/ingest-updates/alignment.fasta.xz
files/workflows/mpox/branch/ingest-updates/all_sequences.ndjson.xz
files/workflows/mpox/branch/ingest-updates/genbank.ndjson.xz
files/workflows/mpox/branch/ingest-updates/insertions.csv.gz
files/workflows/mpox/branch/ingest-updates/metadata.tsv.gz
files/workflows/mpox/branch/ingest-updates/sequences.fasta.xz
files/workflows/mpox/branch/ingest-updates/translations.zip

I downloaded the metadata and sequences files and they weren’t empty and look reasonable.

@joverlee521 joverlee521 merged commit c9b8282 into master Nov 6, 2023
1 check passed
@joverlee521 joverlee521 deleted the ingest-updates branch November 6, 2023 23:44
@joverlee521 joverlee521 mentioned this pull request Nov 7, 2023
2 tasks
joverlee521 added a commit that referenced this pull request Nov 15, 2023
Fix bug that was introduced in #222

The testing done on branch runs do not include the Slack notifications,
so this bug was not revealed in the test run. I only noticed the
workflow as failing because our internal #monkeypox-updates channel had
been quiet for over a week.

This did not trigger our usual error notifications in Slack because
the error occurs during the DAG building process before the start
of the actual workflow run.
@joverlee521 joverlee521 mentioned this pull request Nov 15, 2023
2 tasks
joverlee521 added a commit that referenced this pull request Nov 15, 2023
This change was motivated by the unintentional bug introduced in
#222 that would only be triggered
by using Slack notifications. This allows to test branches and send
notifications to the testing channel.

As part of this change, I've added an organization level variable
`TEST_SLACK_CHANNEL` that points our #scratch channel for testing
Slack notifications.
joverlee521 added a commit that referenced this pull request Nov 15, 2023
This change was motivated by the unintentional bug introduced in
#222 that would only be triggered
by using Slack notifications. This allows to test branches and send
notifications to the testing channel.

As part of this change, I've added an organization level variable
`TEST_SLACK_CHANNEL` that points our #scratch channel for testing
Slack notifications.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants