ingest: Output final files to results directory
Instead of mixing the final results with the intermediate files produced
during the workflow run, output the final files to the results directory.
joverlee521 committed Nov 3, 2023
1 parent d73f60a commit 1cccf1a
Showing 7 changed files with 12 additions and 12 deletions.
4 changes: 2 additions & 2 deletions ingest/README.md
@@ -25,8 +25,8 @@ nextstrain build .

 This will produce two files (within the `ingest` directory):

-- `data/metadata.tsv`
-- `data/sequences.fasta`
+- `results/metadata.tsv`
+- `results/sequences.fasta`

 Run the complete ingest pipeline and upload results to AWS S3 with

2 changes: 1 addition & 1 deletion ingest/Snakefile
@@ -14,7 +14,7 @@ send_slack_notifications = config.get("send_slack_notifications", False)

 def _get_all_targets(wildcards):
     # Default targets are the metadata TSV and sequences FASTA files
-    all_targets = ["data/sequences.fasta", "data/metadata.tsv"]
+    all_targets = ["results/sequences.fasta", "results/metadata.tsv"]

     # Add additional targets based on upload config
     upload_config = config.get("upload", {})
4 changes: 2 additions & 2 deletions ingest/config/optional.yaml
@@ -10,8 +10,8 @@ upload:
     files_to_upload:
       genbank.ndjson.xz: data/genbank.ndjson
       all_sequences.ndjson.xz: data/sequences.ndjson
-      metadata.tsv.gz: data/metadata.tsv
-      sequences.fasta.xz: data/sequences.fasta
+      metadata.tsv.gz: results/metadata.tsv
+      sequences.fasta.xz: results/sequences.fasta
       alignment.fasta.xz: data/alignment.fasta
       insertions.csv.gz: data/insertions.csv
       translations.zip: data/translations.zip
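
To see at a glance which uploaded files now come from `results/`, the mapping in this hunk can be grouped by top-level directory. This is a minimal sketch and not part of the commit: `split_by_directory` is a hypothetical helper, and the dictionary only copies the entries visible in this hunk (the full config may list more).

```python
from pathlib import PurePosixPath

# Entries copied from the `files_to_upload` hunk (remote name -> local path).
# Assumption: the real config may contain entries that this hunk does not show.
FILES_TO_UPLOAD = {
    "genbank.ndjson.xz": "data/genbank.ndjson",
    "all_sequences.ndjson.xz": "data/sequences.ndjson",
    "metadata.tsv.gz": "results/metadata.tsv",
    "sequences.fasta.xz": "results/sequences.fasta",
    "alignment.fasta.xz": "data/alignment.fasta",
    "insertions.csv.gz": "data/insertions.csv",
    "translations.zip": "data/translations.zip",
}


def split_by_directory(files_to_upload):
    """Group local paths by their top-level directory (hypothetical helper)."""
    groups = {}
    for local_path in files_to_upload.values():
        top_level = PurePosixPath(local_path).parts[0]
        groups.setdefault(top_level, []).append(local_path)
    return groups


groups = split_by_directory(FILES_TO_UPLOAD)
for directory, paths in groups.items():
    print(directory, paths)
```

With the values from this commit, only the final metadata and sequences files fall under `results`; everything else stays in `data` as an intermediate file.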
6 changes: 3 additions & 3 deletions ingest/workflow/snakemake_rules/nextclade.smk
@@ -19,7 +19,7 @@ rule nextclade_dataset_hMPXV:

 rule align:
     input:
-        sequences="data/sequences.fasta",
+        sequences="results/sequences.fasta",
         dataset="hmpxv.zip",
     output:
         alignment="data/alignment.fasta",
@@ -41,7 +41,7 @@ rule align:

 rule nextclade:
     input:
-        sequences="data/sequences.fasta",
+        sequences="results/sequences.fasta",
         dataset="mpxv.zip",
     output:
         "data/nextclade.tsv",
@@ -58,7 +58,7 @@ rule join_metadata_clades:
         metadata="data/metadata_raw.tsv",
         nextclade_field_map=config["nextclade"]["field_map"],
     output:
-        metadata="data/metadata.tsv",
+        metadata="results/metadata.tsv",
     params:
         id_field=config["transform"]["id_field"],
         nextclade_id_field=config["nextclade"]["id_field"],
2 changes: 1 addition & 1 deletion ingest/workflow/snakemake_rules/slack_notifications.smk
@@ -36,7 +36,7 @@ rule notify_on_genbank_record_change:

 rule notify_on_metadata_diff:
     input:
-        metadata="data/metadata.tsv",
+        metadata="results/metadata.tsv",
     output:
         touch("data/notify/metadata-diff.done"),
     params:
4 changes: 2 additions & 2 deletions ingest/workflow/snakemake_rules/transform.smk
@@ -7,7 +7,7 @@ formats and expects input file
 This will produce output files as
 metadata = "data/metadata_raw.tsv"
-sequences = "data/sequences.fasta"
+sequences = "results/sequences.fasta"
 Parameters are expected to be defined in `config.transform`.
 """
@@ -43,7 +43,7 @@ rule transform:
         annotations=config["transform"]["annotations"],
     output:
         metadata="data/metadata_raw.tsv",
-        sequences="data/sequences.fasta",
+        sequences="results/sequences.fasta",
     log:
         "logs/transform.txt",
     params:
2 changes: 1 addition & 1 deletion ingest/workflow/snakemake_rules/upload.smk
@@ -35,7 +35,7 @@ def _get_upload_inputs(wildcards):

     if file_to_upload == "data/genbank.ndjson":
         flag_file = "data/notify/genbank-record-change.done"
-    elif file_to_upload == "data/metadata.tsv":
+    elif file_to_upload == "results/metadata.tsv":
         flag_file = "data/notify/metadata-diff.done"

     inputs["notify_flag_file"] = flag_file
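
The upload rule matches on the exact local path string, which is why this comparison had to change together with the output location: a stale `data/metadata.tsv` key would simply never match. The same lookup could be sketched as a table, as below. This is an illustration only, not the code in the repository: `NOTIFY_FLAG_FILES` and `get_notify_flag_file` are hypothetical names, and returning `None` for unlisted paths is an assumption, since the hunk does not show what the real function does in that case.

```python
# Hypothetical lookup table; the paths are the ones visible in this hunk.
NOTIFY_FLAG_FILES = {
    "data/genbank.ndjson": "data/notify/genbank-record-change.done",
    "results/metadata.tsv": "data/notify/metadata-diff.done",
}


def get_notify_flag_file(file_to_upload):
    """Return the notification flag file for a local path.

    Assumption: paths without a notification step map to None.
    """
    return NOTIFY_FLAG_FILES.get(file_to_upload)


# The new path resolves to the metadata-diff flag; the old path no longer does.
print(get_notify_flag_file("results/metadata.tsv"))
print(get_notify_flag_file("data/metadata.tsv"))
```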