Use a config file to define hardcoded parameters #9

kimandrews · 2024-01-12T00:44:45Z

Description of proposed changes

Use a config file to define hardcoded parameters and file paths

Checklist

Checks pass

j23414

Automated checks passed and local check passed:

git clone https://github.com/nextstrain/measles.git
cd measles
git checkout move-params-to-config
nextstrain build .
nextstrain view auspice

Changes look good to me!

huddlej

This looks great, @kimandrews!

Following from conversation on Slack in support of this idea, can you also move the hardcoded file paths for configuration files to the config YAML file, too? I would follow the pattern we use for the ncov workflow config files where we define a top-level "files" key in the YAML such that most of the entries in the files rule map like this:

files:
    exclude: "config/dropped_strains.txt"
    reference: "config/measles_reference.gb"
    colors: "config/colors.tsv"
    auspice_config: "config/auspice_config.json"

Since input_fasta doesn't get used anywhere and we instead download the data we need to start the workflow, we could just delete that line.

After discussion with the group, we decided to additionaly move hardcoded file paths for configuration files to the config YAML file

huddlej

Looks good to me! Thanks, @kimandrews!

joverlee521 · 2024-01-17T00:52:54Z

config/config.yaml

+files:
+    exclude: "config/dropped_strains.txt"
+    reference: "config/measles_reference.gb"
+    colors: "config/colors.tsv"
+    auspice_config: "config/auspice_config.json"


non-blocking

I would follow the pattern we use for the ncov workflow config files where we define a top-level "files" key in the YAML such that most of the entries in the files rule map like this:

@huddlej, can you share some details on the reason ncov was originally set up like this?
Specifically, why group under the top level files key instead of adding the files to specific param groups like so?

filter: exclude: "config/dropped_strains.txt" ... align: reference: "config/measles_reference.gb" export: colors: "config/colors.tsv" auspice_config: "config/auspice_config.json"

@joverlee521 That's a good question! The ncov pattern came from the same impulse we felt in this PR to replace a files rule in the Snakefile with a YAML config. I think I wanted to emphasize the placement of the configuration details instead of suggesting their reorganization. (My memories from this time period are hazy though!)

The original reason for grouping all the config file paths into one section of the Snakefile must date back farther, but I suspect the main reason for that structure is that the same files can be needed by multiple rules. For example, we might need the reference file for align, ancestral, and translate. In the flu workflow, we need to reference the list of strains from include in both augur filter rules and a later flag outliers rule. Even files that seem less ambiguously connected to a specific rule like the Auspice config JSON is to export, we also use the color scales from those JSONs in seasonal flu across multiple rules.

I see. Thanks for the context @huddlej!

Grouping files together makes sense if expected to be used multiple times. I also see the other side of it being confusing to need to edit configs for the filter rule under both files and filter. It's also not be immediately obvious looking at the config what rules use which files (unless you have detailed docs like ncov).

Currently, I see configs using a mix of top level files key, top level file name keys, and rule specific file name keys. We should come to some consensus on this schema as a group...I'll write up an issue on this in pathogen-repo-template.

It's also not be immediately obvious looking at the config what rules use which files (unless you have detailed docs like ncov).

I have spent plenty of time following the strings to work out which rules use which files (mainly in ncov), and often finding a bunch of files no longer used by any rule. Docs are great, but from experience they inevitably fall out of sync. How about something like the following (or a variant thereof)?

files: exclude: &files_exclude 'config/exclude_accessions.txt' filter: exclude: *files_exclude

P.S. Thanks @joverlee521 for the links to examples in your comment, it makes a big difference ✨

We can continue discussion in nextstrain/pathogen-repo-guide#26

Use a config file to define hardcoded parameters

4b207bf

kimandrews requested a review from a team January 12, 2024 00:47

j23414 approved these changes Jan 12, 2024

View reviewed changes

jameshadfield approved these changes Jan 12, 2024

View reviewed changes

joverlee521 approved these changes Jan 12, 2024

View reviewed changes

huddlej requested changes Jan 12, 2024

View reviewed changes

kimandrews added 2 commits January 16, 2024 14:35

Added hardcoded file paths to the config YAML file

bc9a476

After discussion with the group, we decided to additionaly move hardcoded file paths for configuration files to the config YAML file

Create and update changelog

c52d55b

kimandrews requested a review from huddlej January 16, 2024 22:52

huddlej approved these changes Jan 16, 2024

View reviewed changes

joverlee521 mentioned this pull request Jan 16, 2024

Persephone nextstrain/zika#28

Merged

1 task

joverlee521 reviewed Jan 17, 2024

View reviewed changes

kimandrews merged commit eb76f13 into main Jan 17, 2024
6 checks passed

kimandrews deleted the move-params-to-config branch January 17, 2024 00:59

joverlee521 mentioned this pull request Jan 17, 2024

How should we standardize config schema for files? nextstrain/pathogen-repo-guide#26

Open

joverlee521 mentioned this pull request Feb 20, 2024

phylo: Move hardcoded params to config YAML nextstrain/zika#35

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a config file to define hardcoded parameters #9

Use a config file to define hardcoded parameters #9

kimandrews commented Jan 12, 2024 •

edited by huddlej

Loading

j23414 left a comment

huddlej left a comment

huddlej left a comment

joverlee521 Jan 17, 2024

huddlej Jan 17, 2024

joverlee521 Jan 17, 2024

jameshadfield Jan 17, 2024 •

edited

Loading

joverlee521 Jan 17, 2024

Use a config file to define hardcoded parameters #9

Use a config file to define hardcoded parameters #9

Conversation

kimandrews commented Jan 12, 2024 • edited by huddlej Loading

Description of proposed changes

Checklist

j23414 left a comment

Choose a reason for hiding this comment

huddlej left a comment

Choose a reason for hiding this comment

huddlej left a comment

Choose a reason for hiding this comment

joverlee521 Jan 17, 2024

Choose a reason for hiding this comment

huddlej Jan 17, 2024

Choose a reason for hiding this comment

joverlee521 Jan 17, 2024

Choose a reason for hiding this comment

jameshadfield Jan 17, 2024 • edited Loading

Choose a reason for hiding this comment

joverlee521 Jan 17, 2024

Choose a reason for hiding this comment

kimandrews commented Jan 12, 2024 •

edited by huddlej

Loading

jameshadfield Jan 17, 2024 •

edited

Loading