Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a config file to define hardcoded parameters #9

Merged
merged 3 commits into from
Jan 17, 2024

Conversation

kimandrews
Copy link
Contributor

@kimandrews kimandrews commented Jan 12, 2024

Description of proposed changes

Use a config file to define hardcoded parameters and file paths

Checklist

  • Checks pass

@kimandrews kimandrews requested a review from a team January 12, 2024 00:47
Copy link
Contributor

@j23414 j23414 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated checks passed and local check passed:

git clone https://github.com/nextstrain/measles.git
cd measles
git checkout move-params-to-config
nextstrain build .
nextstrain view auspice

Changes look good to me!

Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @kimandrews!

Following from conversation on Slack in support of this idea, can you also move the hardcoded file paths for configuration files to the config YAML file, too? I would follow the pattern we use for the ncov workflow config files where we define a top-level "files" key in the YAML such that most of the entries in the files rule map like this:

files:
    exclude: "config/dropped_strains.txt"
    reference: "config/measles_reference.gb"
    colors: "config/colors.tsv"
    auspice_config: "config/auspice_config.json"

Since input_fasta doesn't get used anywhere and we instead download the data we need to start the workflow, we could just delete that line.

After discussion with the group, we decided to additionaly move
hardcoded file paths for configuration files to the config YAML file
@kimandrews kimandrews requested a review from huddlej January 16, 2024 22:52
Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks, @kimandrews!

@joverlee521 joverlee521 mentioned this pull request Jan 16, 2024
1 task
Comment on lines +1 to +5
files:
exclude: "config/dropped_strains.txt"
reference: "config/measles_reference.gb"
colors: "config/colors.tsv"
auspice_config: "config/auspice_config.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

I would follow the pattern we use for the ncov workflow config files where we define a top-level "files" key in the YAML such that most of the entries in the files rule map like this:

@huddlej, can you share some details on the reason ncov was originally set up like this?
Specifically, why group under the top level files key instead of adding the files to specific param groups like so?

filter:
    exclude: "config/dropped_strains.txt"
    ...
align:
    reference: "config/measles_reference.gb"
export:
    colors: "config/colors.tsv"
    auspice_config: "config/auspice_config.json"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joverlee521 That's a good question! The ncov pattern came from the same impulse we felt in this PR to replace a files rule in the Snakefile with a YAML config. I think I wanted to emphasize the placement of the configuration details instead of suggesting their reorganization. (My memories from this time period are hazy though!)

The original reason for grouping all the config file paths into one section of the Snakefile must date back farther, but I suspect the main reason for that structure is that the same files can be needed by multiple rules. For example, we might need the reference file for align, ancestral, and translate. In the flu workflow, we need to reference the list of strains from include in both augur filter rules and a later flag outliers rule. Even files that seem less ambiguously connected to a specific rule like the Auspice config JSON is to export, we also use the color scales from those JSONs in seasonal flu across multiple rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for the context @huddlej!

Grouping files together makes sense if expected to be used multiple times. I also see the other side of it being confusing to need to edit configs for the filter rule under both files and filter. It's also not be immediately obvious looking at the config what rules use which files (unless you have detailed docs like ncov).

Currently, I see configs using a mix of top level files key, top level file name keys, and rule specific file name keys. We should come to some consensus on this schema as a group...I'll write up an issue on this in pathogen-repo-template.

Copy link
Member

@jameshadfield jameshadfield Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also not be immediately obvious looking at the config what rules use which files (unless you have detailed docs like ncov).

I have spent plenty of time following the strings to work out which rules use which files (mainly in ncov), and often finding a bunch of files no longer used by any rule. Docs are great, but from experience they inevitably fall out of sync. How about something like the following (or a variant thereof)?

files:
  exclude: &files_exclude 'config/exclude_accessions.txt'
filter:
  exclude: *files_exclude

P.S. Thanks @joverlee521 for the links to examples in your comment, it makes a big difference ✨

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can continue discussion in nextstrain/pathogen-repo-guide#26

@kimandrews kimandrews merged commit eb76f13 into main Jan 17, 2024
6 checks passed
@kimandrews kimandrews deleted the move-params-to-config branch January 17, 2024 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants