Standardized composable configs #79
Comments
I talked about this with @joverlee521 during our 1:1 yesterday, and I've mentioned it elsewhere too, but I think there's a lot to be said for approaching our configs as "small multiples": One build (i.e. one set of Auspice JSONs) == one "small" config document and the config for multi-build workflows == a collection (dict/list) of these small config documents.

    builds:
      zika:
        filter:
          group_by: …
          min_date: …
          min_length: …

    builds:
      avian-flu/h5n1/ha/all-time:
        filter:
          group_by: …
          min_date: …
          min_length: …
      avian-flu/h5n1/ha/2y:
        filter:
          group_by: …
          min_date: …
          min_length: …
      avian-flu/h5n1/mp/all-time:
        filter:
          group_by: …
          min_date: …
          min_length: …
      …
      # or, maybe alternatively, nested: <https://github.com/joverlee521/nextstrain-testing/blob/cba0c7e5/configs/configs/avian-flu.yaml>

Benefits:
Taking this idea further, I see two main places of interaction with config:
We've not treated those separately, i.e. the config is ~identical between the two, but I think we should start treating them separately:
A concise and expressive syntax such as globbing seems easier to explain/teach with the "small multiples" approach: the key concept is that the concise syntax is expanded to the collection of small configs, and this expansion can be previewed in advance of actually running the workflow.
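To make the expansion idea concrete, here is a minimal sketch in Python. It is not an existing Nextstrain or Augur feature: the function names (`expand_builds`, `deep_update`), the glob-keyed input format, and all filter values are hypothetical, chosen only to show how a concise syntax could expand into a collection of small per-build configs that can be previewed before the workflow runs.

```python
import json
from copy import deepcopy
from fnmatch import fnmatchcase


def deep_update(base, overrides):
    """Recursively merge `overrides` into `base` (in place) and return `base`."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            # Copy so that expanded configs never share nested objects.
            base[key] = deepcopy(value)
    return base


def expand_builds(build_names, pattern_configs):
    """Expand glob-keyed config fragments into one small config per build.

    Fragments are applied in the order given, so later patterns override
    earlier ones for the builds they match.
    """
    expanded = {}
    for name in build_names:
        config = {}
        for pattern, fragment in pattern_configs.items():
            # Note: fnmatch's `*` also matches `/`, unlike shell path globbing.
            if fnmatchcase(name, pattern):
                deep_update(config, fragment)
        expanded[name] = config
    return expanded


# Hypothetical build names and values, for illustration only.
build_names = [
    "avian-flu/h5n1/ha/all-time",
    "avian-flu/h5n1/ha/2y",
    "avian-flu/h5n1/mp/all-time",
]
pattern_configs = {
    "avian-flu/h5n1/*": {"filter": {"group_by": "region year", "min_length": 1000}},
    "avian-flu/h5n1/*/2y": {"filter": {"min_date": "2Y"}},
}

expanded = expand_builds(build_names, pattern_configs)

# Preview the expanded "small multiples" without running any workflow.
print(json.dumps({"builds": expanded}, indent=2))
```

The point of the sketch is that whatever the concise syntax ends up being, its output is the same plain collection of small configs, so the preview step and the workflow itself do not need to know how the collection was produced.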
We discussed CUE a bit back in Jan 2022 and I ended up testing CUE for seasonal flu's config, but the consensus at the time was summarized by @rneher's comment of "I don't think we should get to hung up on how to generate configs."
@huddlej Thanks for digging up that previous discussion and example! I'd forgotten about that (and it's interesting to look at other examples in that Slack thread). I was advocating for a "small multiples" approach then too:
In response to:
I see the "small multiples" approach as intentionally not getting hung up on how configs are generated by making it possible to generate/produce them many ways. The alternative of a single complex config with bespoke composition methods more easily leads IMO to getting hung up on exactly what you can and can't compose and how. All this said though, I (still) disagree that "sweating the details" in the context of improving usability is "getting hung up on" them.
This is a meta issue for tracking work around "composable configs".
Context
We don't have a centralized config schema for phylogenetic workflows because each pathogen runs different Augur commands and custom scripts that use different params. The config gets even more complicated when the workflow creates multiple builds (e.g. flu subtype x segment x time resolution). Since the workflows are authored by different people, we end up with varying config schemas that can be confusing to outside users. With the config file as the main UI for external users of workflows, we should make them easier to work with!
Documenting available config params and their default values
I don't think it's realistic to write and maintain detailed workflow config docs like we have for ncov, so I've been trying to find a way to have centralized documentation for config files.
Making it easy to override configs
With nextstrain build and the forthcoming workflows as programs, users can provide custom config files to override the default config params. This is relatively straightforward for single-build workflows (as long as the config params are well documented), but it can be tedious for multi-build workflows, as discussed on Slack. The path forward here is less clear, but here is some related work: augur export v2 augur#298 (although this issue is focused on the auspice config, there are similar ideas).
Validating configs during the workflow
It'd be useful to get immediate feedback during the workflow run by validating the user's custom config file. This should flag missing required config params and config params in the config file that are not being used in the workflow. This would require maintenance of a config schema that matches the use of configs in the workflow.
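As a rough illustration of the two checks described above (missing required params and params the workflow never uses), here is a small Python sketch. It assumes a hypothetical schema expressed as dotted parameter paths mapped to a "required" flag; the schema contents, the function names, and the example user config are all made up for this sketch and do not reflect any existing workflow.

```python
def flatten(config, prefix=""):
    """Yield a dotted path for every leaf value in a nested config dict."""
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            yield from flatten(value, prefix=f"{path}.")
        else:
            yield path


def validate(config, schema):
    """Return (missing, unused) lists of dotted config paths.

    missing: paths the schema marks as required but the config lacks.
    unused:  paths in the config that the schema does not know about.
    """
    present = set(flatten(config))
    missing = sorted(
        path for path, required in schema.items()
        if required and path not in present
    )
    unused = sorted(present - set(schema))
    return missing, unused


# Hypothetical schema: dotted path -> whether the param is required.
SCHEMA = {
    "filter.group_by": True,
    "filter.min_date": False,
    "filter.min_length": True,
}

# Hypothetical user config containing a typo ("min_lenght").
user_config = {"filter": {"group_by": "region year", "min_lenght": 1000}}

missing, unused = validate(user_config, SCHEMA)
for path in missing:
    print(f"Missing required config param: {path}")
for path in unused:
    print(f"Config param not used by the workflow: {path}")
```

Run at the start of the workflow, a check like this would report the typo as both a missing `filter.min_length` and an unused `filter.min_lenght`, which is the kind of immediate feedback described above. The maintenance cost is the one already noted: the schema has to stay in sync with how the workflow actually reads its config.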