Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update key_colnames, revision_summary #540

Draft
wants to merge 13 commits into
base: dev
Choose a base branch
from

Conversation

brookslogan
Copy link
Contributor

@brookslogan brookslogan commented Oct 10, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

  • Make key_colnames.epi_archive output its unique key colnames (including version).
  • Make key_colnames.data.frame require other_keys to be passed.
  • Remove key_colnames.default.
  • Require exclude = to be passed by name in key_colnames.
  • Require dots to be empty in key_colnames.
    • Note: this triggers some errors in epipredict tests. See below.
  • Update revision_summary to use new key_colnames.epi_archive.
  • Fix&tweak some tidyeval stuff in revision_summary.
  • Tweak some naming in revision_summary.

epipredict errors

Originally, I misread and thought these were from passing other_keys = hopefully-redundantly into key_colnames.epi_df and having them be ignored. I was planning to do something like

  expected_other_keys <- attr(x, "metadata")$other_keys
  if (is.null(other_keys)) {
    other_keys <- expected_other_keys
  } else {
    if (!identical(other_keys, expected_other_keys)) {
      cli_abort(c(
        "`other_keys` was provided, but didn't match expectations from inspecting `x`",
        "*" = "`other_keys` was {format_chr_with_quotes(other_keys)}",
        "*" = "`expected_other_keys` was {format_chr_with_quotes(expected_other_keys)}",
        "i" = "If you resolve this discrepancy by adjusting the metadata of `x`, you
               shouldn't have to pass `other_keys =` here anymore unless you want to
               continue to perform this check."
      ))
    }
  }

However, the error is actually from passing extra_keys = (which was also previously ignored, but it sounds like it has a different meaning). I am looking a bit more into this.

  • todo: decide what to do re. extra_keys =
    • idea for now: forbid + PR to epipredict to specify other_keys = instead. If current behavior of other_keys = with epi_dfs breaks later, then decide whether to change things here or adjust any offending steps/layers.
  • todo: check whether epipredict#410 is caught... requiring other_keys = in key_colnames.data.frame should be flagging this, right?
    • It is. Also need to include fix in epipredict PR before merging this.
  • todo: prepare epipredict PR and get it merged

Other work

  • todo: tests for additional revision_summary() adjustments
  • todo: break into 2 dependent PRs

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

* Make `key_colnames.epi_archive` output epikey-time-version rather than just
  epikey-time.
* Make `key_colnames.data.frame` require `other_keys` be provided.
* Remove `key_colnames.default`.
* Make `key_colnames` forbid passing `exclude` positionally.
* Update downstream `revision_summary`.
* Produce error rather than default selection when user provides a tidyselection
  in ... but it selects zero columns.
* Change time_within_x_latest to take `values` as a vector
* Use `.data` instead of `pick` etc. in some places
So it is not misinterpreted as "the amount of time that it has been near the latest".
* This may fix some behaviors and emit more sensible error messages on
  yearmonths given yearmonth-incompatible settings.
* This should express time differences for weekly data in terms of weeks,
  and may emit errors given weekly-"incompatible" settings.
* This appears to be computationally faster (vs. `as.integer(version) -
  as.integer(time_value)`).
@brookslogan brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch from 1353df9 to f60a221 Compare December 16, 2024 19:25
@brookslogan brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch 2 times, most recently from ea6cfc0 to 93b2cca Compare December 18, 2024 20:11
@brookslogan brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch from 93b2cca to 97495b9 Compare December 18, 2024 23:44
It's probably best to immediately ungroup after performing grouped operations in
our documentation, as leaving things grouped accidentally is a source of errors.
Sometime we should consider an overhaul to use `by =` and `.by =` where
appropriate (sorting effects not needed) and available (not all operations
support this syntax yet).

There were already 0s in the example data, so "highlight" with words the effects
of completion + note one potential surprise in other applications.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant