Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor nm_input #150

Merged
merged 6 commits into from
May 22, 2024
Merged

Refactor nm_input #150

merged 6 commits into from
May 22, 2024

Conversation

kylebaron
Copy link
Contributor

@kylebaron kylebaron commented May 11, 2024

Examples

library(yspec)

spek <- ys_help$spec() 
spec <- ys_select(spek, ID, NUM,  EVID, RF, ALT, BMI, PHASE, STUDY, AMT, DV)

Default, wide

nm_input(spek)
#> $INPUT
#> C NUM ID SUBJ=DROP TIME SEQ CMT EVID AMT DV AGE WT CRCL ALB BMI
#> AAG SCR AST ALT HT CP TAFD TAD LDOS MDV BLQ PHASE STUDY RF=DROP

Drop

nm_input(spec, .drop = "PHASE")
#> $INPUT
#> ID NUM EVID RF=DROP ALT BMI PHASE=DROP STUDY AMT DV

Rename

nm_input(spec, DOSE = AMT)
#> $INPUT
#> ID NUM EVID RF=DROP ALT BMI PHASE STUDY DOSE=AMT DV

Long

nm_input(spec, .long = TRUE)
#> $INPUT
#> ID      ; subject identifier
#> NUM     ; record number
#> EVID    ; event ID
#> RF=DROP ; renal function stage
#> ALT     ; alanine aminotransferase
#> BMI     ; BMI
#> PHASE   ; study phase indicator
#> STUDY   ; study number
#> AMT     ; dose amount
#> DV      ; dependent variable

Long, with decodes

nm_input(spec, .long = TRUE, .decodes = TRUE)
#> $INPUT
#> ID      ; subject identifier
#> NUM     ; record number
#> EVID    ; event ID 
#>         ; [0 = observation, 1 = dose]
#> RF=DROP ; renal function stage
#> ALT     ; alanine aminotransferase
#> BMI     ; BMI
#> PHASE   ; study phase indicator 
#>         ; [values: 1]
#> STUDY   ; study number 
#>         ; [1 = SAD, 2 = MAD, 3 = Renal, 4 = Hepatic]
#> AMT     ; dose amount
#> DV      ; dependent variable

Created on 2024-05-11 with reprex v2.0.2

Summary

This is a breaking change.

  • Users can still get the old, long, verbose output. But we'll default with the more standard, wide format going forward. I have never seen anyone using the long format.
  • This also re-names some arguments; I don't have any special handling for that at the moment; if this were in use, I think the most likely result would be an error in selecting columns called cat or width.

@kylebaron kylebaron changed the title Refactor nm_input [WIP] Refactor nm_input May 11, 2024
@kylebaron kylebaron changed the title [WIP] Refactor nm_input Refactor nm_input May 11, 2024
@kylebaron kylebaron requested a review from kyleam May 13, 2024 14:08
Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stepping through the code, this all looks good to me.

This is a breaking change. [...] if this were in use, I think the most likely result would be an error in selecting columns called cat or width.

Right, I'll of course defer to your judgment on that one. If that were a concern, one option would just be to expose this new interface as a new function, leaving nm_input (perhaps deprecated) with its same signature and behavior (with both the old and new sharing the same internal code).


$ git diff-tree -r origin/main origin/nm-input -- '*.RDS'
:100644 100644 79af6733e5e9762c373f84b35603a61bf18ccf11 9945d8b68a543460c409e60644cff212fecad22a M	inst/test_data/test1.RDS
:100644 100644 d0ac08b499b8c7a32cd80103119ee903b6af3ed4 1e42ba7594b99af72ae68be8632ad3e3acdee286 M	inst/test_data/test2.RDS

The underlying data set for these RDS files is staying the same (verified with script below). This is regenerated by make data being invoked (likely by make all or make check), and the file is changing due to metadata changes (e.g., $writer_version). Okay.

script
#!/bin/sh

set -eu

tdir=$(mktemp -d "${TMPDIR:-/tmp}"/yspec-XXXX)


git cat-file blob 79af6733e5e9762c373f84b35603a61bf18ccf11 >"$tdir/test1-pre.rds"
git cat-file blob 9945d8b68a543460c409e60644cff212fecad22a >"$tdir/test1-post.rds"

git cat-file blob d0ac08b499b8c7a32cd80103119ee903b6af3ed4 >"$tdir/test2-pre.rds"
git cat-file blob 1e42ba7594b99af72ae68be8632ad3e3acdee286 >"$tdir/test2-post.rds"

cd "$tdir"

rerds () {
    Rscript -e 'saveRDS(readRDS("'"$1"'"), file = "re-'"$1"'")'
}

rerds test1-pre.rds
rerds test1-post.rds
rerds test2-pre.rds
rerds test2-post.rds

cmp re-test1-pre.rds re-test1-post.rds
cmp re-test2-pre.rds re-test2-post.rds

Comment on lines +57 to +59
#' @details
#' Columns with character type are automatically dropped; there is no need
#' to list these under the `.drop` argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repeats what is said under .drop. I'm noting in case that was unintentional, but of course it's fine as is if the repetition is on purpose.

@kylebaron
Copy link
Contributor Author

Thanks @kyleam. I'm going to stick with the breaking change on this one just because I like the name and $INPUT almost always looks like the new default. With larger, more complicated data sets, the long output is unwieldy.

@kylebaron kylebaron merged commit 7d3b752 into main May 22, 2024
7 checks passed
@kylebaron kylebaron deleted the nm-input branch May 22, 2024 12:00
@kylebaron kylebaron mentioned this pull request May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants