Refactor nm_input #150

kylebaron · 2024-05-11T14:18:49Z

Examples

library(yspec)

spek <- ys_help$spec() 
spec <- ys_select(spek, ID, NUM,  EVID, RF, ALT, BMI, PHASE, STUDY, AMT, DV)

Default, wide

nm_input(spek)
#> $INPUT
#> C NUM ID SUBJ=DROP TIME SEQ CMT EVID AMT DV AGE WT CRCL ALB BMI
#> AAG SCR AST ALT HT CP TAFD TAD LDOS MDV BLQ PHASE STUDY RF=DROP

Drop

nm_input(spec, .drop = "PHASE")
#> $INPUT
#> ID NUM EVID RF=DROP ALT BMI PHASE=DROP STUDY AMT DV

Rename

nm_input(spec, DOSE = AMT)
#> $INPUT
#> ID NUM EVID RF=DROP ALT BMI PHASE STUDY DOSE=AMT DV

Long

nm_input(spec, .long = TRUE)
#> $INPUT
#> ID      ; subject identifier
#> NUM     ; record number
#> EVID    ; event ID
#> RF=DROP ; renal function stage
#> ALT     ; alanine aminotransferase
#> BMI     ; BMI
#> PHASE   ; study phase indicator
#> STUDY   ; study number
#> AMT     ; dose amount
#> DV      ; dependent variable

Long, with decodes

nm_input(spec, .long = TRUE, .decodes = TRUE)
#> $INPUT
#> ID      ; subject identifier
#> NUM     ; record number
#> EVID    ; event ID 
#>         ; [0 = observation, 1 = dose]
#> RF=DROP ; renal function stage
#> ALT     ; alanine aminotransferase
#> BMI     ; BMI
#> PHASE   ; study phase indicator 
#>         ; [values: 1]
#> STUDY   ; study number 
#>         ; [1 = SAD, 2 = MAD, 3 = Renal, 4 = Hepatic]
#> AMT     ; dose amount
#> DV      ; dependent variable

^{Created on 2024-05-11 with reprex v2.0.2}

Summary

This is a breaking change.

Users can still get the old, long, verbose output. But we'll default with the more standard, wide format going forward. I have never seen anyone using the long format.
This also re-names some arguments; I don't have any special handling for that at the moment; if this were in use, I think the most likely result would be an error in selecting columns called cat or width.

…for long format; don't include decodes by default

kyleam

Stepping through the code, this all looks good to me.

This is a breaking change. [...] if this were in use, I think the most likely result would be an error in selecting columns called cat or width.

Right, I'll of course defer to your judgment on that one. If that were a concern, one option would just be to expose this new interface as a new function, leaving nm_input (perhaps deprecated) with its same signature and behavior (with both the old and new sharing the same internal code).

$ git diff-tree -r origin/main origin/nm-input -- '*.RDS'
:100644 100644 79af6733e5e9762c373f84b35603a61bf18ccf11 9945d8b68a543460c409e60644cff212fecad22a M	inst/test_data/test1.RDS
:100644 100644 d0ac08b499b8c7a32cd80103119ee903b6af3ed4 1e42ba7594b99af72ae68be8632ad3e3acdee286 M	inst/test_data/test2.RDS

The underlying data set for these RDS files is staying the same (verified with script below). This is regenerated by make data being invoked (likely by make all or make check), and the file is changing due to metadata changes (e.g., $writer_version). Okay.

script

#!/bin/sh

set -eu

tdir=$(mktemp -d "${TMPDIR:-/tmp}"/yspec-XXXX)


git cat-file blob 79af6733e5e9762c373f84b35603a61bf18ccf11 >"$tdir/test1-pre.rds"
git cat-file blob 9945d8b68a543460c409e60644cff212fecad22a >"$tdir/test1-post.rds"

git cat-file blob d0ac08b499b8c7a32cd80103119ee903b6af3ed4 >"$tdir/test2-pre.rds"
git cat-file blob 1e42ba7594b99af72ae68be8632ad3e3acdee286 >"$tdir/test2-post.rds"

cd "$tdir"

rerds () {
    Rscript -e 'saveRDS(readRDS("'"$1"'"), file = "re-'"$1"'")'
}

rerds test1-pre.rds
rerds test1-post.rds
rerds test2-pre.rds
rerds test2-post.rds

cmp re-test1-pre.rds re-test1-post.rds
cmp re-test2-pre.rds re-test2-post.rds

kyleam · 2024-05-14T18:33:42Z

R/nm_input.R

+#' @details
+#' Columns with character type are automatically dropped; there is no need 
+#' to list these under the `.drop` argument.


This repeats what is said under .drop. I'm noting in case that was unintentional, but of course it's fine as is if the repetition is on purpose.

kylebaron · 2024-05-22T11:59:57Z

Thanks @kyleam. I'm going to stick with the breaking change on this one just because I like the name and $INPUT almost always looks like the new default. With larger, more complicated data sets, the long output is unwieldy.

kylebaron added 5 commits May 8, 2024 23:13

refactor nm-input

b9734dd

documentation

36ff902

refactor to allow renames in long format; optionally include decodes …

9403a48

…for long format; don't include decodes by default

start on tests

f7b0ecd

documentation; fix some logic

57fc636

kylebaron changed the title ~~Refactor nm_input~~ [WIP] Refactor nm_input May 11, 2024

tweak documentation

3f2374c

kylebaron changed the title ~~[WIP] Refactor nm_input~~ Refactor nm_input May 11, 2024

kylebaron requested a review from kyleam May 13, 2024 14:08

kyleam approved these changes May 14, 2024

View reviewed changes

kylebaron merged commit 7d3b752 into main May 22, 2024
7 checks passed

kylebaron deleted the nm-input branch May 22, 2024 12:00

kylebaron mentioned this pull request May 22, 2024

Release/0.6.2 #151

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor nm_input #150

Refactor nm_input #150

kylebaron commented May 11, 2024 •

edited

Loading

kyleam left a comment

kyleam May 14, 2024

kylebaron commented May 22, 2024

Refactor nm_input #150

Refactor nm_input #150

Conversation

kylebaron commented May 11, 2024 • edited Loading

Examples

Default, wide

Drop

Rename

Long

Long, with decodes

Summary

kyleam left a comment

Choose a reason for hiding this comment

kyleam May 14, 2024

Choose a reason for hiding this comment

kylebaron commented May 22, 2024

kylebaron commented May 11, 2024 •

edited

Loading