simplify input data format #11

hillalex · 2024-10-02T15:10:44Z

rename stan_id to pid to make the interface clearer and less implementation specific
rename titre to value to be more in line with other seroanalytics packages and to distinguish more clearly from titre_type
remove all model-specific variables from the input data that can be derived from other input cols, and derive them when initialising the class instead: obs_id, t_since_last_exp, t_since_min_date
add method get_stan_data so that the user can see the arguments passed to the model (useful for debugging)
remove unused cens_me_idx argument
censoring indicator amended to have 3 levels: -1 is lower, 0 none, 1 upper
add check that required data columns are present when initialising the model

Also updates the data vignette to define the input variables. The simplified input format can be seen here:

epikinetics/vignettes/data.Rmd

Line 17 in 8632ba8

# Input data

codecov · 2024-10-02T15:20:08Z

Codecov Report

Attention: Patch coverage is 97.61905% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.40%. Comparing base (3960395) to head (8632ba8).
Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
R/biokinetics.R	96.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
- Coverage   97.58%   97.40%   -0.18%     
==========================================
  Files           7        7              
  Lines         373      386      +13     
==========================================
+ Hits          364      376      +12     
- Misses          9       10       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

hillalex · 2024-10-03T11:54:50Z

R/biokinetics.R

+      private$data[, `:=`(titre_type_num = as.numeric(as.factor(titre_type)),
+                          obs_id = seq_len(.N))]
+      if (time_type == "relative") {
+        private$data[, t_since_last_exp := as.integer(date - last_exp_date, units = "days")]
+      } else {
+        private$data[, t_since_min_date := as.integer(date - min(date), units = "days")]


creating cols titre_type_num, obs_id and t_since_x rather than requiring user to pass them in - basically moving more of the data processing logic into this package, so that the required inputs are as simple as possible

hillalex · 2024-10-03T11:56:19Z

vignettes/data.Rmd

+### value
+The value of the titre
+### censored
+Whether this observation should be censored: -1 for lower, 1 for upper, 0 for none


In the spirit of moving more data processing into the package, we should probably actually take censoring limits as arguments, rather than requiring this as a column in the input data?

Yeah, this sounds like a great idea. I think it would be relatively common to have censored data below and/or above but not necessarily both. It will also be relatively common to have data which just isn't censored at all. So I think being flexible around this is definitely the right approach. Allowing the users to specify their own censored limits is certaintly a good idea, given that different assays are calibrated differently and not necessarily standardised.

thimotei · 2024-10-03T12:38:07Z

This all looks great to me. The only additional thing I would say it could be sensible to add would be another column for the timings of the titre dtaw and/or their last exposure. We require calendar dates currently, which in all likelihood is the format the data will be in. But I do think it could be possible for their data to already be relative. So time = 0 could be their last exposure and some positive number after that could be when their titres were taken. If colleagues wisg to use this package with already processed/published data (quite likely I think!), or publicly available data, then the data is likely to be in this relative form, instead of calendar dates. Dates are quite sensitive in terms of identifiability, so allowing for these sorts of timings would be great. I'm a little unsure how to offer either one of these options, but I do think if that is easy to do, it would be worth it! Apart from that, it all looks great.

hillalex · 2024-10-03T15:46:52Z

This all looks great to me. The only additional thing I would say it could be sensible to add would be another column for the timings of the titre dtaw and/or their last exposure. We require calendar dates currently, which in all likelihood is the format the data will be in. But I do think it could be possible for their data to already be relative. So time = 0 could be their last exposure and some positive number after that could be when their titres were taken. If colleagues wisg to use this package with already processed/published data (quite likely I think!), or publicly available data, then the data is likely to be in this relative form, instead of calendar dates. Dates are quite sensitive in terms of identifiability, so allowing for these sorts of timings would be great. I'm a little unsure how to offer either one of these options, but I do think if that is easy to do, it would be worth it! Apart from that, it all looks great.

This is a really helpful insight, thanks. I think I'll merge this PR and add support for relative time data on a new branch to make it easier to review.

hillalex added 3 commits October 2, 2024 16:08

simplify and documetn input data format

c954101

reorder cols

051ca6a

fix comment alignment

ec279a1

hillalex added 5 commits October 2, 2024 16:27

validate presence of required cols

cf94cd1

expose stan data

863de20

merged main

4269ce6

add censoring indicator to data vignette

b398d09

remove unused censoring level

8632ba8

hillalex commented Oct 3, 2024

View reviewed changes

hillalex merged commit 9a808d2 into main Oct 3, 2024
8 checks passed

hillalex mentioned this pull request Oct 3, 2024

Simplify input data format #6

Closed

hillalex mentioned this pull request Oct 15, 2024

Bug in conversion of titre types to numbers #17

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

simplify input data format #11

simplify input data format #11

hillalex commented Oct 2, 2024 •

edited

Loading

codecov bot commented Oct 2, 2024 •

edited

Loading

hillalex Oct 3, 2024

hillalex Oct 3, 2024

thimotei Oct 3, 2024

thimotei commented Oct 3, 2024

hillalex commented Oct 3, 2024

simplify input data format #11

simplify input data format #11

Conversation

hillalex commented Oct 2, 2024 • edited Loading

codecov bot commented Oct 2, 2024 • edited Loading

Codecov Report

hillalex Oct 3, 2024

Choose a reason for hiding this comment

hillalex Oct 3, 2024

Choose a reason for hiding this comment

thimotei Oct 3, 2024

Choose a reason for hiding this comment

thimotei commented Oct 3, 2024

hillalex commented Oct 3, 2024

hillalex commented Oct 2, 2024 •

edited

Loading

codecov bot commented Oct 2, 2024 •

edited

Loading