simplify input data format #11
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
- Coverage   97.58%   97.40%   -0.18%
==========================================
  Files           7        7
  Lines         373      386      +13
==========================================
+ Hits          364      376      +12
- Misses          9       10       +1
private$data[, `:=`(titre_type_num = as.numeric(as.factor(titre_type)),
                    obs_id = seq_len(.N))]
if (time_type == "relative") {
  private$data[, t_since_last_exp := as.integer(date - last_exp_date, units = "days")]
} else {
  private$data[, t_since_min_date := as.integer(date - min(date), units = "days")]
Creating the columns titre_type_num, obs_id and t_since_x here rather than requiring the user to pass them in. This moves more of the data processing logic into the package, so that the required inputs are as simple as possible.
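As a rough illustration of what these derived columns contain, here is a minimal sketch in base R on a made-up data frame. The package itself uses data.table, as in the diff above; the variant names and the `dat` object are invented for this example.

```r
# Toy input: only the titre type is supplied by the user (made-up values).
dat <- data.frame(
  titre_type = c("Delta", "Ancestral", "Delta", "Alpha"),
  stringsAsFactors = FALSE
)

# Numeric code per titre type: factor levels are sorted alphabetically,
# so Alpha = 1, Ancestral = 2, Delta = 3.
dat$titre_type_num <- as.numeric(as.factor(dat$titre_type))

# One sequential id per observation (row).
dat$obs_id <- seq_len(nrow(dat))

print(dat)
```

Because both columns are fully determined by the rows already present, nothing is lost by computing them inside the package.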
### value
The value of the titre
### censored
Whether this observation should be censored: -1 for lower, 1 for upper, 0 for none
In the spirit of moving more data processing into the package, we should probably actually take censoring limits as arguments, rather than requiring this as a column in the input data?
Yeah, this sounds like a great idea. I think it would be relatively common to have data censored below and/or above, but not necessarily both. It will also be relatively common to have data which isn't censored at all, so being flexible here is definitely the right approach. Allowing users to specify their own censoring limits is certainly a good idea, given that different assays are calibrated differently and not necessarily standardised.
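A minimal sketch of how optional censoring limits could be turned into the -1/0/1 coding described in the vignette. The helper name `flag_censored` and its arguments `lower_limit` and `upper_limit` are hypothetical and not part of the package; treating values exactly at a limit as censored is also an assumption.

```r
# Hypothetical helper, not part of epikinetics: derive the censoring flag
# from optional limits instead of requiring a `censored` input column.
flag_censored <- function(value, lower_limit = NULL, upper_limit = NULL) {
  censored <- integer(length(value))  # 0 = not censored
  if (!is.null(lower_limit)) {
    # Assumption: values at or below the lower limit are lower-censored.
    censored[!is.na(value) & value <= lower_limit] <- -1L
  }
  if (!is.null(upper_limit)) {
    # Assumption: values at or above the upper limit are upper-censored.
    censored[!is.na(value) & value >= upper_limit] <- 1L
  }
  censored
}

titres <- c(5, 40, 160, 2560, 80)
both <- flag_censored(titres, lower_limit = 5, upper_limit = 2560)
lower_only <- flag_censored(titres, lower_limit = 5)
none <- flag_censored(titres)
```

Leaving both limits as `NULL` covers the uncensored case, and supplying only one covers data censored on a single side.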
This all looks great to me. The only additional thing I would suggest is another column for the timing of the titre draw and/or the last exposure. We currently require calendar dates, which in all likelihood is the format the data will be in. But I do think it is possible for data to already be relative: time = 0 could be the last exposure, and some positive number after that could be when the titres were taken. If colleagues wish to use this package with already processed/published data (quite likely, I think!) or publicly available data, then the data is likely to be in this relative form instead of calendar dates. Dates are quite sensitive in terms of identifiability, so allowing for these sorts of timings would be great. I'm a little unsure how to offer either one of these options, but if it is easy to do, it would be worth it! Apart from that, it all looks great.
This is a really helpful insight, thanks. I think I'll merge this PR and add support for relative time data on a new branch to make it easier to review.
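One possible shape for accepting either form of timing, sketched in base R. This is not what the follow-up branch implements: the helper `days_since_last_exposure` and the numeric column name `day` are hypothetical, while `date` and `last_exp_date` are the calendar columns used in the diff above.

```r
# Hypothetical helper, not part of epikinetics: return days since last
# exposure from either calendar dates or already-relative timings.
days_since_last_exposure <- function(data) {
  if (all(c("date", "last_exp_date") %in% names(data))) {
    # Calendar input: difference between two Date columns, in whole days.
    as.integer(as.numeric(data$date - data$last_exp_date, units = "days"))
  } else if ("day" %in% names(data)) {
    # Relative input: `day` (hypothetical name) already counts days since
    # the last exposure, with 0 meaning the exposure itself.
    as.integer(data$day)
  } else {
    stop("Supply either `date` and `last_exp_date`, or `day`.")
  }
}

calendar <- data.frame(
  date = as.Date(c("2023-01-10", "2023-02-01")),
  last_exp_date = as.Date(c("2023-01-01", "2023-01-01"))
)
relative <- data.frame(day = c(0, 14, 28))

from_calendar <- days_since_last_exposure(calendar)
from_relative <- days_since_last_exposure(relative)
```

With the relative form, no calendar dates need to be shared at all, which addresses the identifiability concern raised above.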
- stan_id to pid, to make the interface clearer and less implementation specific
- titre to value, to be more in line with other seroanalytics packages and to distinguish more clearly from titre_type
- obs_id, t_since_last_exp, t_since_min_date
- get_stan_data, so that the user can see the arguments passed to the model (useful for debugging)
- cens_me_idx argument

Also updates the data vignette to define the input variables. The simplified input format can be seen here:
epikinetics/vignettes/data.Rmd
Line 17 in 8632ba8