Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify input data format #11

Merged
merged 8 commits into from
Oct 3, 2024
Merged

simplify input data format #11

merged 8 commits into from
Oct 3, 2024

Conversation

hillalex
Copy link
Member

@hillalex hillalex commented Oct 2, 2024

  • rename stan_id to pid to make the interface clearer and less implementation specific
  • rename titre to value to be more in line with other seroanalytics packages and to distinguish more clearly from titre_type
  • remove all model-specific variables from the input data that can be derived from other input cols, and derive them when initialising the class instead: obs_id, t_since_last_exp, t_since_min_date
  • add method get_stan_data so that the user can see the arguments passed to the model (useful for debugging)
  • remove unused cens_me_idx argument
  • censoring indicator amended to have 3 levels: -1 is lower, 0 none, 1 upper
  • add check that required data columns are present when initialising the model

Also updates the data vignette to define the input variables. The simplified input format can be seen here:

# Input data

Copy link

codecov bot commented Oct 2, 2024

Codecov Report

Attention: Patch coverage is 97.61905% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.40%. Comparing base (3960395) to head (8632ba8).
Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
R/biokinetics.R 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
- Coverage   97.58%   97.40%   -0.18%     
==========================================
  Files           7        7              
  Lines         373      386      +13     
==========================================
+ Hits          364      376      +12     
- Misses          9       10       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +254 to +259
private$data[, `:=`(titre_type_num = as.numeric(as.factor(titre_type)),
obs_id = seq_len(.N))]
if (time_type == "relative") {
private$data[, t_since_last_exp := as.integer(date - last_exp_date, units = "days")]
} else {
private$data[, t_since_min_date := as.integer(date - min(date), units = "days")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating cols titre_type_num, obs_id and t_since_x rather than requiring user to pass them in - basically moving more of the data processing logic into this package, so that the required inputs are as simple as possible

### value
The value of the titre
### censored
Whether this observation should be censored: -1 for lower, 1 for upper, 0 for none
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of moving more data processing into the package, we should probably actually take censoring limits as arguments, rather than requiring this as a column in the input data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this sounds like a great idea. I think it would be relatively common to have censored data below and/or above but not necessarily both. It will also be relatively common to have data which just isn't censored at all. So I think being flexible around this is definitely the right approach. Allowing the users to specify their own censored limits is certaintly a good idea, given that different assays are calibrated differently and not necessarily standardised.

@thimotei
Copy link
Contributor

thimotei commented Oct 3, 2024

This all looks great to me. The only additional thing I would say it could be sensible to add would be another column for the timings of the titre dtaw and/or their last exposure. We require calendar dates currently, which in all likelihood is the format the data will be in. But I do think it could be possible for their data to already be relative. So time = 0 could be their last exposure and some positive number after that could be when their titres were taken. If colleagues wisg to use this package with already processed/published data (quite likely I think!), or publicly available data, then the data is likely to be in this relative form, instead of calendar dates. Dates are quite sensitive in terms of identifiability, so allowing for these sorts of timings would be great. I'm a little unsure how to offer either one of these options, but I do think if that is easy to do, it would be worth it! Apart from that, it all looks great.

@hillalex
Copy link
Member Author

hillalex commented Oct 3, 2024

This all looks great to me. The only additional thing I would say it could be sensible to add would be another column for the timings of the titre dtaw and/or their last exposure. We require calendar dates currently, which in all likelihood is the format the data will be in. But I do think it could be possible for their data to already be relative. So time = 0 could be their last exposure and some positive number after that could be when their titres were taken. If colleagues wisg to use this package with already processed/published data (quite likely I think!), or publicly available data, then the data is likely to be in this relative form, instead of calendar dates. Dates are quite sensitive in terms of identifiability, so allowing for these sorts of timings would be great. I'm a little unsure how to offer either one of these options, but I do think if that is easy to do, it would be worth it! Apart from that, it all looks great.

This is a really helpful insight, thanks. I think I'll merge this PR and add support for relative time data on a new branch to make it easier to review.

@hillalex hillalex merged commit 9a808d2 into main Oct 3, 2024
8 checks passed
@hillalex hillalex mentioned this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants