Odd Sepsis Labels with eICU #66

Open
DanielBrkr opened this issue Apr 22, 2024 · 4 comments

@DanielBrkr

Hi there,

First of all, thanks for the great work. I noticed some oddities in the Sepsis-3 labels generated by

sepsis_data <- sep3(sofa_data, si_data, si_window = "any", si_lwr = hours(48L),
                    si_upr = hours(24L), keep_components = TRUE,
                    interval = mins(15L)
)

for the eICU dataset.

There are some assigned labels for which the SI window seems to be violated, e.g. 144 hours between the last SI event and the determining SOFA score increase, which does not match the Sepsis-3 requirements in the documentation.

Is this an expected artefact due to the way the labels are generated for eICU under the hood, or is something else off that might need to be taken care of?

See the plot below for an example:

[Image: odd_sepsis_label]

Thanks again!

@dplecko
Member

dplecko commented Jun 3, 2024

Hi,

Thanks for the question. It is difficult to say exactly what is going on without the full code replicating this issue. My guess would be that you are plotting only the first time of suspected infection, and there is possibly a later one which is closer to the SOFA increase that triggers the Sepsis-3 label.

If the SI time you indicated is the only SI time for this individual, then there is something unusual happening. If you share the full code, I am happy to take a look.

@DanielBrkr
Author

Sure, no problem. Here's the corresponding R code from my Python code. I hope it's not too unidiomatic; this is basically my first time using R at all.

library(ricu)
library(units)
library(ggplot2)
library(dplyr)

# Import and attach the eICU source, then check data availability
ricu::import_src("eicu")
ricu::attach_src("eicu")
ricu::src_data_avail()

# SOFA scores (with components) on a 15-minute grid
sofa_data <- ricu::load_concepts("sofa",
                                 "eicu",
                                 keep_components = TRUE,
                                 interval = mins(15L)
)

# Suspected infection (SI) on the same 15-minute grid
si_data <- ricu::load_concepts("susp_inf", "eicu",
                               abx_min_count = 2L,
                               positive_cultures = TRUE,
                               si_mode = "or",
                               keep_components = TRUE,
                               interval = mins(15L)
)

# Sepsis-3 labels with SI window bounds of 48 h (lower) and 24 h (upper)
sepsis_data <- sep3(sofa_data, si_data, si_window = "any", si_lwr = hours(48L),
                    si_upr = hours(24L), keep_components = TRUE,
                    interval = mins(15L)
)

# Patient with the unexpected label
id_sample <- 3166218

si_sample <- si_data %>% filter(patientunitstayid == id_sample)
sofa_sample <- sofa_data %>% filter(patientunitstayid == id_sample)
sepsis_sample <- sepsis_data %>% filter(patientunitstayid == id_sample)

# Convert the logical flags to numeric so they can be plotted on the y axis
si_sample$susp_inf <- as.numeric(si_sample$susp_inf)
sepsis_sample$sep3 <- as.numeric(sepsis_sample$sep3)

ggplot() +
  geom_point(data = sofa_sample, aes(x = labresultoffset, y = sofa),
             color = "blue", size = 2, shape = 16, alpha = 0.6) +
  geom_point(data = si_sample, aes(x = infusionoffset, y = susp_inf),
             color = "black", size = 4, shape = 15) +
  geom_point(data = sepsis_sample, aes(x = labresultoffset, y = sep3),
             color = "magenta", size = 6, shape = 17) +
  labs(x = "Time (hours)", y = "Values") +
  scale_x_continuous(labels = function(x) paste0(x / 60, "h")) + # offsets are provided as minutes afaik
  theme_minimal()

There is in fact only one data point related to suspected infection, at least for this patient ID. This is the data frame:

  patientunitstayid infusionoffset abx_time samp_time susp_inf
1           3166218           -225     -225        NA  1,00000

Is there anything else you need?

@dplecko
Member

dplecko commented Jun 5, 2024

Thanks for raising this issue. There is indeed a bug in ricu. However, note that if you work with hourly intervals, the issue does not appear (patient 3166218 does not have a sepsis event). This may be a preferred solution for now.

For concreteness (and for discussion with @nbenn), here is a reproducible example that runs with a reasonable amount of RAM:

si <- load_concepts("susp_inf", "eicu",
                    abx_min_count = 2L,
                    positive_cultures = TRUE,
                    si_mode = "or",
                    keep_components = TRUE,
                    interval = mins(15L))

# Restrict SOFA to 200 patients plus the affected one to limit memory use
pids <- c(unique(id_col(si))[1:200], 3166218)
sofa <- load_concepts("sofa", "eicu", keep_components = TRUE,
                      interval = mins(15L),
                      patient_ids = pids)

sep3(sofa[patientunitstayid == 3166218], si[patientunitstayid == 3166218],
     si_window = "any", keep_components = TRUE,
     interval = mins(15L))

The issue arises from L162-165 in callback-sep3.R. There, the difference get(index_var(susp)) - si_lwr is taken; since the index variable is in minutes and si_lwr is in hours, the result is cast to seconds. The non-equi join in L173-175 then evaluates join_time1 >= si_lwr on quantities with different units, so the comparison is meaningless.
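
The unit mismatch can be reproduced with base R difftime objects alone. The following is a minimal sketch, not ricu's actual code: it uses as.difftime() in place of ricu's mins() and hours(), looks only at the upper window bound for brevity, and assumes a hypothetical SOFA time 144 hours after the SI time. The raw numeric comparison is likewise an assumption, standing in for a join that ignores units.

```r
# SI time from the thread (minutes) and the upper window bound (hours)
si_time <- as.difftime(-225, units = "mins")
si_upr  <- as.difftime(24, units = "hours")

# Mixed units: base R converts both operands to seconds
upr_mixed <- si_time + si_upr
units(upr_mixed)       # "secs"
as.numeric(upr_mixed)  # 72900

# Hypothetical SOFA time, 144 hours after the SI time, on the minute scale
sofa_time <- si_time + as.difftime(144 * 60, units = "mins")

# Comparing raw numbers (assumed to mimic a join that ignores units)
# lets the event through: 8415 "minutes" <= 72900 "seconds"
passes_raw <- as.numeric(sofa_time) <= as.numeric(upr_mixed)

# A unit-aware difftime comparison rejects it: 8415 mins > 1215 mins
passes_unit_aware <- sofa_time <= upr_mixed
```

Under these assumptions, passes_raw is TRUE while passes_unit_aware is FALSE, which matches the kind of spurious label reported above.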

It seems that converting si_lwr and si_upr to minutes resolves the issue, since subtracting quantities with equal units does not trigger a cast to seconds.
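
As a rough illustration of that conversion, the sketch below uses base R difftime objects only. The helper to_interval_units() is hypothetical and not part of ricu; as.difftime() stands in for ricu's mins() and hours().

```r
# Hypothetical helper: express a difftime in the same units as `interval`
to_interval_units <- function(x, interval) {
  units(x) <- units(interval)
  x
}

interval <- as.difftime(15, units = "mins")
si_lwr <- to_interval_units(as.difftime(48, units = "hours"), interval)
si_upr <- to_interval_units(as.difftime(24, units = "hours"), interval)

as.numeric(si_lwr)  # 2880
as.numeric(si_upr)  # 1440

# With equal units, the subtraction stays in minutes
si_time <- as.difftime(-225, units = "mins")
lwr <- si_time - si_lwr
units(lwr)       # "mins"
as.numeric(lwr)  # -3105
```

With ricu itself, the same idea amounts to passing si_lwr = mins(2880L) and si_upr = mins(1440L) to sep3() when interval = mins(15L).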

@DanielBrkr
Author

Thanks a lot for the clarification and the suggested workaround!

I gave it a try just now and noticed something else in a patient who has a valid sepsis label according to the sep3 function with an hourly interval.

It seems like there's a rounding error when comparing labresultoffset and abx_time from the hourly-interval result with the result of the workaround (in minutes). Intuitively, I would expect the label to be "forward filled" instead of "back filled". Could this be related to truncation in an unsafe cast somewhere else, or is this intended behaviour?

Results with the 15 min workaround

patientunitstayid delta_sofa labresultoffset abx_time samp_time sep3
           141436          3         45 mins 585 mins   NA mins TRUE

Result with hourly interval

patientunitstayid delta_sofa labresultoffset abx_time samp_time sep3
           141436          2         0 hours  9 hours  NA hours TRUE
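
The times in the two results are numerically consistent with being rounded down to the start of the enclosing interval. Whether ricu does exactly this is an assumption here, not something confirmed in this thread; the sketch only checks the arithmetic with a hypothetical helper floor_to_interval().

```r
# Hypothetical helper: round a time down to a multiple of `step`
floor_to_interval <- function(x, step) step * floor(x / step)

# Times from the 15-minute result, in minutes
lab_time <- 45
abx_time <- 585

# Rounded down to whole hours, then expressed in hours
lab_hours <- floor_to_interval(lab_time, 60) / 60  # 0
abx_hours <- floor_to_interval(abx_time, 60) / 60  # 9
```

Under that assumption, 45 mins maps to 0 hours and 585 mins maps to 9 hours, matching the hourly output.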

Here's the code I adapted from you to reproduce it:

15 Min Interval Workaround

si <- load_concepts("susp_inf", "eicu",
                    abx_min_count = 2L,
                    positive_cultures = TRUE,
                    si_mode = "or",
                    keep_components = TRUE,
                    interval = mins(15L))

pids <- c(unique(id_col(si))[1:200], 141436)

sofa <- load_concepts("sofa", "eicu", keep_components = TRUE,
                      interval = mins(15L),
                      patient_ids = pids)

sep3(sofa[patientunitstayid == 141436], si[patientunitstayid == 141436],
     si_window = "any",
     si_lwr = mins(2880L), # 60 * 48 = 2880
     si_upr = mins(1440L), # 60 * 24 = 1440
     keep_components = TRUE,
     interval = mins(15L)
)

Hourly Interval

si <- load_concepts("susp_inf", "eicu",
                    abx_min_count = 2L,
                    positive_cultures = TRUE,
                    si_mode = "or",
                    keep_components = TRUE,
                    interval = hours(1L))

pids <- c(unique(id_col(si))[1:200], 141436)

sofa <- load_concepts("sofa", "eicu", keep_components = TRUE,
                      interval = hours(1L),
                      patient_ids = pids)

sep3(sofa[patientunitstayid == 141436], si[patientunitstayid == 141436],
     si_window = "any", keep_components = TRUE,
     interval = hours(1L))

Thanks again!
