Check multiplicative scaling algorithm with potential option for bounding between 0 and 1 #39

Open
krpaulson opened this issue Jun 2, 2020 · 8 comments

@krpaulson
Collaborator

[screenshot; image not preserved]

@krpaulson
Collaborator Author

In this example, when I scale I get px > 1, which should not be possible because px is a probability.

We may need to modify our multiplicative scaling algorithm so that, when requested, scaled values are constrained to [0,1].

@krpaulson krpaulson assigned krpaulson and chacalle and unassigned krpaulson Jun 2, 2020
@krpaulson krpaulson added the enhancement New feature or request label Jun 2, 2020
@chacalle
Collaborator

chacalle commented Jun 4, 2020

Do you have example data I can use to replicate the issue?

@krpaulson
Collaborator Author

krpaulson commented Dec 14, 2020

@chacalle Here's an example dataset:


library(data.table)

# setup data
dt <- data.table(
  sex = "female",
  age_start = c(0, 0, 0.01917808, 0.07671233, 1, 2, 0, 1, 0.07671233, 0.50136986),
  age_end = c(5, 0.01917808, 0.07671233, 1, 5, 5, 0.07671233, 2, 0.50136986, 1),
  qx = c(0.00911375112673342, 0.00273191265024416, 0.000879854310413528,
         0.00210783390546553, 0.0019017531756225, 0.00130828120408238, 
         0.00397032909728481, 0.000916291989584039, 0.00244494910339115,
         0.00030509788601267)
)

# run scale function
dt_output <- hierarchyUtils::scale(
  dt,
  id_cols = c("sex", "age_start", "age_end"),
  value_cols = "qx",
  col_stem = "age",
  col_type = "interval",
  agg_function = prod
)

# compare
dt_compare <- merge(dt, dt_output, by = c("sex", "age_start", "age_end"))

@erinamay FYI

@krpaulson
Collaborator Author

Oh you know what? That's actually my fault for not converting to px first.

@chacalle
Collaborator

@krpaulson So this can be closed?

@krpaulson
Collaborator Author

No, I think this is still an issue, just not with that example. At some point I'll transcribe the screenshot above so that we can test with it.

@krpaulson
Collaborator Author

Okay, here's a complete example:

library(data.table)

# try demCore function (uses hierarchyUtils::scale with multiplicative scaling of px)
ch <- data.table(
  sex = rep("male", 3),
  age_start = c(1, 1, 2),
  age_end = c(2, 5, 5),
  qx = c(0.08, 0.013, 0.007)
)
ch <- demCore::scale_qx(ch, id_cols = c("sex", "age_start", "age_end"))

# manually solve multiplicative scaling using the algorithm presumed in the
# `hierarchyUtils` function. Confirm this gets the same results.
pch <- 1 - 0.013   # px for the parent interval [1, 5)
pcha <- 1 - 0.080  # px for the child interval [1, 2)
pchb <- 1 - 0.007  # px for the child interval [2, 5)
scalar <- sqrt(pch / (pcha * pchb))
pcha <- pcha * scalar
pchb <- pchb * scalar  # exceeds 1
qcha <- 1 - pcha
qchb <- 1 - pchb  # negative

# try conditional probability method
ch <- data.table(
  sex = rep("male", 3),
  age_name = c("cha", "ch", "chb"),
  qx = c(0.08, 0.013, 0.007)
)
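ch <- dcast(ch, sex ~ age_name, value.var = "qx")
# probability of death in each child interval, conditional on survival to
# age 1 and death before age 5
ch[, prob_cha := cha / ch]
ch[, prob_chb := chb / ch]
# rescale so the conditional probabilities sum to 1
ch[, scale := prob_cha + prob_chb]
ch[, prob_cha := prob_cha / scale]
ch[, prob_chb := prob_chb / scale]
# convert back to qx; chb is conditional on surviving the [1, 2) interval
ch[, cha := ch * prob_cha]
ch[, chb := ch * prob_chb / (1 - cha)]
ch[, c("prob_cha", "prob_chb", "scale") := NULL]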
ch <- melt(
  ch,
  measure.vars = c("ch", "cha", "chb"),
  variable.name = "age_name",
  value.name = "qx"
)
# check
testthat::test_that("scaling worked", {
  testthat::expect_equivalent(
    1 - (1 - ch[age_name == "cha", qx]) * (1 - ch[age_name == "chb", qx]),
    ch[age_name == "ch", qx],
    tolerance = 0.000000001
  )
})

Our multiplicative scaling gives a valid solution in the sense that the scaled child px values multiply to the parent px. However, to get there it selects a scalar that pushes qx and px values outside the [0,1] bounds for probabilities.
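
To make the failure concrete, here is a minimal sketch of the two-child case from the manual calculation above. `scale_px_pair` is a hypothetical helper written for illustration; it is not a function in hierarchyUtils or demCore, and it only mirrors the algorithm presumed in that calculation.

```r
# Hypothetical helper: two-child multiplicative scaling of px, mirroring the
# manual calculation (not the actual hierarchyUtils implementation).
scale_px_pair <- function(p_parent, p_a, p_b) {
  scalar <- sqrt(p_parent / (p_a * p_b))
  c(p_a = p_a * scalar, p_b = p_b * scalar)
}

scaled <- scale_px_pair(p_parent = 1 - 0.013, p_a = 1 - 0.080, p_b = 1 - 0.007)

# The product constraint is satisfied ...
product_ok <- isTRUE(all.equal(prod(scaled), 1 - 0.013))

# ... but the second child's survival probability is pushed above 1.
out_of_bounds <- scaled[["p_b"]] > 1

# Algebraically, p_b * scalar = sqrt(p_parent * p_b / p_a), so the scaled p_b
# exceeds 1 exactly when p_parent * p_b > p_a, which is the case here.
condition_met <- (1 - 0.013) * (1 - 0.007) > (1 - 0.080)
```

Under this presumed algorithm, any input where the parent px times one child's px exceeds the other child's px will produce a scaled px above 1, so the problem is not specific to these numbers.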

I've also included a toy example of a scaling approach we've used for qx before, which works in conditional probability space. In this example, that means the probability of death in age group [1,2) conditional on both survival to the 1st birthday and death before the 5th birthday, and separately the probability of death in age group [2,5) conditional on the same two events. These conditional probabilities should sum to 1 after scaling.
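
As a rough sketch of how the toy example might generalize, the function below applies the same steps to any number of child intervals. `scale_qx_conditional` is a hypothetical name, not an existing demCore or hierarchyUtils function. It assumes the children are ordered by age, exactly partition the parent interval, and that the parent qx is strictly less than 1.

```r
# Hypothetical helper generalizing the toy example above; not part of demCore
# or hierarchyUtils. Assumes `q_children` is ordered by age and the child
# intervals exactly partition the parent interval.
scale_qx_conditional <- function(q_parent, q_children) {
  # Normalized conditional probabilities; equivalent to computing
  # q_children / q_parent and rescaling those values to sum to 1.
  shares <- q_children / sum(q_children)
  # Probability of dying in each child interval, measured from the start of
  # the parent interval.
  deaths <- q_parent * shares
  # Probability of surviving from the start of the parent interval to the
  # start of each child interval.
  surv_start <- c(1, 1 - cumsum(deaths))[seq_along(deaths)]
  # Convert back to qx conditional on reaching each child interval.
  deaths / surv_start
}

q_scaled <- scale_qx_conditional(q_parent = 0.013, q_children = c(0.08, 0.007))

# Scaled values stay within [0, 1] and reproduce the parent qx.
in_bounds <- all(q_scaled >= 0 & q_scaled <= 1)
matches_parent <- isTRUE(all.equal(1 - prod(1 - q_scaled), 0.013))
```

Because each child's share of deaths is at most the parent qx, the resulting child qx values cannot leave [0,1] under these assumptions, unlike the square-root scalar approach.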

I think we should probably modify scale_qx to use this conditional probability method, but it might require a bit of work to set that up.

@chacalle
Collaborator

Cool, that's a helpful example. We could maybe address this at the same time as generalizing scale_qx to scale_lt ihmeuw-demographics/demCore#50
