BUG: issue with overlapping age groups in agg function #66

Open
hcomfo95 opened this issue Feb 12, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@hcomfo95
Contributor

Describe the bug
The agg function produces the error below when the overlapping age group (age_start = 0, age_end = 11) is included in the dataset.

Error in subtrees[[i]]: subscript out of bounds

It would be nice if the error were more informative. I resolved the error by dropping the age group, since it is not needed, but it would be helpful if the function did that automatically.
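
Until the error is more informative, the overlap can be spotted before calling agg. The following is only a minimal sketch, assuming the intervals are left-closed and right-open; `overlaps_any` is a hypothetical helper written for illustration and is not part of hierarchyUtils.

```r
library(data.table)

# Hypothetical helper (not part of hierarchyUtils): returns TRUE for each
# half-open interval [start, end) that overlaps at least one other interval.
overlaps_any <- function(start, end) {
  # starts_before_end[i, j] is TRUE when interval i starts before interval j ends
  starts_before_end <- outer(start, end, "<")
  # Two intervals overlap when each one starts before the other ends
  hits <- starts_before_end & t(starts_before_end)
  # An interval should not count as overlapping itself
  diag(hits) <- FALSE
  rowSums(hits) > 0
}

# Same age intervals as in the reproduction data
dt <- data.table(
  age_start = c(seq(10, 55, 5), 0),
  age_end   = c(seq(15, 60, 5), 11)
)

dt[, overlapping := overlaps_any(age_start, age_end)]
dt[overlapping == TRUE]  # the 10-15 and 0-11 rows
```

The pairwise comparison is quadratic in the number of intervals, which is fine for a handful of age groups per identifier but would need a sort-based approach for large inputs.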

To Reproduce

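library(data.table)

# Ten 5-year age groups plus one overlapping group (age_start = 0, age_end = 11)
dt <- data.table(age_group_id = c(7:16, 322),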
                 nid = rep(234279),
                 underlying_nid = rep(NA),
                 ihme_loc_id = rep("BRA"),
                 year_id = rep(2000),
                 sex = rep("both"),
                 births_reported = c(28973, 721564, 998526, 720342, 443512, 214808, 55665, 4690, 93, 20, 0),
                 age_start = c(seq(10, 55, 5), 0),
                 age_end = c(seq(15, 60, 5), 11),
                 unique_identifier = rep("234279_NA_BRA_2000_both"))

age_specific_births_reported <- copy(dt)

gbd_year <- 2020

age_map <- mortdb::get_age_map(gbd_year = gbd_year, type = "all")

age_map_10_54 <- age_map[age_group_id == 169, c("age_group_years_start", "age_group_years_end")]
colnames(age_map_10_54) <- c("age_start", "age_end")

value_cols <- "births_reported"
id_cols <- names(age_specific_births_reported)[!names(age_specific_births_reported) %in% value_cols]

age_specific_agg_age_10_54 <- data.table()

for (i in unique(age_specific_births_reported$unique_identifier)) {
  
  temp <- age_specific_births_reported[unique_identifier == i, ]
  
  temp_agg <- hierarchyUtils::agg(
    dt = temp,
    id_cols = id_cols,
    value_cols = value_cols,
    col_stem = "age",
    col_type = "interval",
    mapping = age_map_10_54,
    missing_dt_severity = "none",
    present_agg_severity = "skip",
    overlapping_dt_severity = "stop"
  )
  
  age_specific_agg_age_10_54 <- rbind(age_specific_agg_age_10_54, temp_agg)
  
  temp_agg <- NULL
  
}

hcomfo95 added the bug label on Feb 12, 2021
hcomfo95 changed the title from "BUG:" to "BUG: issue with overlapping age groups in agg function" on Feb 12, 2021
@chacalle
Collaborator

The problem here was that age_group_id was included in the input data set, and therefore in id_cols. If that column is removed, the overlapping intervals are correctly found:

library(data.table)

dt <- data.table(age_group_id = c(7:16, 322),
                 nid = rep(234279),
                 underlying_nid = rep(NA),
                 ihme_loc_id = rep("BRA"),
                 year_id = rep(2000),
                 sex = rep("both"),
                 births_reported = c(28973, 721564, 998526, 720342, 443512, 214808, 55665, 4690, 93, 20, 0),
                 age_start = c(seq(10, 55, 5), 0),
                 age_end = c(seq(15, 60, 5), 11),
                 unique_identifier = rep("234279_NA_BRA_2000_both"))

age_specific_births_reported <- copy(dt)


gbd_year <- 2020

age_map <- demInternal::get_age_map(gbd_year = gbd_year, type = "all")

age_map_10_54 <- age_map[age_group_id == 169, c("age_start", "age_end")]


age_specific_agg_age_10_54 <- data.table()

i <- "234279_NA_BRA_2000_both"
temp <- age_specific_births_reported[unique_identifier == i, ]
temp[, age_group_id := NULL]
value_cols <- "births_reported"
id_cols <- names(temp)[!names(temp) %in% value_cols]

temp_agg <- hierarchyUtils::agg(
  dt = temp,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "age",
  col_type = "interval",
  mapping = age_map_10_54,
  missing_dt_severity = "none",
  present_agg_severity = "skip",
  overlapping_dt_severity = "stop"
)
Aggregating age
Collapsing age to the most detailed common set of intervals
 Error in hierarchyUtils::agg(dt = temp, id_cols = id_cols, value_cols = value_cols,  : 
  empty_dt : Some overlapping intervals were identified in `dt`.
These will be automatically dropped.
      nid underlying_nid ihme_loc_id year_id  sex       unique_identifier age_start age_end
1: 234279             NA         BRA    2000 both 234279_NA_BRA_2000_both         0      11
2: 234279             NA         BRA    2000 both 234279_NA_BRA_2000_both        10      15
                           issue
1: overlapping intervals present
2: overlapping intervals present
[1] FALSE 
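
One way to see why the extra column matters is to count how many intervals end up in each group. This is only a sketch, under the assumption that overlaps are compared among rows sharing the same values of the non-interval id columns; `intervals_per_group` is a hypothetical helper, not a hierarchyUtils function.

```r
library(data.table)

dt <- data.table(
  age_group_id = c(7:16, 322),
  ihme_loc_id  = "BRA",
  year_id      = 2000,
  age_start    = c(seq(10, 55, 5), 0),
  age_end      = c(seq(15, 60, 5), 11)
)

# Hypothetical helper: count the intervals in each group, where a group is one
# combination of the id columns other than the interval columns themselves.
intervals_per_group <- function(dt, id_cols,
                                interval_cols = c("age_start", "age_end")) {
  group_cols <- setdiff(id_cols, interval_cols)
  dt[, .(n_intervals = .N), by = group_cols]
}

# With age_group_id as an id column, every interval sits alone in its group
with_id <- intervals_per_group(dt, names(dt))

# Without it, all 11 intervals share one group and can be compared
without_id <- intervals_per_group(dt, setdiff(names(dt), "age_group_id"))

with_id$n_intervals
without_id$n_intervals
```

If that assumption holds, a group containing a single interval has nothing to overlap with, which would explain why the check only fires once age_group_id is dropped.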

@chacalle
Collaborator

chacalle commented Sep 17, 2021

It gets caught right here.

I'm not sure there is a good general way to detect that there is an issue with dt and id_cols.
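
One possible heuristic, offered as a rough sketch rather than a proposed implementation: flag any id column whose value is fully determined by the interval columns but still varies across rows, since such a column splits each interval into its own group. `find_interval_proxies` is a hypothetical name, and the `<col_stem>_start` / `<col_stem>_end` naming is assumed from the examples above.

```r
library(data.table)

# Hypothetical check (not part of hierarchyUtils): return the id columns that
# are determined by the interval columns yet vary within the remaining ids.
find_interval_proxies <- function(dt, id_cols, col_stem) {
  interval_cols <- paste0(col_stem, c("_start", "_end"))
  candidates <- setdiff(id_cols, interval_cols)

  # Number of distinct combinations of the given columns (1 if none given)
  n_combos <- function(cols) {
    if (length(cols) == 0L) 1L else uniqueN(dt, by = cols)
  }

  is_proxy <- vapply(candidates, function(col) {
    others <- setdiff(candidates, col)
    # The column takes more than one value within the other id columns
    varies <- n_combos(c(others, col)) > n_combos(others)
    # Adding the column never splits an existing (others, interval) combination
    determined <- n_combos(c(others, interval_cols)) ==
      n_combos(c(others, interval_cols, col))
    varies && determined
  }, logical(1))

  candidates[is_proxy]
}

# Illustrative data: two sexes, each with the same 11 age intervals
dt <- data.table(
  ihme_loc_id  = "BRA",
  sex          = rep(c("female", "male"), each = 11),
  age_group_id = rep(c(7:16, 322), times = 2),
  age_start    = rep(c(seq(10, 55, 5), 0), times = 2),
  age_end      = rep(c(seq(15, 60, 5), 11), times = 2),
  births       = 1
)

id_cols <- setdiff(names(dt), "births")
proxies <- find_interval_proxies(dt, id_cols, col_stem = "age")
proxies
```

In this example only age_group_id is flagged; sex is not, because each interval appears under both sexes. The heuristic can give false positives, for instance when every legitimate group happens to contain a single, distinct interval, so it would fit better as a warning than as a hard stop.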
