add_ci.lmer chokes on "big data" #32

Open
jthaman opened this issue Mar 9, 2018 · 0 comments


@jthaman
Owner

jthaman commented Mar 9, 2018

I'm finding that add_ci.lmer does not work on "big data". I tried an example from the mermod vignette with 200,000 observations (10 groups of 20,000), and R ran out of memory while building the new data frame. Here's the example I tried:

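## linear example
library(dplyr)    # %>%, sample_n(), bind_rows(), select()
library(lme4)     # lmer()
library(ciTools)  # add_ci()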

# Simulate ng groups of nw rows each, with one uniform covariate x2
x_gen_mermod <- function(ng = 8, nw = 5){
  n <- ng * nw
  x2 <- runif(n)
  group <- rep(as.character(1:ng), each = nw)
  return(tibble::tibble(x2 = x2,
                        group = group))
}

# model.matrix() wrapper that takes the data first, so it works with %>%
mm_pipe <- function(tb, ...){
  model.matrix(data = tb, ...)
}

# Sample 5 rows, repeat each 100 times, and simulate held-out responses for them
get_validation_set <- function(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts){
  vm <- sample_n(tb, 5, replace = FALSE)[rep(1:5, each = 100), ]
  vf <- bind_rows(vm, tb) %>%
    select(-group) %>%
    mm_pipe(~.*.)
  vf <- vf[1:500, ]
  vGroups <- if(!includeRanef) rnorm(500, 0, sigmaG) else groupIntercepts[as.numeric(vm$group)]
  vm[["y"]] <- vf %*% beta + vGroups + rnorm(500, mean = 0, sd = sigma)
  vm
}

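# Simulate y = X beta + group intercept + noise; "truth" is the noiseless mean
# (including the group intercept only when includeRanef = TRUE)
y_gen_mermod <- function(tb, sigma = 1, sigmaG = 1, delta = 1, includeRanef = FALSE, validationPoints = FALSE){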
  groupIntercepts <- rnorm(length(unique(tb$group)), 0, sigmaG)
  tf <- tb %>%
    dplyr::select(-group) %>%
    mm_pipe(~.*.)
  beta <- rep(delta, ncol(tf))
  if(validationPoints)  {
    vm <- get_validation_set(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts)
  }
  tb[["y"]] <- tf %*% beta + groupIntercepts[as.numeric(tb$group)] + rnorm(nrow(tb), mean = 0, sd = sigma)
  tb[["truth"]] <- tf %*% beta + groupIntercepts[as.numeric(tb$group)] * (includeRanef)
  if(validationPoints) return(list(tb = tb, vm = vm)) else return(tb)
}


# 10 groups x 20,000 observations = 200,000 rows
tb <- x_gen_mermod(10, 20000) %>%
  y_gen_mermod()

fit2 <- lmer(y ~ x2 + (1|group) , data = tb)

tb %>% add_ci(fit2, type = "parametric", includeRanef = TRUE, names = c("LCB", "UCB"))

lmer fits a data set this large without any trouble, but ciTools chokes and fails with

Error: cannot allocate vector of size 298.0 Gb
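For scale, 298.0 Gb is almost exactly the size of a dense 200,000 × 200,000 matrix of doubles (8 bytes per entry), whereas the design matrix itself (200,000 rows × 2 columns) is only about 3 MB. That suggests the parametric path is building an n × n object somewhere, not just something that grows with n. A quick check of the arithmetic in R (R reports these sizes in units of 1024^3 bytes):

```r
# Memory needed for a dense n x n matrix of doubles, in GiB (1024^3 bytes)
n <- 200000
bytes_per_double <- 8
gib_nxn <- n^2 * bytes_per_double / 1024^3
round(gib_nxn, 1)
#> [1] 298

# For comparison: an n x 2 design matrix (intercept + x2), in MiB
mib_design <- n * 2 * bytes_per_double / 1024^2
```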

We need to re-examine how intermediate objects are stored in memory and find a more memory-efficient approach. I'm not sure whether this bug affects the other methods as well.
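One possible fix, sketched under the assumption that the interval only needs the per-row variance of the fixed-effects prediction, x_iᵀ V x_i (V being the covariance matrix of the fixed-effect estimates). In that case there is no need to form X V Xᵀ and take its diagonal: `rowSums((X %*% V) * X)` gives the same vector. I haven't confirmed that this is exactly where ciTools builds the big matrix. `X` and `V` below are small simulated stand-ins; for a fitted model they would presumably come from `model.matrix(fit2)` and `as.matrix(vcov(fit2))`.

```r
# Small stand-ins for the design matrix and vcov of the fixed effects
set.seed(1)
n <- 1000
X <- cbind(1, runif(n))                           # intercept + x2
V <- matrix(c(0.02, 0.001, 0.001, 0.05), 2, 2)    # symmetric 2 x 2 covariance

# Naive: materializes an n x n matrix, then keeps only its diagonal (O(n^2) memory)
var_naive <- diag(X %*% V %*% t(X))

# Memory-efficient: elementwise (X V) * X, summed across columns (O(n * p) memory)
var_fast <- rowSums((X %*% V) * X)

all.equal(var_naive, var_fast)
#> [1] TRUE
```

The second form only ever holds n × p numbers (p = 2 here), so for the 200,000-row example it needs a few megabytes rather than hundreds of gigabytes.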
