Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_de failing with on_disk models #33

Open
NathanSiemers opened this issue Mar 30, 2022 · 4 comments
Open

test_de failing with on_disk models #33

NathanSiemers opened this issue Mar 30, 2022 · 4 comments

Comments

@NathanSiemers
Copy link

NathanSiemers commented Mar 30, 2022

Hello Constantin,

Thank you for this code, I've been using and learning more about it for several months. I'm working with some large data sets and using glmGamPoi to perform modeling and hypothesis testing.

I've been able to create full models from HDF5Array input data sets and on_disk = TRUE. I did need to adjust the chunk size params to get this to occur in finite time. However, the on-disk version of the model output cannot be queried with test_de. Performing such test gives the error:

Error in combine_size_factors_and_offset(offset, size_factors, Y, verbose = verbose) :
length(offset) == 1 || length(offset) == n_samples is not TRUE

(seems to be coming from estimate_size_factors.R)

If I run the same model with a realized input matrix and on_disk = FALSE, everything works as expected. I wonder if test_de can't determine the right dimensions or related parameters for on_disk models and is instead throwing this error?

It could be a general bug, it could be a problem that only shows up when one alters default H5 temp file paths, etc? But I've looked, and I think the on-disk model objects have hard-coded locations for data contained within them.... Hmm.

Attached is the log file (docx). You can get a full tgz of the code and input data at the link below. I've tried to make this an MRE, but even very truncated the singlecellexperiment file is a little big.

Thanks, Nathan Siemers
mre.glmgampoi.bug.docx

tgz file: https://drive.google.com/file/d/1fYN4F7TtFpwgC6yqYZDP_rDHBiV4XU5H/view?usp=sharing

@NathanSiemers
Copy link
Author

btw, removing the offset parameter in the glm_gp call doesn't change anything.

@const-ae
Copy link
Owner

const-ae commented Apr 1, 2022

Hey, sorry for the delay. I was at a workshop this week. I did take a briefly this afternoon while waiting for my flight and tried the code below to reproduce the issue, but my version finished successfully.

library(glmGamPoi)
model_matrix <- cbind(1, rnorm(5))
true_Beta <- cbind(rnorm(n = 30), rnorm(n = 30, mean = 3))
sf <- exp(rnorm(n = 5, mean = 0.7))
model_matrix
#>      [,1]       [,2]
#> [1,]    1  1.1480341
#> [2,]    1 -0.6869605
#> [3,]    1 -2.1089572
#> [4,]    1  0.1170987
#> [5,]    1  1.0913436
Y <- matrix(rnbinom(n = 30 * 5, mu = sf * exp(true_Beta %*% t(model_matrix)), size = 1/2.4),
            nrow = 30, ncol = 5)
options(DelayedArray.block.size = 1e9)
DelayedArray::setAutoBlockSize(1e9)
#> automatic block size set to 1e+09 bytes (was 1e+08)
DelayedArray::setAutoBlockShape('last-dim-grows-first')
#> automatic block shape set to "last-dim-grows-first" (was "hypercube")
Y_h5 <- HDF5Array::writeHDF5Array(Y)

lmf <- glm_gp(
  Y_h5,
  model_matrix,
  on_disk = TRUE,
  verbose = TRUE,
  offset = 0
)
#> Calculate Size Factors (normed_sum)
#> Make initial dispersion estimate
#> Make initial beta estimate
#> Estimate beta
#> Estimate dispersion
#> Fit dispersion trend
#> Shrink dispersion estimates
#> Estimate beta again
compare.treat = test_de( lmf, c(0,1), verbose = TRUE )
#> Fit reduced model
#> Calculate quasi likelihood ratio
#> Prepare results

Created on 2022-04-01 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Big Sur 10.16         
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2022-04-01                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version  date       lib source        
#>  backports              1.2.1    2020-12-09 [1] CRAN (R 4.1.0)
#>  beachmat               2.8.1    2021-08-12 [1] Bioconductor  
#>  Biobase                2.52.0   2021-05-19 [1] Bioconductor  
#>  BiocGenerics           0.38.0   2021-05-19 [1] Bioconductor  
#>  bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.0)
#>  cli                    3.0.1    2021-07-17 [1] CRAN (R 4.1.0)
#>  crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.1.0)
#>  DelayedArray           0.18.0   2021-05-19 [1] Bioconductor  
#>  DelayedMatrixStats     1.14.3   2021-08-26 [1] Bioconductor  
#>  digest                 0.6.27   2020-10-24 [1] CRAN (R 4.1.0)
#>  ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate               0.14     2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi                  0.5.0    2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.1.0)
#>  fs                     1.5.0    2020-07-31 [1] CRAN (R 4.1.0)
#>  GenomeInfoDb           1.28.4   2021-09-05 [1] Bioconductor  
#>  GenomeInfoDbData       1.2.6    2021-09-09 [1] Bioconductor  
#>  GenomicRanges          1.44.0   2021-05-19 [1] Bioconductor  
#>  glmGamPoi            * 1.4.0    2021-05-19 [1] Bioconductor  
#>  glue                   1.4.2    2020-08-27 [1] CRAN (R 4.1.0)
#>  HDF5Array              1.20.0   2021-05-19 [1] Bioconductor  
#>  highr                  0.9      2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools              0.5.2    2021-08-25 [1] CRAN (R 4.1.0)
#>  IRanges                2.26.0   2021-05-19 [1] Bioconductor  
#>  knitr                  1.34     2021-09-09 [1] CRAN (R 4.1.1)
#>  lattice                0.20-44  2021-05-02 [1] CRAN (R 4.1.1)
#>  lifecycle              1.0.0    2021-02-15 [1] CRAN (R 4.1.0)
#>  magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.1.0)
#>  Matrix                 1.3-4    2021-06-01 [1] CRAN (R 4.1.1)
#>  MatrixGenerics         1.5.3    2021-09-09 [1] Bioconductor  
#>  matrixStats            0.60.1   2021-08-23 [1] CRAN (R 4.1.0)
#>  pillar                 1.6.2    2021-07-29 [1] CRAN (R 4.1.0)
#>  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
#>  Rcpp                   1.0.7    2021-07-07 [1] CRAN (R 4.1.0)
#>  RCurl                  1.98-1.4 2021-08-17 [1] CRAN (R 4.1.0)
#>  reprex                 2.0.1    2021-08-05 [1] CRAN (R 4.1.0)
#>  rhdf5                  2.36.0   2021-05-19 [1] Bioconductor  
#>  rhdf5filters           1.4.0    2021-05-19 [1] Bioconductor  
#>  Rhdf5lib               1.14.2   2021-07-06 [1] Bioconductor  
#>  rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown              2.11     2021-09-14 [1] CRAN (R 4.1.1)
#>  rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.0)
#>  S4Vectors              0.30.0   2021-05-19 [1] Bioconductor  
#>  sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.1.0)
#>  sparseMatrixStats      1.4.2    2021-08-08 [1] Bioconductor  
#>  stringi                1.7.4    2021-08-25 [1] CRAN (R 4.1.0)
#>  stringr                1.4.0    2019-02-10 [1] CRAN (R 4.1.0)
#>  styler                 1.5.1    2021-07-13 [1] CRAN (R 4.1.0)
#>  SummarizedExperiment   1.22.0   2021-05-19 [1] Bioconductor  
#>  tibble                 3.1.4    2021-08-25 [1] CRAN (R 4.1.0)
#>  utf8                   1.2.2    2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
#>  withr                  2.4.2    2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun                   0.26     2021-09-14 [1] CRAN (R 4.1.1)
#>  XVector                0.32.0   2021-05-19 [1] Bioconductor  
#>  yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.1.0)
#>  zlibbioc               1.38.0   2021-05-19 [1] Bioconductor  
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

I will have to take a more detailed look what's the difference between your code/data and my version next week. But thanks for making such a detailed bug report, I am sure we will able to figure out what is going wrong here :)

Best, Constantin

@seanken
Copy link

seanken commented Dec 1, 2022

Just wondering if there has been any updates on this issue? I have been having a similiar issue as well. Worth noting: when I pass a matrix (sparse or not) to glm_gp with on_disk=T I have the same issue (think it is suppose to make it a HDF5Array internally, yes?), while if I first make the HDF5Array object then feed it in I do not have the issue, so that work around works for me (in case others need such a work around).

@const-ae
Copy link
Owner

Hi Sean,

thanks for sharing your experience and the work-around. If you have a reproducible example (https://reprex.tidyverse.org/), I will try to take a look and figure out why the internal conversion fails.

Until then, I endorse your suggestion to manually create a HDF5Array object and call glmGamPoi with that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants