-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_de failing with on_disk models #33
Comments
btw, removing the offset parameter in the glm_gp call doesn't change anything. |
Hey, sorry for the delay. I was at a workshop this week. I did take a briefly this afternoon while waiting for my flight and tried the code below to reproduce the issue, but my version finished successfully. library(glmGamPoi)
model_matrix <- cbind(1, rnorm(5))
true_Beta <- cbind(rnorm(n = 30), rnorm(n = 30, mean = 3))
sf <- exp(rnorm(n = 5, mean = 0.7))
model_matrix
#> [,1] [,2]
#> [1,] 1 1.1480341
#> [2,] 1 -0.6869605
#> [3,] 1 -2.1089572
#> [4,] 1 0.1170987
#> [5,] 1 1.0913436
Y <- matrix(rnbinom(n = 30 * 5, mu = sf * exp(true_Beta %*% t(model_matrix)), size = 1/2.4),
nrow = 30, ncol = 5)
options(DelayedArray.block.size = 1e9)
DelayedArray::setAutoBlockSize(1e9)
#> automatic block size set to 1e+09 bytes (was 1e+08)
DelayedArray::setAutoBlockShape('last-dim-grows-first')
#> automatic block shape set to "last-dim-grows-first" (was "hypercube")
Y_h5 <- HDF5Array::writeHDF5Array(Y)
lmf <- glm_gp(
Y_h5,
model_matrix,
on_disk = TRUE,
verbose = TRUE,
offset = 0
)
#> Calculate Size Factors (normed_sum)
#> Make initial dispersion estimate
#> Make initial beta estimate
#> Estimate beta
#> Estimate dispersion
#> Fit dispersion trend
#> Shrink dispersion estimates
#> Estimate beta again
compare.treat = test_de( lmf, c(0,1), verbose = TRUE )
#> Fit reduced model
#> Calculate quasi likelihood ratio
#> Prepare results Created on 2022-04-01 by the reprex package (v2.0.1) Session infosessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.1 (2021-08-10)
#> os macOS Big Sur 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2022-04-01
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
#> beachmat 2.8.1 2021-08-12 [1] Bioconductor
#> Biobase 2.52.0 2021-05-19 [1] Bioconductor
#> BiocGenerics 0.38.0 2021-05-19 [1] Bioconductor
#> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
#> cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
#> DelayedArray 0.18.0 2021-05-19 [1] Bioconductor
#> DelayedMatrixStats 1.14.3 2021-08-26 [1] Bioconductor
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
#> GenomeInfoDb 1.28.4 2021-09-05 [1] Bioconductor
#> GenomeInfoDbData 1.2.6 2021-09-09 [1] Bioconductor
#> GenomicRanges 1.44.0 2021-05-19 [1] Bioconductor
#> glmGamPoi * 1.4.0 2021-05-19 [1] Bioconductor
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
#> HDF5Array 1.20.0 2021-05-19 [1] Bioconductor
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
#> IRanges 2.26.0 2021-05-19 [1] Bioconductor
#> knitr 1.34 2021-09-09 [1] CRAN (R 4.1.1)
#> lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.1)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.1)
#> MatrixGenerics 1.5.3 2021-09-09 [1] Bioconductor
#> matrixStats 0.60.1 2021-08-23 [1] CRAN (R 4.1.0)
#> pillar 1.6.2 2021-07-29 [1] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
#> RCurl 1.98-1.4 2021-08-17 [1] CRAN (R 4.1.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rhdf5 2.36.0 2021-05-19 [1] Bioconductor
#> rhdf5filters 1.4.0 2021-05-19 [1] Bioconductor
#> Rhdf5lib 1.14.2 2021-07-06 [1] Bioconductor
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> S4Vectors 0.30.0 2021-05-19 [1] Bioconductor
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
#> sparseMatrixStats 1.4.2 2021-08-08 [1] Bioconductor
#> stringi 1.7.4 2021-08-25 [1] CRAN (R 4.1.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> styler 1.5.1 2021-07-13 [1] CRAN (R 4.1.0)
#> SummarizedExperiment 1.22.0 2021-05-19 [1] Bioconductor
#> tibble 3.1.4 2021-08-25 [1] CRAN (R 4.1.0)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
#> xfun 0.26 2021-09-14 [1] CRAN (R 4.1.1)
#> XVector 0.32.0 2021-05-19 [1] Bioconductor
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#> zlibbioc 1.38.0 2021-05-19 [1] Bioconductor
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library I will have to take a more detailed look what's the difference between your code/data and my version next week. But thanks for making such a detailed bug report, I am sure we will able to figure out what is going wrong here :) Best, Constantin |
Just wondering if there has been any updates on this issue? I have been having a similiar issue as well. Worth noting: when I pass a matrix (sparse or not) to glm_gp with on_disk=T I have the same issue (think it is suppose to make it a HDF5Array internally, yes?), while if I first make the HDF5Array object then feed it in I do not have the issue, so that work around works for me (in case others need such a work around). |
Hi Sean, thanks for sharing your experience and the work-around. If you have a reproducible example (https://reprex.tidyverse.org/), I will try to take a look and figure out why the internal conversion fails. Until then, I endorse your suggestion to manually create a |
Hello Constantin,
Thank you for this code, I've been using and learning more about it for several months. I'm working with some large data sets and using glmGamPoi to perform modeling and hypothesis testing.
I've been able to create full models from HDF5Array input data sets and on_disk = TRUE. I did need to adjust the chunk size params to get this to occur in finite time. However, the on-disk version of the model output cannot be queried with test_de. Performing such test gives the error:
Error in combine_size_factors_and_offset(offset, size_factors, Y, verbose = verbose) :
length(offset) == 1 || length(offset) == n_samples is not TRUE
(seems to be coming from estimate_size_factors.R)
If I run the same model with a realized input matrix and on_disk = FALSE, everything works as expected. I wonder if test_de can't determine the right dimensions or related parameters for on_disk models and is instead throwing this error?
It could be a general bug, it could be a problem that only shows up when one alters default H5 temp file paths, etc? But I've looked, and I think the on-disk model objects have hard-coded locations for data contained within them.... Hmm.
Attached is the log file (docx). You can get a full tgz of the code and input data at the link below. I've tried to make this an MRE, but even very truncated the singlecellexperiment file is a little big.
Thanks, Nathan Siemers
mre.glmgampoi.bug.docx
tgz file: https://drive.google.com/file/d/1fYN4F7TtFpwgC6yqYZDP_rDHBiV4XU5H/view?usp=sharing
The text was updated successfully, but these errors were encountered: