readSCP is sensitive to the order in which the quantCol rows are provided in the sample annotation file #77

Open
edemmott opened this issue Nov 22, 2024 · 6 comments

@edemmott

I've sent Laurent an email with some more details, but in brief: readSCP is sensitive to row order in the sample annotation file. Two files with identical data, differing only in how the rows are ordered by quantCol, give different results:

Example 1, ordered as below, gives poor results.
Reporter.intensity.1
Reporter.intensity.10
Reporter.intensity.11
Reporter.intensity.12
Reporter.intensity.13
Reporter.intensity.14
Reporter.intensity.15
Reporter.intensity.16
Reporter.intensity.17
Reporter.intensity.18
Reporter.intensity.2
Reporter.intensity.3
Reporter.intensity.4
Reporter.intensity.5
Reporter.intensity.6
Reporter.intensity.7
Reporter.intensity.8
Reporter.intensity.9

Example 2, ordered as below, gives good results.
Reporter.intensity.1
Reporter.intensity.2
Reporter.intensity.3
Reporter.intensity.4
Reporter.intensity.5
Reporter.intensity.6
Reporter.intensity.7
Reporter.intensity.8
Reporter.intensity.9
Reporter.intensity.10
Reporter.intensity.11
Reporter.intensity.12
Reporter.intensity.13
Reporter.intensity.14
Reporter.intensity.15
Reporter.intensity.16
Reporter.intensity.17
Reporter.intensity.18
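
The order in Example 1 is what alphabetical sorting of the channel names produces (1, 10, 11, ..., 18, 2, ...), while Example 2 is in numeric order. As a possible stopgap, here is a minimal base R sketch that reorders annotation rows by the trailing channel number. The data frame `annot` and its `quantCols` column are hypothetical stand-ins for the real sample annotation, and the sketch assumes a single run whose quantification columns are themselves in numeric order; it is not a fix for the underlying bug.

```r
# Hypothetical sample annotation with quantCols in alphabetical order (as in Example 1).
# method = "radix" sorts in the C locale, so the result does not depend on the session locale.
annot <- data.frame(
    quantCols = sort(paste0("Reporter.intensity.", 1:18), method = "radix"),
    stringsAsFactors = FALSE
)

# Extract the trailing channel number from each name.
channel <- as.integer(sub(".*\\.([0-9]+)$", "\\1", annot$quantCols))

# Reorder the rows numerically (as in Example 2) and reset the row names.
annot_sorted <- annot[order(channel), , drop = FALSE]
rownames(annot_sorted) <- NULL
```

With several runs in the same annotation table, the same idea would need `order()` to be called on the run column first and the channel number second.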

@edemmott
Author

It looks like, with example 1, the quantCols accessible with assay() are misassigned.

@cvanderaa
Member

Hi @edemmott,

Thanks for reporting the bug!

I don't know if something has already been discussed by mail with Laurent. Are you using the latest stable version of scp, i.e. scp >= 1.16.0?

If so, I would like to solve this bug as quickly as possible (actually in QFeatures). Would you have a small reproducible example, or at least the code you used to run your examples (I can mock the data)?

@edemmott
Author

I'll drop you and Laurent an email with a link to the data, some of our .Rmd files and the documents.

Yes, I'm using 1.16. SessionInfo below:

R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] iSEEu_1.18.0 iSEEhex_1.8.0 imager_1.0.2 magrittr_2.0.3
[5] png_0.1-8 shiny_1.9.1 iSEE_2.18.0 bluster_1.16.0
[9] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.30.0 AnnotationFilter_1.30.0 GenomicFeatures_1.58.0
[13] AnnotationDbi_1.68.0 SCP.replication_0.2.1 data.table_1.16.2 scater_1.34.0
[17] scuttle_1.16.0 sva_3.54.0 BiocParallel_1.39.0 genefilter_1.88.0
[21] mgcv_1.9-1 nlme_3.1-166 patchwork_1.3.0 lubridate_1.9.3
[25] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[29] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
[33] tidyverse_2.0.0 limma_3.62.1 scpdata_1.13.0 ExperimentHub_2.14.0
[37] AnnotationHub_3.14.0 BiocFileCache_2.14.0 dbplyr_2.5.0 SingleCellExperiment_1.28.1
[41] scp_1.16.0 QFeatures_1.16.0 MultiAssayExperiment_1.32.0 SummarizedExperiment_1.36.0
[45] Biobase_2.66.0 GenomicRanges_1.58.0 GenomeInfoDb_1.42.0 IRanges_2.40.0
[49] S4Vectors_0.44.0 BiocGenerics_0.52.0 MatrixGenerics_1.18.0 matrixStats_1.4.1
[53] Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4 QuantQC_0.1.0

@edemmott
Author

Email sent to you both with examples and dataset.

@cvanderaa
Member

cvanderaa commented Nov 28, 2024

I'm making progress, and I confirm that what you are seeing is a bug. Here is a minimal reproducible example:

library(scp)
ad <- matrix(1:75, ncol = 5, dimnames = list(NULL, rev(paste0("quantCol", 1:5))))
ad <- as.data.frame(ad)
ad$runCol <- rep(paste0("run", 1:3), each = 5)
cd <- data.frame(quantCols = rep(paste0("quantCol", 1:5), 3),
                 runCol = rep(paste0("run", 1:3), each = 5))
scp <- readSCP(assayData = ad,
               colData = cd,
               runCol = "runCol")
ad[ad$runCol == ad$runCol[[1]], ]
  quantCol5 quantCol4 quantCol3 quantCol2 quantCol1 runCol
1         1        16        31        46        61   run1
2         2        17        32        47        62   run1
3         3        18        33        48        63   run1
4         4        19        34        49        64   run1
5         5        20        35        50        65   run1
assay(scp, 1)
  run1_quantCol1 run1_quantCol2 run1_quantCol3 run1_quantCol4 run1_quantCol5
1              1             16             31             46             61
2              2             17             32             47             62
3              3             18             33             48             63
4              4             19             34             49             64
5              5             20             35             50             65

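The second table has the wrong column names. Its values keep the column order of the input table (quantCol5 to quantCol1), but the names follow the order given in colData (quantCol1 to quantCol5). For example, run1_quantCol1 holds the values 1 to 5, which belong to quantCol5. The values under each name should match the first table.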
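
To make the failure mode concrete, here is a small base R sketch contrasting positional renaming with name-based column selection on the same toy table. This is only an illustration of how such a mislabelling can arise; it is not the QFeatures code, and the object names `run1`, `wanted`, `by_position` and `by_name` are made up for the example.

```r
# Same toy quantification table as above (columns quantCol5 ... quantCol1).
ad <- matrix(1:75, ncol = 5, dimnames = list(NULL, rev(paste0("quantCol", 1:5))))
run1 <- ad[1:5, ]                     # rows belonging to the first run
wanted <- paste0("quantCol", 1:5)     # order given in the sample annotation

# Positional renaming: the values stay where they are and only the labels
# change, so each column ends up under the wrong name.
by_position <- run1
colnames(by_position) <- wanted

# Name-based selection: columns are picked by name, so labels and values
# stay together regardless of the annotation order.
by_name <- run1[, wanted]

by_position[1, "quantCol1"]  # 1, a value that belongs to quantCol5
by_name[1, "quantCol1"]      # 61, the value that belongs to quantCol1
```

The `by_position` result reproduces the mislabelled assay shown above, while `by_name` gives the expected assignment.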

This is indeed a QFeatures issue. After running the code in debug mode, here is the problematic line.

I'll try to work on it asap.

@cvanderaa
Member

We are making progress. A preliminary fix is available here: https://github.com/cvanderaa/QFeatures. We still need some time to integrate it into QFeatures' official devel release.
