n/p ratio clarification #64

samgregoire · 2024-06-17T21:00:43Z

I ran scpModelWorkflow() on a small SingleCellExperiment object (I only have 20 cells).
The scpModelFilterPlot() looks like this:

I'm not surprised that I only have a few estimated features as I only have a few cells/observations. However, I'm puzzled by two things:

1. Why is the bar corresponding to features with an n/p ratio of 1 colored as "inestimable"? According to the legend (and what I checked), features with an n/p ratio >= 1 are considered to be estimated.
2. How can I have features with an n/p ratio of 0? I thought that n could never be equal to 0 and checked that this was the case:

summary(sapply(metadata(sce)$model@scpModelFitList, "slot", "n"))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   3.000   3.695   5.000  21.000

Indeed, the n/p ratio is never less than 0.5:

np <- 
  sapply(metadata(sce)$model@scpModelFitList, "slot", "n") /
  sapply(metadata(sce)$model@scpModelFitList, "slot", "p")
summary(np)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.5000  0.6667  1.1250     Inf     Inf     Inf

However, I was surprised to see that a large number of the n/p ratios were infinite, which means that p is equal to 0:

summary(sapply(metadata(sce)$model@scpModelFitList, "slot", "p"))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   5.000   3.905   7.000  14.000

On further investigation, I found out that this happens whenever there are only 1 or 2 observations for a specific feature:

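# Indices of the features whose fitted model has no parameters (p = 0)
p0 <- which(sapply(metadata(sce)$model@scpModelFitList, "slot", "p") == 0)

# Number of non-missing observations for each of these features
nobs_p0 <- rep(NA, length(p0))
for (i in seq_along(p0)) {
    nobs_p0[i] <- nrow(colData(sce)[!is.na(assay(sce)[p0[i], ]), ])
}

# Indices of the features with at most 2 non-missing observations
nobs <- rowSums(!is.na(assay(sce)))
obs_2 <- which(nobs <= 2)

# Check that both sets of indices are identical
table(obs_2 == p0)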

TRUE 
3409

I assume that the 3409 features with an infinite n/p ratio are plotted as 0 in the plot.
Why do the features with at most 2 observations always have p equal to 0? I suppose it's not that important, since features with only 2 observations are not very informative in bigger datasets.
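For anyone reproducing this, a minimal sketch of how the finite and infinite ratios can be looked at separately. The `np` values below are made up for illustration and are not taken from the dataset above.

```r
# Hypothetical n/p ratios; Inf arises whenever p == 0
np <- c(0.5, 0.6667, 1.125, Inf, Inf)

# Count how many ratios are finite vs infinite
table(is.finite(np))

# Summarise only the finite ratios so that the mean and quartiles are informative
summary(np[is.finite(np)])
```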

cvanderaa · 2024-07-09T07:18:44Z

Hi Sam,
Thanks for pointing out these inconsistencies.

Regarding your first point, I will fix this. The legend and docs are right, but the plot is misleading. It has to do with a wrong assignment of the edge cases when I cut the histograms into estimable and non-estimable features.
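To illustrate how such an edge case can arise, here is a minimal sketch using base R's `cut()`. It is not the code used in scp; the `np` values and the labels are assumptions chosen for illustration. With the default right-closed intervals, a ratio of exactly 1 lands in the lower bin, whereas left-closed intervals put it in the upper bin, in line with the "n/p >= 1" rule.

```r
# Hypothetical n/p ratios, including the boundary value 1
np <- c(0.5, 0.75, 1, 1, 1.5, 2)

# Default right-closed intervals: 1 falls in (0, 1]
right_closed <- cut(np, breaks = c(0, 1, Inf),
                    labels = c("inestimable", "estimable"))

# Left-closed intervals: 1 falls in [1, Inf), matching "n/p >= 1"
left_closed <- cut(np, breaks = c(0, 1, Inf), right = FALSE,
                   labels = c("inestimable", "estimable"))

# Compare how the boundary values are assigned in each case
table(right_closed)
table(left_closed)
```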

Regarding your second point, great job investigating! Indeed, the issue you are raising lies within these lines:

scp/R/ScpModel-Workflow.R

Lines 213 to 217 in 5e094c6

    
    if (nrow(coldata) <= 2) {
        out <- matrix(nrow = nrow(coldata), ncol = 0)
        attr(out, "levels") <- List()
        return(out)
    }

I did this intentionally since, IMHO, there is no point in modelling data with 2 or fewer data points. Hence I generate an empty model matrix, so p = 0 and the feature is ignored. I'm open to discussing whether this needs smarter handling.
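As a minimal sketch of the consequence, assuming p is taken as the number of columns of the model matrix and n as the number of observations (the `coldata` below is a made-up example, not the package's internal object):

```r
# Hypothetical annotations for a feature observed in only 2 cells
coldata <- data.frame(Condition = c("A", "B"))

# Same shortcut as in the quoted lines: a matrix with rows but no columns
design <- matrix(nrow = nrow(coldata), ncol = 0)

n <- nrow(design)  # 2 observations
p <- ncol(design)  # 0 parameters
n / p              # Inf: R returns Inf when a positive number is divided by 0
```

This is consistent with the infinite n/p ratios reported above for features with 1 or 2 observations.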
