doLimma Feature Selection Does One-versus-All Comparisons #18

DarioS · 2024-07-04T02:00:50Z

Each cell type is compared to all others. This could result in biased choices if one cell type dominates the composition.

for (i in seq_len(nlevels(cellTypes))) # For each cell type.
    tmp_celltype <- (ifelse(cellTypes == levels(cellTypes)[i], 1, 0)) # One cell type versus the rest combined.

I propose comparing all pairs of cell types and then averaging the pairwise results for each cell type.
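To make the proposal concrete, here is a minimal sketch of pairwise comparisons with limma, averaged per cell type. It is not the package's implementation: the toy data, the helper name `pairwiseAverageT`, and the choice to average moderated t-statistics (rather than, say, log fold changes or ranks) are all assumptions for illustration. It also assumes the cell type labels are syntactically valid R names, which `makeContrasts` requires.

```r
library(limma)

set.seed(1)
# Toy data (hypothetical): 50 genes x 90 cells, three cell types of unequal abundance.
cellTypes <- factor(rep(c("A", "B", "C"), times = c(60, 20, 10)))
exprsMat <- matrix(rnorm(50 * 90), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), NULL))
# Make gene1 a marker of the rare type C.
exprsMat[1, cellTypes == "C"] <- exprsMat[1, cellTypes == "C"] + 3

# One coefficient per cell type (no intercept), so that any pair can be contrasted.
design <- model.matrix(~ 0 + cellTypes)
colnames(design) <- levels(cellTypes)
fit <- lmFit(exprsMat, design)

# Hypothetical helper: mean moderated t-statistic of one type versus each other type.
pairwiseAverageT <- function(fit, design, type) {
  others <- setdiff(colnames(design), type)
  contrastMat <- makeContrasts(contrasts = paste(type, others, sep = " - "),
                               levels = design)
  fit2 <- eBayes(contrasts.fit(fit, contrastMat))
  rowMeans(fit2$t) # Each pair gets equal weight, regardless of abundance.
}

# Genes x cell types matrix of averaged pairwise statistics.
avgT <- sapply(levels(cellTypes), function(type) pairwiseAverageT(fit, design, type))
topMarkerOfC <- names(which.max(avgT[, "C"]))
```

Because every pair contributes equally to the average, the abundant type A does not dominate the ranking for type C, unlike a one-versus-rest indicator variable.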

We deliberately use pairwise comparisons rather than comparing each cluster to the average of all other cells. The latter approach is sensitive to the population composition, which introduces an element of unpredictability to the marker sets due to variation in cell type abundances. In the worst case, the presence of one subpopulation containing a majority of the cells will drive the selection of top markers for every other cluster, pushing out useful genes that can distinguish between the smaller subpopulations.

Orchestrating Single Cell Analysis Chapter 6: Marker Gene Detection

The function also takes cell types into account but not samples: doLimma <- function(exprsMat, cellTypes, exprs_pct = 0.05). This introduces a second source of bias. Suppose that sample A captured 3000 cells and sample B captured 5000 cells. Any differences found would be driven more by sample B, because it contributes more cells.
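One way to give every sample equal weight is to summarise cells into one profile per sample and cell type before testing. The sketch below does this with base R by averaging, which suits continuous, normalised expression values. The toy data and the variable names `samples`, `combos` and `pseudoBulk` are assumptions, and averaging is only one possible summary.

```r
set.seed(2)
# Toy data (hypothetical): sample A contributes 30 cells and sample B contributes 50.
samples <- factor(rep(c("A", "B"), times = c(30, 50)))
cellTypes <- factor(rep(c("Tcell", "Bcell"), length.out = 80))
exprsMat <- matrix(rnorm(20 * 80), nrow = 20,
                   dimnames = list(paste0("gene", 1:20), NULL))

# Every sample and cell type combination that is actually observed.
combos <- unique(data.frame(sample = samples, cellType = cellTypes))

# Average each gene over the cells of one combination: one column per combination.
pseudoBulk <- sapply(seq_len(nrow(combos)), function(i) {
  keep <- samples == combos$sample[i] & cellTypes == combos$cellType[i]
  rowMeans(exprsMat[, keep, drop = FALSE])
})
colnames(pseudoBulk) <- paste(combos$sample, combos$cellType, sep = "_")
```

Each column of `pseudoBulk` now counts once, so sample B no longer outweighs sample A. Cell type comparisons could then be run on `pseudoBulk` with `factor(combos$cellType)` as the labels, although a cell-level filter such as `exprs_pct` would probably need rethinking for averaged profiles.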

scran's pseudoBulkDGE function already accounts for samples (Orchestrating Single Cell Analysis Chapter 4: Multi-sample Multi-condition Comparisons). However, it only accepts integer count data and edgeR hypothesis testing.
