doLimma Feature Selection Does One-versus-All Comparisons #18

Open
DarioS opened this issue Jul 4, 2024 · 0 comments

DarioS commented Jul 4, 2024

doLimma compares each cell type to all other cells combined. This could bias feature selection if one cell type dominates the composition.

for (i in seq_len(nlevels(cellTypes))) # For each cell type.
  tmp_celltype <- (ifelse(cellTypes == levels(cellTypes)[i], 1, 0)) # One cell type versus the rest combined.

I propose comparing every pair of cell types and then averaging the pairwise results for each cell type.

We deliberately use pairwise comparisons rather than comparing each cluster to the average of all other cells. The latter approach is sensitive to the population composition, which introduces an element of unpredictability to the marker sets due to variation in cell type abundances. In the worst case, the presence of one subpopulation containing a majority of the cells will drive the selection of top markers for every other cluster, pushing out useful genes that can distinguish between the smaller subpopulations.

Orchestrating Single Cell Analysis Chapter 6: Marker Gene Detection
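The composition effect described in the quote can be seen in a toy example. This is only a sketch in base R with made-up numbers: it uses plain differences in means rather than limma's moderated statistics, and the variable names (`typeMeans`, `oneVsRest`, `pairwiseAvg`) are hypothetical, not part of doLimma.

```r
# Toy data: one gene, three cell types with very unequal abundances.
# All numbers are invented for illustration.
set.seed(1)
cellTypes <- factor(rep(c("A", "B", "C"), times = c(900, 50, 50)))
typeMeans <- c(A = 0, B = 2, C = 3) # True mean expression per cell type.
exprs <- rnorm(length(cellTypes),
               mean = typeMeans[as.character(cellTypes)], sd = 0.1)

# One-versus-rest: mean of the cell type minus the mean of all other cells pooled.
oneVsRest <- sapply(levels(cellTypes), function(type) {
  mean(exprs[cellTypes == type]) - mean(exprs[cellTypes != type])
})

# Pairwise: difference against each other cell type separately, then averaged.
groupMeans <- tapply(exprs, cellTypes, mean)
pairwiseAvg <- sapply(levels(cellTypes), function(type) {
  mean(groupMeans[type] - groupMeans[names(groupMeans) != type])
})

print(round(rbind(oneVsRest, pairwiseAvg), 2))
```

For cell type B, the pooled "rest" is 900 cells of A and 50 cells of C, so the one-versus-rest difference is roughly 1.84 and is almost entirely a B-versus-A comparison. The averaged pairwise difference is roughly 0.5 and would not change if A were less abundant.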

The function also takes cell types into account but not samples: doLimma <- function(exprsMat, cellTypes, exprs_pct = 0.05). This introduces a second source of bias. Suppose that sample A captured 3000 cells and sample B captured 5000 cells. Because every cell is weighted equally, the differences would be driven more by B.
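A minimal sketch of the sample imbalance, again in base R with invented expression values. Only the cell counts (3000 and 5000) come from the example above; `sampleMeans`, `pooledMean` and `balancedMean` are hypothetical names, and summarising each sample by its mean is just one possible way to weight samples equally.

```r
# Toy data: one gene, one cell type, two samples with unequal capture.
set.seed(1)
samples <- factor(rep(c("A", "B"), times = c(3000, 5000)))
sampleMeans <- c(A = 1, B = 3) # Invented per-sample mean expression.
exprs <- rnorm(length(samples),
               mean = sampleMeans[as.character(samples)], sd = 0.1)

# Cell-level summary: every cell counts equally, so sample B dominates.
pooledMean <- mean(exprs)

# Sample-level summary: summarise within each sample first, then average,
# so each sample counts equally regardless of how many cells it captured.
perSampleMeans <- tapply(exprs, samples, mean)
balancedMean <- mean(perSampleMeans)

print(round(c(pooled = pooledMean, balanced = balancedMean), 2))
```

With these numbers the pooled mean sits near 2.25, pulled towards sample B, whereas the sample-balanced mean sits near 2.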

scran's pseudoBulkDGE function already accounts for samples (Orchestrating Single Cell Analysis Chapter 4: Multi-sample Multi-condition Comparisons). However, it only allows integer count data and edgeR hypothesis testing.
