Intuition for noise parameter #237
A short explanation of delta is that it is the over-dispersion, that is, the amount of variation observed above Poisson noise for a given cell+gene combination. In BASiCS, cells are modelled as being distributed according to a negative binomial distribution. This is pretty much a super-Poisson model, with the level of "super-Poisson-ness" being controlled by the over-dispersion parameter delta. That is, we model the counts X_ij for gene i in cell j as negative binomial with mean nu_j * mu_i and over-dispersion delta_i. If we ignore the cell-specific factor nu_j, the expected count for gene i is mu_i and its variance is mu_i + delta_i * mu_i^2, i.e. the Poisson variance plus an extra term whose size is controlled by delta_i.
I hope this is sufficient intuition for the interpretation of delta. As for how to account for mean dependence, can you maybe explain in more detail why the global change in mean makes a regression approach unsuitable? If you estimate the mean/variance trend independently in the two groups you are investigating, then a global shift shouldn't be too much of a problem, although interpretation may be tricky if you have very large-scale shifts in expression patterns. In general, most approaches that remove mean dependence from measures of variability are at least somewhat similar, usually involving some sort of regression model, as ratios of variance/SD to mean are generally still highly correlated with the mean in sequencing data (and single-cell data in particular).
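As a quick numerical check of that variance formula (a small illustration, not part of BASiCS itself; the values of mu and delta are arbitrary), simulating negative binomial counts in R reproduces the relationship between delta, CV^2 and the Fano factor:
set.seed(42)
mu <- 50      ## arbitrary mean
delta <- 0.4  ## arbitrary over-dispersion
x <- rnbinom(1e5, mu = mu, size = 1 / delta)
mean(x)             ## close to mu
var(x)              ## close to mu + delta * mu^2 = 1050
var(x) / mean(x)^2  ## CV^2, close to 1/mu + delta = 0.42
var(x) / mean(x)    ## Fano factor, close to 1 + delta * mu = 21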
We treat cells with IdU, causing a global increase in noise without altering mean expression levels. See our recent paper. Regarding mean dependence, we see that delta is negatively correlated with the mean (mu). We are interested in measuring noise amplification in a mean-independent manner, and are looking for a metric that would resemble the Fano factor (as described in this paper).
Hi, sorry for the delay in getting back to you. To follow up on the mean dependence issue, you can see below that CV2 and over-dispersion are similar measures, and that in (sc)RNAseq data the Fano factor isn't typically mean-independent. That's why, in general, scRNAseq pipelines for selecting HVGs involve a step where a variability measure (Fano/CV2/variance/over-dispersion) is regressed against the mean.
The question of whether and why mean and over-dispersion are negatively correlated in scRNAseq is a complicated one - there are papers in favour of the idea and against it. I don't feel qualified to provide a final answer in this thread, unfortunately. I'm not really familiar with smFISH data, to be honest. Aspects of the data-generating process are different, so perhaps the same doesn't hold there, though I don't think that really impacts how scRNAseq data should be treated.
library("scRNAseq")
library("matrixStats")
library("glmGamPoi")
data <- ZeiselBrainData()
## restrict to a cell type just to avoid composition effects
data <- data[, data$level1class == "pyramidal CA1"]
## arbitrary cutoff just to thin the data a bit
data <- data[rowMeans(counts(data)) > 1, ]
mean <- rowMeans(counts(data))
var <- rowVars(counts(data))
cv2 <- var / (mean^2)
fano <- var / mean
plot(mean, var, log = "xy")
plot(mean, cv2, log = "xy")
plot(mean, fano, log = "xy")
## estimate gamma-poisson glm to get overdispersion estimates
## these will be different to the BASiCS estimates but a good stand-in
fit <- glm_gp(data)
plot(fit$overdispersions, cv2, log = "xy")
plot(fit$overdispersions, var, log = "xy")
plot(fit$overdispersions, fano, log = "xy")
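To make the regression step mentioned above concrete, here is a minimal sketch (continuing from the code above; the choice of the Fano factor and the loess span are arbitrary, just for illustration) of removing the mean trend from a variability measure and keeping the residuals as a mean-independent summary:
## regress the log Fano factor on log mean and keep the residuals
trend <- loess(log(fano) ~ log(mean), span = 0.5)
resid_fano <- residuals(trend)
## the residuals should show little remaining dependence on the mean
plot(mean, resid_fano, log = "x")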
Hi @binyaminZ, just a couple of things to add to Alan's response:
In the regression version of BASiCS, the over-dispersion is decomposed as log(delta_i) = f(log(mu_i)) + epsilon_i, where f is the overall mean/over-dispersion trend and epsilon_i is the residual over-dispersion of gene i, i.e. the variability not explained by mean expression. If there is a global increase in variability, I imagine this would be captured by the trend f rather than by the gene-specific epsilon_i.
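As a rough sketch of how such a two-group comparison might be set up with the regression model (sce_control and sce_idu are hypothetical SingleCellExperiment objects prepared for BASiCS, and the MCMC settings below are placeholders rather than recommendations):
library("BASiCS")
## fit the regression model separately within each condition
chain_control <- BASiCS_MCMC(sce_control, N = 20000, Thin = 20, Burn = 10000,
                             Regression = TRUE, WithSpikes = FALSE)
chain_idu <- BASiCS_MCMC(sce_idu, N = 20000, Thin = 20, Burn = 10000,
                         Regression = TRUE, WithSpikes = FALSE)
## compare means, over-dispersion and residual over-dispersion between groups
test <- BASiCS_TestDE(Chain1 = chain_control, Chain2 = chain_idu,
                      GroupLabel1 = "Control", GroupLabel2 = "IdU")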
Hi,
Thanks for this useful tool! I have done some extensive analysis and found that BASiCS very efficiently removes several types of extrinsic noise, which I validated in at least three independent datasets.
However, I am not able to follow the math of your algorithm, so I am having a hard time parsing the statistical meaning of delta. I do understand that the mean parameter, mu, would be the equivalent of TPM or something like this, but how would you describe delta? It seems to behave like CV or maybe like CV^2, but I'm not sure what the mathematical difference between delta and CV is.
I will explain our motivation: we are interested in comparing super-Poissonian behavior between samples and would like to convert the delta values into something that is independent of the mean, like the Fano factor (i.e. variance / mean). Would it make sense to just divide delta by mu? Or how would you address this in general? The regression solution is not good for us, because our treatment causes a global change in noise, and the regression approach cancels out this effect.
Best,
Binyamin