Use for pseudobulk differential expression - advantages & logFC values #22

Open
Al-Murphy opened this issue May 10, 2021 · 1 comment

@Al-Murphy

Hi,

Thank you for your very useful package. I have two questions regarding its use for pseudobulk differential expression analysis.

Firstly, could you outline the reasons why you think your model is better for pseudobulk than alternatives like a manual pseudobulk step and edgeR/DESeq, given that glmGamPoi's main use seems to be for non-pseudobulk?

Secondly, I have noted strange logFC values when performing pseudobulk differential expression analysis on an Alzheimer's Disease dataset split by 6 cell types. The dataset has approximately 50 samples, resulting in >50k cells after quality control. The logFC values can be seen in this volcano plot:

[image: volcano plot]

The logFC values for all cell types appear to be split into three groups and do not look as I would expect in a volcano plot. Have you noted logFC values like this before? I have attached the table of this data for just cell type "A" (to keep the size down).
DE_analysis_odd_logFC_values.txt

@const-ae
Owner

Hey Alan,

Thanks for your kind words, and please excuse the delay.

> Firstly, could you outline the reasons why you think your model is better for pseudobulk than alternatives like a manual pseudobulk step and edgeR/DESeq, given that glmGamPoi's main use seems to be for non-pseudobulk?

Conceptually none, really. It is more of a convenience that you can use the same interface for pseudobulk and non-pseudobulk questions.

> The logFC values for all cell types appear to be split into three groups and do not look as I would expect in a volcano plot. Have you noted logFC values like this before?

Yes, I am aware of this pattern in volcano plots. The easy fix is to set every LFC whose absolute value is above, let's say, 15 to Inf (keeping its sign). You will then see the familiar volcano pattern in what is currently the center column.
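As a rough illustration of that thresholding step, here is a minimal sketch in plain Python. It is not part of glmGamPoi: the function name `clamp_lfc` is hypothetical, the cutoff of 15 is simply the value suggested above, and the input numbers are made up.

```python
import math

def clamp_lfc(lfc_values, cutoff=15.0):
    """Replace LFCs whose magnitude exceeds `cutoff` with +/-inf, keeping the sign."""
    clamped = []
    for lfc in lfc_values:
        if abs(lfc) > cutoff:
            # copysign keeps the direction of the change (up or down)
            clamped.append(math.copysign(math.inf, lfc))
        else:
            clamped.append(lfc)
    return clamped

# Made-up values: two "outer column" genes and three ordinary ones
example = [-21.3, -0.8, 0.1, 1.9, 20.7]
print(clamp_lfc(example))  # [-inf, -0.8, 0.1, 1.9, inf]
```

After this step the non-finite entries can be filtered out or marked separately before plotting, so that only the ordinary values shape the volcano plot.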

The underlying issue is that the LFCs in the left and right columns come from comparisons where all counts are 0 in one group. Technically, the LFC in this case is infinite; however, for convergence reasons, the algorithm returns a large finite LFC with a magnitude of around 20. I have considered introducing a threshold that automatically sets large LFC values to infinity, but have so far shied away from it because I worry that this would only create more confusion.
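To make the all-zero case concrete, the sketch below computes a naive log2 ratio of group means in plain Python. It is an illustration under stated assumptions, not glmGamPoi's fitting procedure: `naive_lfc` is a hypothetical helper, the counts are made up, and the `floor` of 1e-6 is an arbitrary stand-in for an iterative fit stopping at a small nonzero mean instead of reaching exactly zero.

```python
import math

def naive_lfc(counts_a, counts_b, floor=0.0):
    """log2(mean_b / mean_a), with each group mean bounded below by `floor`."""
    mean_a = max(sum(counts_a) / len(counts_a), floor)
    mean_b = max(sum(counts_b) / len(counts_b), floor)
    if mean_a == 0 and mean_b == 0:
        return math.nan       # no information in either group
    if mean_a == 0:
        return math.inf       # expressed only in group b
    if mean_b == 0:
        return -math.inf      # expressed only in group a
    return math.log2(mean_b / mean_a)

control = [0, 0, 0, 0]   # made-up pseudobulk counts, all zero
disease = [1, 0, 2, 1]   # made-up pseudobulk counts, mean 1.0

print(naive_lfc(control, disease))              # inf
print(naive_lfc(control, disease, floor=1e-6))  # roughly 19.93
```

With the assumed floor, the ratio becomes log2(1 / 1e-6), which is large but finite and happens to land near 20. The exact value depends entirely on where the zero group's mean ends up, so it should not be read as a meaningful effect size.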
