Incorrect inference with scplainer #56
After exploring the statistical inference implementation, I realised that I misinterpreted the coefficients when factors are encoded with sum contrasts. Previously, when one of the groups was encoded as the "reference" (in fact, there is no reference group with sum contrasts; that group is instead encoded as -1 in all corresponding columns of the model matrix), I simply computed the logFC as 2x the coefficient associated with the other group in the comparison. This approach only works when there are exactly two groups to compare. Instead, following Lieven's suggestion (cf. his course material), I now use a contrast matrix.
Updated figures
Let's explore the new implementation using the same script as in the initial comment. The volcano plot becomes
And the estimated logFC becomes
which is very close to the empirical value mentioned in the comment above.
Important notes
This approach has been implemented since 6b34084.
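To make the point about sum contrasts concrete, here is a minimal numeric sketch. It is written in plain Python purely for illustration and is not the scplainer (R) implementation; the helper names (`sum_coding_row`, `fit_sum_coded`, `contrast_vector`, `log_fc`) and the group means are made up. It assumes a one-factor model whose coefficients reproduce the group means exactly, and shows that with three groups the contrast-based logFC matches the difference in means while the "2x the coefficient" shortcut does not.

```python
# Illustrative sketch only: plain Python, NOT the scplainer (R) implementation.
# All helper names and numbers below are hypothetical.

def sum_coding_row(level, levels):
    """Model-matrix row (without intercept) for `level` under sum contrasts.

    The last level has no column of its own: it is encoded as -1 in every column.
    """
    if level == levels[-1]:
        return [-1.0] * (len(levels) - 1)
    return [1.0 if level == lv else 0.0 for lv in levels[:-1]]


def fit_sum_coded(group_means, levels):
    """Coefficients of a one-factor model that reproduces the group means."""
    intercept = sum(group_means[lv] for lv in levels) / len(levels)
    return [intercept] + [group_means[lv] - intercept for lv in levels[:-1]]


def contrast_vector(group_a, group_b, levels):
    """Contrast L such that sum(L * beta) = mean(group_a) - mean(group_b)."""
    row_a = [1.0] + sum_coding_row(group_a, levels)
    row_b = [1.0] + sum_coding_row(group_b, levels)
    return [a - b for a, b in zip(row_a, row_b)]


def log_fc(group_a, group_b, beta, levels):
    """Estimated difference between two groups from the fitted coefficients."""
    contrast = contrast_vector(group_a, group_b, levels)
    return sum(c * b for c, b in zip(contrast, beta))


levels = ["Melanoma", "Mock1", "Mock2"]
group_means = {"Melanoma": 1.0, "Mock1": 0.2, "Mock2": 0.5}  # made-up values
beta = fit_sum_coded(group_means, levels)

contrast_based = log_fc("Mock1", "Mock2", beta, levels)  # uses contrast [0, 1, 2]
naive = 2 * beta[2]  # the "2x the coefficient" shortcut applied to Mock1

print(contrast_based, group_means["Mock1"] - group_means["Mock2"])
print(naive)
```

With only two levels the contrast vector reduces to `[0, 2]`, which is why the shortcut happened to work in the two-group case.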
I consider this issue solved, unless @lgatto you have additional comments/suggestions?
Hello, I believe I am using the version (SHA1: 610cdff) in which this issue has been corrected. However, when I use the scpDifferentialAnalysis() function, there seem to be some mistakes in the result. When I manually compute the means as described above and compare them with the estimates generated by scpDifferentialAnalysis(), there is a discrepancy. In some extreme cases, a protein that should have higher abundance in one group is shown to have higher abundance in the opposite group. I have four different groups in my data. I tried
Hello @shimin-chen, Thanks again for reaching out. Could you please provide a minimal reproducible example so that I can use your example as a test case for debugging? For instance, you can provide a QFeatures object with one of the peptides for which you see problematic estimation as well as the code that runs the model and computes the mean. Meanwhile, here are a few comments:
Hi Christophe, Once again, I appreciate your response and your continued support and development of this tool. I tried to come up with a QFeatures object with just the problematic peptides using
To respond to your comments:
I could be wrong about this, but I felt that the issue may have something to do with the contrast matrix resulting from having more than 2 groups in SampleType (I have 4 groups in total). Please let me know if I can provide more information for debugging.
Again, thanks for pointing out a new issue! I opened an issue here #58 to solve this. Since I think it will take us some time to solve this, would it be OK for you to share your full data set? Depending on its size, you may need to share it through an external server like GDrive, Dropbox, OneDrive, etc. Then, could you also provide the minimal code you used to highlight your issue? Even if it is almost copy-pasted from my comment, it will save me some guesswork 😉 If your data is confidential and you are not allowed to share it, could you try to first subset
I managed to reproduce the issue with a subsetted sce object to be used before running the
I get this result
Interestingly, when I reorder the factor levels in SampleType, the calculation is correct.
I seem to be getting the correct result this time
If I compare the same group:
I get:
This is very helpful! I was able to reproduce your error and found a nasty little bug in the code that, as you have noticed, incorrectly assigned the levels when building contrasts. This has been fixed since 4349072. So, many thanks again for pointing this out and for providing an example that facilitated debugging. My apologies for the inconvenience 🙏
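For readers wondering how a level mix-up can flip the direction of an effect, here is a small sketch of that class of problem. It is plain Python written for illustration only: it is neither the scplainer (R) code nor a reconstruction of the actual bug. The group names, means, and helper functions are hypothetical. It fits sum-coded coefficients with one level order, then builds the contrast with a different order, which reverses the sign of the estimated difference.

```python
# Illustrative sketch only: NOT the scplainer (R) code and NOT the actual bug.
# Group names, means, and helper names are hypothetical.

def coding_row(level, levels):
    """Intercept plus sum-contrast columns; the last level is -1 everywhere."""
    if level == levels[-1]:
        return [1.0] + [-1.0] * (len(levels) - 1)
    return [1.0] + [1.0 if level == lv else 0.0 for lv in levels[:-1]]


def fit(group_means, levels):
    """Coefficients of a one-factor model that reproduces the group means."""
    intercept = sum(group_means.values()) / len(levels)
    return [intercept] + [group_means[lv] - intercept for lv in levels[:-1]]


def log_fc(group_a, group_b, beta, levels):
    """Difference between two groups, with the contrast built from `levels`."""
    row_a, row_b = coding_row(group_a, levels), coding_row(group_b, levels)
    return sum((a - b) * coef for a, b, coef in zip(row_a, row_b, beta))


group_means = {"A": 2.0, "B": 1.0, "C": 0.0, "D": -1.0}  # made-up, four groups
model_levels = ["A", "B", "C", "D"]  # level order used when fitting the model
other_levels = ["B", "A", "C", "D"]  # hypothetical mismatched order

beta = fit(group_means, model_levels)
correct = log_fc("A", "B", beta, model_levels)     # contrast matches the fit
mismatched = log_fc("A", "B", beta, other_levels)  # contrast uses wrong order
same_group = log_fc("A", "A", beta, model_levels)  # sanity check

print(correct, mismatched, same_group)
```

Comparing a group with itself, as done earlier in the thread, is a cheap sanity check, but it cannot catch this kind of mismatch on its own because the two rows cancel regardless of the level order.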
Fantastic! Thank you so much for the prompt response and actions on this issue 😀 Hope you have a great weekend!
While testing scplainer's differential analysis approach on data with mock biological labels (i.e. I artificially created 2 groups within the same cell type), I noticed that something goes wrong when the biological variable has more than 2 groups (e.g. melanoma, mock1, mock2).
Reproducible example
Load data
Assign mock labels to one of the cell types (monocytes). The mock label is randomly assigned within each MS batch.
Model and analyse with mock labels
The problem
While we expect no differential peptides, the volcano plot reports a few strongly differentially expressed peptides.
Let's explore the most differentially expressed peptide. After keeping only the biological effect (with mock), I manually compute the mean of each group:
The difference between the Mock1 and Mock2 groups is ~ -0.0025, but the computed logFC is
This is far from the expected value... (to be continued)