
Deriving beta-adjusted p-values from nominal p-value and shape parameters #20

Open
kwcurrin opened this issue Mar 17, 2022 · 4 comments

@kwcurrin

No description provided.

@kwcurrin
Author

Hello,

Is it possible to obtain the beta-approximated p-value from the nominal p-value and the two shape parameters via the pbeta function in R? I tried this for my data, but the p-values don't match. For example:
pbeta(0.00702837, 0.976548, 44.2766)
gives 0.2788137, but the reported beta-adjusted p-value is 0.342138. Do I need to call the R function differently to get the values to match?
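For reference, the same computation in Python via scipy.stats (whose beta CDF is equivalent to R's pbeta) reproduces the 0.2788137 value, confirming the mismatch is not an R-specific quirk:

```python
from scipy import stats

# Beta CDF with the reported shape parameters; equivalent to R's
# pbeta(0.00702837, 0.976548, 44.2766)
val = stats.beta.cdf(0.00702837, 0.976548, 44.2766)
print(val)  # ~0.2788, not the reported pval_beta of 0.342138
```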

I noticed this issue when I tried to call all significant gene-variant pairs: I took the pval_beta threshold corresponding to FDR < 5% and derived a nominal p-value threshold per gene with qbeta, and one gene dropped off because its best variant's nominal p-value was above the qbeta-derived threshold. I suspect this is related to the p-value mismatch described above.

Do you have any suggestions for adjusting the pbeta or qbeta function parameters to get the results to match?

Thanks,

Kevin

@kwcurrin
Author

Hello,

A colleague tracked this issue down to different df values being used to compute pval_beta and pval_nominal in analysisPermutation.cpp. The pval_nominal reported in the fastQTL output is calculated with "df" (sample size - 2 - number of covariates), whereas pval_beta is derived from a nominal p-value that is first recomputed with true_df, a learned value. We both confirmed that if we recompute the nominal p-value from the reported Pearson correlation using true_df instead of df, and then call pbeta(pval_nominal, shape1, shape2), the result matches the pval_beta reported in the fastQTL output.
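The recomputation described above can be sketched in Python (scipy's stats.t and stats.beta mirror R's pt and pbeta). The function name and argument layout here are illustrative, not fastQTL's actual API:

```python
import math
from scipy import stats

def beta_adjusted_pval(r, true_df, shape1, shape2):
    """Sketch of the fix described above: recompute the nominal p-value
    from the reported Pearson correlation r using true_df (not df),
    then feed it through the fitted beta CDF. Names are hypothetical."""
    # Pearson r -> t statistic on true_df degrees of freedom
    tstat = r * math.sqrt(true_df / (1.0 - r * r))
    # two-sided nominal p-value; equivalent to R's 2*pt(-abs(t), true_df)
    pval_nominal = 2.0 * stats.t.sf(abs(tstat), true_df)
    # beta-approximated p-value; equivalent to R's pbeta(p, shape1, shape2)
    return stats.beta.cdf(pval_nominal, shape1, shape2)
```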

This difference in df values causes a problem when using qbeta to derive the per-gene nominal threshold for calling all significant variant-gene pairs. Although true_df is often smaller than df, it is sometimes enough larger that the qbeta-derived nominal p-value threshold falls below the best variant's nominal p-value for that gene, even though the gene met the FDR < 5% cutoff on pval_beta (which is computed from true_df). In my case this happened for only one gene.
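The per-gene threshold step can be sketched as follows; scipy's stats.beta.ppf is the equivalent of R's qbeta, and the function name and inputs are illustrative:

```python
from scipy import stats

def nominal_threshold(pval_beta_threshold, shape1, shape2):
    """Per-gene nominal p-value threshold via the inverse beta CDF;
    equivalent to R's qbeta(pval_beta_threshold, shape1, shape2).
    pval_beta_threshold is the gene-level beta-approximated p-value
    corresponding to the chosen FDR cutoff (assumed given)."""
    return stats.beta.ppf(pval_beta_threshold, shape1, shape2)

# A variant then counts as significant only if its nominal p-value
# (recomputed with true_df, per the discussion above) is below this value.
```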

Another potential issue: when true_df is much smaller than df, some variants that don't actually meet the gene-level nominal threshold will be classified as significant, because the reported pval_nominal (calculated with df) is more significant than it should be. Conversely, when true_df is larger, fewer variants will pass the nominal threshold.

I also noticed that true_df is occasionally equal to, or slightly greater than, the sample size. This is rare, but it does happen.

Would you be able to advise on whether it is better to use df or true_df?

Thanks,

Kevin

@francois-a
Owner

That's correct, you need to use true_df for this calculation (see here). But the reason for a gene missing from the significant pairs list is likely due to the calculation of the threshold, discussed here.

@kwcurrin
Author

Hi Francois,

Thank you for your response. It appears that annotate_outputs.py compares the nominal p-values (calculated with df) to the nominal p-value threshold (calculated with true_df). Is this correct?

Could you point me to a description of what "true_df" means conceptually? I looked through the C++ code for learnDF, but couldn't work out exactly what the function is doing. What is the benefit of using true_df over df?

Thanks!

Kevin
