
CADD score normalization #566

Open
icooperstein opened this issue Aug 27, 2024 · 2 comments

@icooperstein

I am wondering if you can explain how CADD scores are normalized to a 0-1 scale when used as a pathogenicity predictor. I have noticed that when I combine CADD with other tools, CADD is the maximum pathogenicity score for almost every variant, regardless of variant type. I've also noticed when I inspect the output files that these CADD scores are often >0.97.

One example: a variant in my results had the following scores:
CADD=0.97435516,REVEL=0.031,MVP=0.19391714,ALPHA_MISSENSE=0.0944
I looked up this variant in CADD, and its CADD Phred score is 15.

However, another variant had an Exomiser CADD=0.99748814 in the output, but a CADD Phred score of 26. Since Phred scores are logarithmic, the difference between 15 and 26 is much more drastic than what I am seeing in these scaled scores in the output.

We would really like to use CADD scores, especially since many other predictors are for missense variants only.

@julesjacobsen
Contributor

The CADD normalisation is here:

/**
* Creates a {@link CaddScore} from the input PHRED scaled score. *IMPORTANT* this method will rescale the input
* PHRED score to a score in the 0-1 range, therefore ensure the correct CADD score is used here.
*
* According to https://cadd.gs.washington.edu/info a good cutoff to use is the PHRED scaled scores of
* 10-20 which equates to 90-99% most deleterious or 13-20 (95-99%). For reference, these are scaled to 0.90 - 0.99.
*
* The M-CAP authors (http://bejerano.stanford.edu/mcap/) suggest these cutoffs are too permissive, although their
* recommended thresholds don't appear to match what was actually suggested by the CADD authors.
*/
public static CaddScore of(float phredScaledScore) {
float score = 1 - (float) Math.pow(10, -(phredScaledScore / 10));
return new CaddScore(phredScaledScore, score);
}
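To make the saturation concrete, here is a hypothetical standalone sketch of the same rescaling formula (the class and method names here are ours, not Exomiser's), applied to the PHRED values mentioned in this thread:

```java
// Hypothetical demo of the 1 - 10^(-phred/10) rescaling used above.
// Class and method names are illustrative; Exomiser's actual entry point
// is CaddScore.of(float).
public class CaddRescaleDemo {

    // Rescales a CADD PHRED score into the 0-1 range.
    static float rescale(float phredScaledScore) {
        return 1f - (float) Math.pow(10, -(phredScaledScore / 10));
    }

    public static void main(String[] args) {
        System.out.printf("PHRED 15 -> %.4f%n", rescale(15f)); // ~0.9684
        System.out.printf("PHRED 26 -> %.4f%n", rescale(26f)); // ~0.9975
        System.out.printf("PHRED 30 -> %.4f%n", rescale(30f)); // ~0.9990
    }
}
```

Note that any PHRED score above 20 already maps to >0.99, which is consistent with the near-identical 0.97/0.997 scaled values reported above despite the large gap between Phred 15 and 26.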

@AlistairNWard

Thanks for posting this @julesjacobsen. This makes some sense, but I'm not sure that it results in the desired behaviour. The CADD scaling needs to, as far as is reasonable, put the CADD scores on the same scale as the other pathogenicity scores. For example, a 0.9 for CADD should be largely equivalent to a 0.9 for REVEL. If this is not the case, then one pathogenicity source will outweigh all the others, which is what we see: if we include CADD, it generally scores very high and will almost always be selected over the other sources.

I had a quick look at a number of variants with a REVEL score of ~0.9, and they all had corresponding CADD Phred scores of ~30. By the scoring method above, a CADD Phred score of 10 is already scaled to 0.9, so it is not surprising that CADD heavily dominates. I think this scaling would benefit from a rethink. Is this something that we should discuss?
