Issue with the REVEL score provided from myvariant.info #348

Closed
cgpreston opened this issue Nov 10, 2023 · 24 comments
@cgpreston

cgpreston commented Nov 10, 2023

REPLACED BY #350 AS OF 11-21-23


A curator raised this for variant CA415086302. They reported seeing a REVEL score of 0.653 in the VCI in April 2023; in November the score shown had changed to 0.173.

Investigating, I see the following:

  1. The UCSC genome browser shows REVEL score of 0.653 https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chrX%3A153693943%2D153693945&hgsid=1766476736_3J9cca847flVLCJJ0hP5JvWuGOxZ
  2. Myvariant.info shows REVEL score of 0.173 https://myvariant.info/v1/variant/chrX:g.153693944C%3EA?assembly=hg38&format=html

Looking at the REVEL scores for that nucleotide directly from the REVEL website (copied below), it looks to me like myvariant.info is returning the score for the ENST00000457723.1:c.165C>A change, while UCSC is returning the score for the MANE Select transcript, NM_005629.4:c.1181C>A.

REVEL data (* is MANE)

chr hg19_pos grch38_pos ref alt aaref aaalt REVEL Ensembl_transcriptid
X 152959399 153693944 C A H Q 0.173 ENST00000457723
X 152959399 153693944 C A R S 0.109 ENST00000442457
X 152959399 153693944 C A T K 0.653 ENST00000253122*;ENST00000430077;ENST00000328897

Investigating other variants, it appears we get only one REVEL score from myvariant.info/dbnsfp, even when REVEL lists multiple scores:

CA415087428
VCI shows: 0.093 REVEL score, as does myvariant.info
REVEL data (* is MANE)

chr hg19_pos grch38_pos ref alt aaref aaalt REVEL Ensembl_transcriptid
X 152936043 153670588 A T F I 0.093 ENST00000340888*;ENST00000370150;ENST00000393831;ENST00000370142;ENST00000370145;ENST00000447676
X 152936043 153670588 A T L H 0.206 ENST00000438984

CA415067047
VCI shows: 0.018 REVEL score, as does myvariant.info
REVEL data:

chr hg19_pos grch38_pos ref alt aaref aaalt REVEL Ensembl_transcriptid
X 152128437 152959893 G C A P 0.052 ENST00000454925
X 152128437 152959893 G C Q H 0.013 ENST00000426821
X 152128437 152959893 G C S T 0.018 ENST00000535861;ENST00000539731;ENST00000449285;ENST00000318504;ENST00000535156;ENST00000324823;ENST00000370268;ENST00000318529;ENST00000370270

CA415083325
VCI shows: 0.403 REVEL score, as does myvariant.info
REVEL data (* is MANE)

chr hg19_pos grch38_pos ref alt aaref aaalt REVEL Ensembl_transcriptid
X 152958524 153693069 A C T P 0.403 ENST00000413787
X 152958524 153693069 A C Y S 0.873 ENST00000253122*;ENST00000430077;ENST00000328897

CA415083258
VCI shows 0.389 REVEL score, as does myvariant.info
REVEL data (* is MANE)

chr hg19_pos grch38_pos ref alt aaref aaalt REVEL Ensembl_transcriptid
X 152958515 153693060 C A H N 0.389 ENST00000413787
X 152958515 153693060 C A T K 0.904 ENST00000253122*;ENST00000430077;ENST00000328897
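
The comparison above can be reproduced mechanically. The following is a minimal sketch that uses only the rows quoted in this comment; the `variants` structure and the `findMismatches` helper are hypothetical names for illustration, not part of the VCI code base.

```javascript
// REVEL rows quoted above, keyed by ClinGen allele ID.
// `mane: true` marks the row whose transcript list includes the MANE transcript (*).
// `displayed` is the score shown in the VCI and returned by myvariant.info.
const variants = {
  CA415086302: {
    displayed: 0.173,
    rows: [
      { score: 0.173, mane: false },
      { score: 0.109, mane: false },
      { score: 0.653, mane: true },
    ],
  },
  CA415087428: {
    displayed: 0.093,
    rows: [
      { score: 0.093, mane: true },
      { score: 0.206, mane: false },
    ],
  },
  CA415067047: {
    displayed: 0.018,
    // No row was marked as MANE for this variant in the data above.
    rows: [
      { score: 0.052, mane: false },
      { score: 0.013, mane: false },
      { score: 0.018, mane: false },
    ],
  },
  CA415083325: {
    displayed: 0.403,
    rows: [
      { score: 0.403, mane: false },
      { score: 0.873, mane: true },
    ],
  },
  CA415083258: {
    displayed: 0.389,
    rows: [
      { score: 0.389, mane: false },
      { score: 0.904, mane: true },
    ],
  },
};

// Return the variants whose displayed score differs from the MANE row's score.
// Variants without a marked MANE row are skipped rather than guessed at.
function findMismatches(data) {
  const mismatches = [];
  for (const [caid, { displayed, rows }] of Object.entries(data)) {
    const maneRow = rows.find((row) => row.mane);
    if (maneRow && maneRow.score !== displayed) {
      mismatches.push({ caid, displayed, mane: maneRow.score });
    }
  }
  return mismatches;
}

console.log(findMismatches(variants));
```

Of the five variants listed, three (CA415086302, CA415083325, CA415083258) display a score that is not the MANE transcript's score.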
@cgpreston cgpreston added the VCI label Nov 10, 2023
@cgpreston cgpreston self-assigned this Nov 10, 2023
@cgpreston cgpreston changed the title from "REVEL scores from myvariant.info not on the MANE transcript change" to "REVEL scores from myvariant.info are singular and not necessarily the one we want" Nov 10, 2023
@cgpreston
Author

cgpreston commented Nov 10, 2023

Going forward we probably want to bring in all scores and display the corresponding transcript details

Additional information from dbNSFP:

NEW VERSION (February 18, 2022): dbNSFP v4.3 is released. REVEL scores have been updated with transcript ids, i.e., the scores are now transcript-specific.

dbNSFP does return the different REVEL scores for variants with multiple scores, so I suspect the issue is with myvariant.info. The JSON output that myvariant.info supplies for dbnsfp shows a varying number of REVEL score entries per variant (I've encountered between 1 and 8), but all of the values for a given variant are identical (see screenshot below).

[Screenshot: Screen Shot 2023-11-10 at 3 51 38 PM]

@cgpreston
Author

Emailed myvariant.info about this on 11/10/23.

@cgpreston cgpreston changed the title from "REVEL scores from myvariant.info are singular and not necessarily the one we want" to "Issue with the REVEL score provided from myvariant.info" Nov 12, 2023
@liammulh
Member

We would like to have at least a band-aid solution in production by
Monday the 20th of November. In order to provide a pull request reviewer
two business days to review, the pull request needs to be in by
Wednesday at the end of the day.

@liammulh
Member

liammulh commented Nov 13, 2023

For my own clarification, I'm going to summarize the issue.

From what I understand, REVEL scores range from 0 to 1. Scores closer to
0 suggest a variant is not pathogenic, and scores closer to 1 suggest it
is pathogenic. Accurate REVEL scores are crucial for variant curation.

For the CA415086302 variant, the REVEL score we display in the VCI is
0.173. However, Christine points out that "the UCSC genome browser shows
a REVEL score of 0.653." From what I gather, this 0.653 score could be
considered pathogenic, whereas the 0.173 score is less significant.
Thus, the curation could be wrong based on this information.

The problem is that the website we get our REVEL data from —
myvariant.info — gives us one REVEL score for
the CA415086302 variant:

{
  "_id": "chrX:g.153693944C>A",
  "_version": 1,
  "clingen": {
    "caid": "CA415086302"
  },
  "dbnsfp": {
    "_license": "http://bit.ly/2VLnQBz",
    [...]
    "revel": {
      "rankscore": 0.4384,
      "score": 0.173
    },
    [...]
  },
  [...]
}

The score they provide in the revel object (0.173) is different from
the score in the UCSC genome browser (0.653). The discrepancy exists
because myvariant.info is using the ENST00000457723 transcript ID
whereas the UCSC genome browser is using the ENST00000253122
transcript ID.

We need to display accurate information. To display accurate
information, we need to display the score and the transcript ID that
corresponds to the score.

@liammulh
Member

The high-level solution to our problem is to display scores and the
corresponding transcript IDs. There are at least a few different ways we
could do that.

Get the scores and transcript IDs from myvariant.info

Suppose the variant we're interested in is CA415087428. The object we
get from myvariant.info contains REVEL scores in the
dbnsfp > revel > score path:

    [...]
    "revel": {
      "rankscore": 0.26882,
      "score": [
        0.093,
        0.093,
        0.093,
        0.093,
        0.093
      ]
    },
    [...]

The object we get from myvariant.info also contains transcript IDs for
CA415087428 in the dbnsfp > ensembl > transcriptid path:

    [...]
    "ensembl": {
      "geneid": [
        "ENSG00000130822",
        "ENSG00000130822",
        "ENSG00000130822",
        "ENSG00000130822",
        "ENSG00000130822"
      ],
      "proteinid": [
        "ENSP00000340586",
        "ENSP00000359169",
        "ENSP00000359161",
        "ENSP00000359164",
        "ENSP00000405950"
      ],
      "transcriptid": [
        "ENST00000340888",
        "ENST00000370150",
        "ENST00000370142",
        "ENST00000370145",
        "ENST00000447676"
      ]
    },
    [...]

We could display the REVEL score along with its (presumably)
corresponding transcript ID in the VCI. Some variants like CA415086302
only have one REVEL score and one transcript ID. In this case, the REVEL
score field is a single number and the transcript ID field is a single
string, rather than arrays.
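
Because both fields can be either a scalar or an array, any pairing code has to normalize them first. Here is a rough sketch of how that might look. The `toArray` and `pairRevelScores` helpers are hypothetical, and the pairing relies on an unverified assumption: that index i of `revel.score` corresponds to index i of `ensembl.transcriptid`. The response itself does not state this.

```javascript
// Wrap a scalar in an array, leave arrays untouched, and treat null/undefined as empty.
function toArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Pair each REVEL score with the transcript ID at the same index.
// ASSUMPTION: the two arrays are index-aligned. myvariant.info does not
// explicitly link them, so this is a guess, not a documented guarantee.
function pairRevelScores(dbnsfp) {
  const scores = toArray(dbnsfp?.revel?.score);
  const transcriptIds = toArray(dbnsfp?.ensembl?.transcriptid);
  if (scores.length !== transcriptIds.length) {
    return null; // Lengths differ, so there is no safe way to pair them.
  }
  return scores.map((score, i) => ({ score, transcriptId: transcriptIds[i] }));
}

// Array case, using the CA415087428 values shown above.
const multiple = {
  revel: { score: [0.093, 0.093, 0.093, 0.093, 0.093] },
  ensembl: {
    transcriptid: [
      'ENST00000340888',
      'ENST00000370150',
      'ENST00000370142',
      'ENST00000370145',
      'ENST00000447676',
    ],
  },
};

// Scalar case, with the shape described above for CA415086302.
const single = {
  revel: { score: 0.173 },
  ensembl: { transcriptid: 'ENST00000457723' },
};

console.log(pairRevelScores(multiple));
console.log(pairRevelScores(single));
```

Returning `null` on a length mismatch keeps the caller from displaying a score next to a transcript ID it may not belong to.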

Pros

  • Keeps us within the paradigm of getting data from third-party APIs on
    the fly.
  • Doesn't introduce another third-party API we have to rely on.

Cons

  • The REVEL scores and the transcript IDs aren't explicitly linked.
  • The set of REVEL scores isn't complete for all variants, e.g.
    CA415086302.

Get the scores and transcript IDs from some other API

In this case, we would be getting the data from some other API (UCSC's
API?).

Pros

  • Keeps us within the paradigm of getting data from third-party APIs on
    the fly.
  • Potentially solves the problem of not having the relevant REVEL score
    to display.

Cons

  • The REVEL scores and the transcript IDs might not be explicitly linked.
  • Introduces another third-party API we have to rely on.

Get the scores and transcript IDs from the source of truth (REVEL CSV download)

This solution would entail downloading and parsing the CSV from the
source of truth: https://sites.google.com/site/revelgenomics/downloads.
We would have to persist the data somewhere and provide a way for the
front end to query the data.

Pros

  • Accurate data.
  • Able to map REVEL scores to transcript IDs.

Cons

  • Does not keep us within our usual paradigm of getting data from
    third-party APIs on the fly.
  • Requires us to persist the data.
  • Requires us to provide an API for the front end to consume.
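
To make the parsing step of this option concrete, here is a minimal sketch. It assumes the download is comma-delimited with the same column names shown in the REVEL data earlier in this issue (the rows there are pasted space-separated, and the `*` MANE marker is an annotation added in this issue, not part of the data). The `indexRevelScores` helper and the key format are hypothetical; the key simply mirrors the myvariant.info `_id` style (`chrX:g.153693944C>A`).

```javascript
// Sample rows for CA415086302, rewritten as CSV (assumed delimiter).
const sampleCsv = [
  'chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL,Ensembl_transcriptid',
  'X,152959399,153693944,C,A,H,Q,0.173,ENST00000457723',
  'X,152959399,153693944,C,A,R,S,0.109,ENST00000442457',
  'X,152959399,153693944,C,A,T,K,0.653,ENST00000253122;ENST00000430077;ENST00000328897',
].join('\n');

// Build a Map from an hg38 variant key to a list of { transcriptId, score }.
// Each row can name several transcripts separated by ';', so one row may
// contribute several entries that share the same score.
function indexRevelScores(csvText) {
  const [headerLine, ...lines] = csvText.trim().split('\n');
  const columns = headerLine.split(',');
  const index = new Map();
  for (const line of lines) {
    const fields = line.split(',');
    const row = Object.fromEntries(columns.map((name, i) => [name, fields[i]]));
    const key = `chr${row.chr}:g.${row.grch38_pos}${row.ref}>${row.alt}`;
    const entries = index.get(key) ?? [];
    for (const transcriptId of row.Ensembl_transcriptid.split(';')) {
      entries.push({ transcriptId, score: Number(row.REVEL) });
    }
    index.set(key, entries);
  }
  return index;
}

const revelIndex = indexRevelScores(sampleCsv);
console.log(revelIndex.get('chrX:g.153693944C>A'));
```

With an index like this, the score for a specific transcript can be looked up directly, which is the explicit score-to-transcript link the other options lack. A real implementation would still need the persistence and API layers listed in the cons.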

@liammulh
Member

It might be a good idea to tabulate the pros and cons.

@liammulh
Member

We could open an issue in the myvariant.info repo: https://github.com/biothings/myvariant.info. Maybe we could even clone the repo, run their API locally, and use git bisect to find the commit(s) where the score for CA415086302 went from 0.653 to 0.173. If we can figure out the problem we could contribute a bug fix.

@liammulh
Member

It could be a problem with dbNSFP. According to this page, a new version of their database was released in May:

NEW VERSION (May 6, 2023): dbNSFP v4.4 is released. [...]

@liammulh
Member

liammulh commented Nov 14, 2023

I tabulated the pros and cons of the solutions I listed above in this Google doc. Here's the table:

[Image: pros-and-cons]

@liammulh
Member

I talked to Matt this morning, and we're going ahead with the band-aid solution of including the parenthesized transcript ID that we get from myvariant.info next to the REVEL score. Once this band-aid has been applied, we will determine a better long-term solution.

@liammulh
Member

liammulh commented Nov 14, 2023

To view the problematic REVEL score for variant CA415086302, run npm start, click the "New Variant Curation" button, and search for CA415086302. There should be a result. Click "View Evidence", and then click the "Variant Type" tab. You should see something like:

[Image: problematic-score]

The idea behind the band-aid fix is that instead of seeing just the score "0.173" we would see the score followed by the corresponding transcript ID "0.173 (ENST00000457723)".
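
As a rough illustration of that display format, here is a small sketch. The `formatRevelScore` helper is a hypothetical name, not an existing function in the VCI code base.

```javascript
// Format a REVEL score for display, appending the transcript ID in
// parentheses when one is available. Returns an empty string when there
// is no score to show.
function formatRevelScore(score, transcriptId) {
  if (score === null || score === undefined) return '';
  return transcriptId ? `${score} (${transcriptId})` : String(score);
}

console.log(formatRevelScore('0.173', 'ENST00000457723')); // 0.173 (ENST00000457723)
console.log(formatRevelScore('0.173')); // 0.173
```

Falling back to the bare score when no transcript ID is present keeps the current behavior for variants where the ID is missing.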

@liammulh
Member

liammulh commented Nov 14, 2023

I believe the component that renders the REVEL score is the ClinGenPredictorsTable component.

The ClinGenPredictorsTable component takes four parameters. One of them is clinGenPred, which is an object that looks like:

{
    "revel": {
        "score_range": "0 to 1",
        "score": null,
        "prediction": "higher score = higher pathogenicity",
        "visible": true
    },
    "cftr": {
        "score_range": "0 to 1",
        "score": null,
        "prediction": "higher score = higher pathogenicity",
        "visible": false
    }
}

@liammulh
Member

liammulh commented Nov 15, 2023

The ClinGenPredictorsTable component is used in the VariantType component:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L506-L511

Unfortunately, GitHub isn't rendering the code, so I'll paste it:

            <ClinGenPredictorsTable
              clinGenPred={clinGenPred}
              clinGenPredStatic={clinGenPredStatic}
              isSingleNucleotide={isSingleNucleotide}
              isLoadingMyVariantInfo={isLoadingMyVariantInfo}
            />

@liammulh
Member

liammulh commented Nov 15, 2023

The clinGenPred object is also defined in the VariantType component:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L446

Unfortunately, GitHub isn't rendering the code, so I'll paste it:

const clinGenPred = computationalObj && computationalObj.clingen;

@liammulh
Member

liammulh commented Nov 15, 2023

The computationalObj variable is stateful:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L403-L410

Unfortunately, GitHub isn't rendering the code, so I'll paste it:

  render() {
    const {
      computationalObj,
      esearchData,
      isLoadingEsearch,
      compDataDiffFlag,
      saveAlert,
    } = this.state;

@liammulh
Member

liammulh commented Nov 15, 2023

I think I have a rough idea of what we're doing with the computationalObj variable.

It's initialized as a piece of state in the VariantType component:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L41

class VariantType extends Component {
  constructor(props) {
    super(props);
    this.state = {
      tabIndex: props.tabIndex, // Tab index received from CodeStrip click
      computationalObj: initialComputationalData,
      esearchData: {},
      isLoadingEsearch: false,
      compDataDiffFlag: false,
      hasOtherPredData: false,
      hasConservationData: false,
    }

Then we "build" the computationalObj and compare it to the evaluations prop:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L59

  componentDidUpdate(prevProps) {
    if (this.props.myVariantInfoData !== prevProps.myVariantInfoData) {
      const { myVariantInfoData } = this.props;
      if (myVariantInfoData) {
        const computationalObj = this.buildComputationalObj(myVariantInfoData);
        this.setState({ computationalObj });
      }
    }
    if (this.props.evaluations && this.props.evaluations !== prevProps.evaluations && this.props.evaluations.BP1.computational !== prevProps.evaluations.BP1.computational) {
      this.compareComputationalData(this.state.computationalObj, this.props.evaluations).catch(AmplifyAPIRequestRecycler.defaultCatch);
    }

The compareComputationalData method has a side effect where a flag is set.

Then in the buildComputationalObj method, we call several other methods on the computationalObj:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L130-L135

  buildComputationalObj = (myVariantInfoData) => {
    const objWithOtherPred = this.parseOtherPredData(myVariantInfoData);
    const objWithConservation = this.parseConservationData(myVariantInfoData, objWithOtherPred);
    const completeComputationalObj = this.parseClingenPredData(myVariantInfoData, objWithConservation);
    return completeComputationalObj;
  }

The parseClingenPredData method seems to be adding the REVEL score to the computationalObj variable:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L229-L235

  /**
     * Method to assign clingen predictors data to global computation object
     * REVEL data is now parsed from myvariant.info response
     * It can be accessed via response['dbnsfp']['revel'] or using
     * the 'parseKeyValue()' helper function which traverse the tree down to 2nd level
     * 
     * TBD on where the CFTR data is queried from after Bustamante lab is no longer the source
     * And thus the CFTR data parsing in this method needs to be altered in the future
     * 
     * @param {object} myVariantInfoData - The response object returned by myvariant.info
     */
  parseClingenPredData = (myVariantInfoData, compObj) => {
    const computationalObj = cloneDeep(compObj);
    const revel = parseKeyValue(myVariantInfoData, 'revel'),
      cftr = parseKeyValue(myVariantInfoData, 'cftr');
    if (revel) {
      computationalObj.clingen.revel.score = (revel.score) && this.numToString(revel.score);
    }

I'm not sure where the score_range, prediction, and visible fields are being set.

@liammulh
Member

liammulh commented Nov 15, 2023

I stuck a debugger before line 59 in the VariantType component:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L59

  componentDidUpdate(prevProps) {
    if (this.props.myVariantInfoData !== prevProps.myVariantInfoData) {
      const { myVariantInfoData } = this.props;
      if (myVariantInfoData) {
        const computationalObj = this.buildComputationalObj(myVariantInfoData); // <-- line 59
        this.setState({ computationalObj });
      }
    }

I couldn't find the score range in the myVariantInfoData object. I copied the object to my clipboard, pasted it into a file, and searched for score and 0 to 1, but I didn't see anything that might be a score range. There were also no results for prediction, higher, or visible. Anyway, after stepping over line 59, the computationalObj variable looked like this:

{
    "conservation": {[...]},
    "other_predictors": {[...]},
    "clingen": {
        "revel": {
            "score_range": "0 to 1",
            "score": "0.173",
            "prediction": "higher score = higher pathogenicity",
            "visible": true
        },
        "cftr": {
            "score_range": "0 to 1",
            "score": null,
            "prediction": "higher score = higher pathogenicity",
            "visible": false
        }
    }
}

@liammulh
Member

In the buildComputationalObj method, we have the following:

  buildComputationalObj = (myVariantInfoData) => {
    const objWithOtherPred = this.parseOtherPredData(myVariantInfoData);
    const objWithConservation = this.parseConservationData(myVariantInfoData, objWithOtherPred);
    const completeComputationalObj = this.parseClingenPredData(myVariantInfoData, objWithConservation);
    return completeComputationalObj;
  }
  • The myVariantInfoData object contains the REVEL score.
  • The object that this function returns contains score_range, prediction, and visible fields.

@liammulh
Member

liammulh commented Nov 15, 2023

I stuck a debugger before line 131:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L131

  buildComputationalObj = (myVariantInfoData) => {
    const objWithOtherPred = this.parseOtherPredData(myVariantInfoData); // <-- line 131
    const objWithConservation = this.parseConservationData(myVariantInfoData, objWithOtherPred);
    const completeComputationalObj = this.parseClingenPredData(myVariantInfoData, objWithConservation);
    return completeComputationalObj;
  }

When I stepped over line 131 and logged objWithOtherPred, it looked like:

{
    "conservation": {[...]},
    "other_predictors": {[...]},
    "clingen": {
        "revel": {
            "score_range": "0 to 1",
            "score": null,
            "prediction": "higher score = higher pathogenicity",
            "visible": true
        },
        "cftr": {
            "score_range": "0 to 1",
            "score": null,
            "prediction": "higher score = higher pathogenicity",
            "visible": false
        }
    }
}

@liammulh
Member

liammulh commented Nov 15, 2023

I was barking up the wrong tree. I should have realized that the values for score_range, prediction, and visible fields are pre-determined constants, and I should have searched the code base for their values. If I had done that, I would have found where they are coming from sooner. They are defined in the computational helper module:

https://github.com/ClinGen/gci-vci-aws/blob/d00c19fdba7cb71ba0b6ad0b6b4f74dd3aee48a1/gci-vci-react/src/components/variant-central/helpers/computational.js

We import initialComputationalData from the computational helper module in the VariantType component, and then we use it to initialize the computationalObj variable:

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L41

class VariantType extends Component {
  constructor(props) {
    super(props);
    this.state = {
      tabIndex: props.tabIndex, // Tab index received from CodeStrip click
      computationalObj: initialComputationalData,
      esearchData: {},
      isLoadingEsearch: false,
      compDataDiffFlag: false,
      hasOtherPredData: false,
      hasConservationData: false,
    }
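
Putting the pieces together, the flow appears to be: start from constant defaults, clone them, and fill in the score parsed from the myvariant.info response. The following is a simplified sketch of that flow, not the actual implementation. The defaults shown mirror only the clingen portion of the object logged earlier, `withRevelScore` is a hypothetical stand-in for parseClingenPredData, `JSON.parse(JSON.stringify(...))` stands in for lodash's cloneDeep, and `String` stands in for the numToString method.

```javascript
// Defaults mirroring the `clingen` portion of the object logged earlier.
const initialComputationalData = {
  clingen: {
    revel: {
      score_range: '0 to 1',
      score: null,
      prediction: 'higher score = higher pathogenicity',
      visible: true,
    },
    cftr: {
      score_range: '0 to 1',
      score: null,
      prediction: 'higher score = higher pathogenicity',
      visible: false,
    },
  },
};

// Hypothetical, simplified stand-in for parseClingenPredData: copy the
// defaults, then overwrite only the REVEL score.
function withRevelScore(defaults, myVariantInfoData) {
  // Deep copy so the shared defaults are never mutated.
  const computationalObj = JSON.parse(JSON.stringify(defaults));
  const revel = myVariantInfoData?.dbnsfp?.revel;
  if (revel && revel.score !== undefined && revel.score !== null) {
    computationalObj.clingen.revel.score = String(revel.score);
  }
  return computationalObj;
}

const built = withRevelScore(initialComputationalData, {
  dbnsfp: { revel: { rankscore: 0.4384, score: 0.173 } },
});
console.log(built.clingen.revel);
```

This is why score_range, prediction, and visible never appear in the myvariant.info response: they come from the defaults, and only score is filled in from the response.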

@liammulh
Member

liammulh commented Nov 15, 2023

At this point, I think we have two goals:

  1. We need to get the myVariantInfoData.dbnsfp.ensembl.transcriptid to associate with the myVariantInfoData.dbnsfp.revel.score. This should be straightforward, but see the second goal.
  2. Based on two earlier comments in this issue (#348), I think it's possible for myVariantInfoData.dbnsfp.ensembl.transcriptid to be either a string or an array of strings, and I think it's possible for myVariantInfoData.dbnsfp.revel.score to be either a number or an array of numbers. We need to figure out whether we expect multiple scores or a single score.

@liammulh
Member

Based on my reading of the parseClingenPredData method, I think we are only expecting a single score.

https://github.com/ClinGen/gci-vci-aws/blob/6c106835b6a594f59e1bb66513143d34bde7c310/gci-vci-react/src/components/variant-central/tab-content/VariantType.js#L229-L235

  parseClingenPredData = (myVariantInfoData, compObj) => {
    const computationalObj = cloneDeep(compObj);
    const revel = parseKeyValue(myVariantInfoData, 'revel'),
      cftr = parseKeyValue(myVariantInfoData, 'cftr');
    if (revel) {
      computationalObj.clingen.revel.score = (revel.score) && this.numToString(revel.score);
    }

@liammulh
Member

Per the Slack conversation posted in https://github.com/ClinGen/gci-vci-aws/pull/1379#issuecomment-1816900833, the new band-aid fix is to just show a warning next to the REVEL score instead of mapping Ensembl transcript IDs to REVEL scores.

@cgpreston
Author

Closing this as we've released the warning. I will link this issue to the future tickets.
