Update REVEL scores in VCI #350

Open
cgpreston opened this issue Nov 21, 2023 · 8 comments

@cgpreston

cgpreston commented Nov 21, 2023

There is an issue with the REVEL scores in the VCI (see #348 for details; as of Nov 2023 we display a warning for curators).

Going forward we should attempt the following:

  1. Work on a REVEL data plan for the LDH team

Mockups: https://docs.google.com/presentation/d/1qYZzqlJmdWvsRrYILwnJw4u1zbKFtRgIeZ6oWPox-fY/edit#slide=id.g2dd3f09f5b8_0_0

We also need to update the footer with the provenance info - I've reached out to Neethu about the language.

SP ticket: https://broadinstitute.atlassian.net/browse/CGSP-654

@liammulh
Member

liammulh commented Dec 5, 2023

@cgpreston, there doesn't seem to be a simple way to get the REVEL score data from the Ensembl VEP API. We are going to ask Baylor if they can get the data for us, or if they can give me access so I can write an API endpoint into the LDH.

@liammulh
Member

liammulh commented Dec 5, 2023

We discussed this in the team meeting. I am going to investigate myvariant.info's dbNSFP parser code and update biothings/myvariant.info#179.

@liammulh
Member

liammulh commented Dec 6, 2023

I spent some time this afternoon investigating myvariant.info's dbNSFP parser code.

I forked myvariant.info's GitHub repo, and based on src/hub/dataload/sources/dbnsfp/dbnsfp_43a.py I concluded they are using version 4.3a of dbNSFP. (There is a newer version of dbNSFP. It was released on November 5th.) I read the code that parses REVEL score data. Nothing seemed obviously wrong, so I downloaded dbNSFP to my computer. I wanted to figure out if their parsing code was written correctly.

Here are the decompressed contents of the dbNSFP archive:

.
├── LICENSE.txt
├── dbNSFP4.3_gene.complete.gz
├── dbNSFP4.3_gene.gz
├── dbNSFP4.3a.readme.txt
├── dbNSFP4.3a_variant.chr1.gz
├── dbNSFP4.3a_variant.chr10.gz
├── dbNSFP4.3a_variant.chr11.gz
├── dbNSFP4.3a_variant.chr12.gz
├── dbNSFP4.3a_variant.chr13.gz
├── dbNSFP4.3a_variant.chr14.gz
├── dbNSFP4.3a_variant.chr15.gz
├── dbNSFP4.3a_variant.chr16.gz
├── dbNSFP4.3a_variant.chr17.gz
├── dbNSFP4.3a_variant.chr18.gz
├── dbNSFP4.3a_variant.chr19.gz
├── dbNSFP4.3a_variant.chr2.gz
├── dbNSFP4.3a_variant.chr20.gz
├── dbNSFP4.3a_variant.chr21.gz
├── dbNSFP4.3a_variant.chr22.gz
├── dbNSFP4.3a_variant.chr3.gz
├── dbNSFP4.3a_variant.chr4.gz
├── dbNSFP4.3a_variant.chr5.gz
├── dbNSFP4.3a_variant.chr6.gz
├── dbNSFP4.3a_variant.chr7.gz
├── dbNSFP4.3a_variant.chr8.gz
├── dbNSFP4.3a_variant.chr9.gz
├── dbNSFP4.3a_variant.chrM.gz
├── dbNSFP4.3a_variant.chrX.gz
├── dbNSFP4.3a_variant.chrY.gz
├── search_dbNSFP43a.class
├── search_dbNSFP43a.jar
├── search_dbNSFP43a.readme.pdf
├── try.vcf
├── tryhg18.in
├── tryhg19.in
└── tryhg38.in

1 directory, 36 files

Each of the dbNSFP4.3a_variant.chr#.gz files is a gzip-compressed tab-separated values file. I decompressed dbNSFP4.3a_variant.chrX.gz so I could search it for hg_19pos 152959399. Based on the screenshot @cgpreston provided in biothings/myvariant.info#179, hg_19pos 152959399 should have multiple REVEL scores associated with it:

[Screenshot: scores]

Here's what I found:

> rg --count "152959399" dbNSFP4.3a_variant.chrX
5

So there are five lines in the file that have "152959399" in them. I searched the output of the initial search for the REVEL scores @cgpreston shows in her screenshot:

> rg "152959399" dbNSFP4.3a_variant.chrX | rg --count "0\.173"
1
> rg "152959399" dbNSFP4.3a_variant.chrX | rg --count "0\.109"
> rg "152959399" dbNSFP4.3a_variant.chrX | rg --count "0\.653"
2

Some observations:

  • One of the REVEL scores from the screenshot (0.109) is missing from v4.3a of dbNSFP.
  • The 0.653 REVEL score doesn't show up in myvariant.info, but it is in v4.3a of dbNSFP. I won't post the whole output here because it is very wide, but I will attach a screenshot below.

[Screenshot: search]
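
The ripgrep counts above match a pattern anywhere on the line, so a hit such as "0\.653" could in principle come from a column other than REVEL. A column-aware version of the same check might look like the sketch below. It is a rough illustration, not the code used in this investigation: the function name `count_score_hits` is hypothetical, the column names `hg19_pos(1-based)` and `REVEL_score` are assumptions about the dbNSFP header, and the demo file contains only the five REVEL values quoted in this thread rather than real dbNSFP rows.

```python
import csv
import gzip
import os
import tempfile


def count_score_hits(path, pos_column, position, score_column, score):
    """Count rows at `position` whose `score_column` contains `score`.

    Reads the gzip-compressed TSV directly, so no manual decompression is needed.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Assumption: the header line starts with '#', which is stripped here.
        header = [name.lstrip("#") for name in next(reader)]
        pos_idx = header.index(pos_column)
        score_idx = header.index(score_column)
        return sum(
            1
            for row in reader
            # Assumption: multiple per-transcript values are separated by ';'.
            if row[pos_idx] == position and score in row[score_idx].split(";")
        )


# Demo data: a cut-down stand-in for a dbNSFP chromosome file. Only the REVEL
# values come from this thread; the header layout is an assumption.
SAMPLE = (
    "#chr\thg19_pos(1-based)\tREVEL_score\n"
    "X\t152959399\t0.173\n"
    "X\t152959399\t0.653;0.653\n"
    "X\t152959399\t0.177\n"
    "X\t152959399\t0.653;0.653\n"
    "X\t152959399\t0.606;0.606\n"
)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_variant.chrX.gz")
    with gzip.open(demo_path, "wt", newline="") as out:
        out.write(SAMPLE)
    args = (demo_path, "hg19_pos(1-based)", "152959399", "REVEL_score")
    hits_173 = count_score_hits(*args, "0.173")
    hits_109 = count_score_hits(*args, "0.109")
    hits_653 = count_score_hits(*args, "0.653")

print(hits_173, hits_109, hits_653)
```

Comparing fields exactly, rather than matching substrings, avoids counting a longer value such as 0.6531 in an unrelated column as a REVEL hit.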

@liammulh
Member

liammulh commented Dec 6, 2023

I wrote a script that prints columns instead of rows. Here are the REVEL_scores and REVEL_rankscores for hg_19pos 152959399:

('REVEL_score', '0.173', '0.653;0.653', '0.177', '0.653;0.653', '0.606;0.606')
('REVEL_rankscore', '0.43840', '0.86936', '0.44549', '0.86936', '0.84506')
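
The script itself is not posted in this thread. As a rough sketch of what a columns-instead-of-rows printer could look like, assuming a tab-separated file with a '#'-prefixed header line: the function name `transpose_matches` and the column name `hg19_pos(1-based)` are assumptions, and the sample uses two value pairs quoted above plus one invented row at a different position to show the filtering.

```python
import csv


def transpose_matches(lines, pos_column, position, wanted):
    """Return one tuple per wanted column: (column_name, value_row1, value_row2, ...)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Assumption: the header line starts with '#', which is stripped here.
    header = [name.lstrip("#") for name in next(reader)]
    pos_idx = header.index(pos_column)
    # Keep only the rows whose position field equals the requested position.
    matches = [row for row in reader if row[pos_idx] == position]
    return [
        (name, *(row[header.index(name)] for row in matches))
        for name in wanted
    ]


# Illustrative input only. The first two data rows reuse values quoted in this
# thread; the last row (position 152959400) is invented to show filtering.
SAMPLE = (
    "#chr\thg19_pos(1-based)\tREVEL_score\tREVEL_rankscore\n"
    "X\t152959399\t0.173\t0.43840\n"
    "X\t152959399\t0.653;0.653\t0.86936\n"
    "X\t152959400\t0.500\t0.70000\n"
)

result = transpose_matches(
    SAMPLE.splitlines(),
    "hg19_pos(1-based)",
    "152959399",
    ["REVEL_score", "REVEL_rankscore"],
)
for column in result:
    print(column)
```

Printing one tuple per column keeps the very wide dbNSFP rows readable, since only the columns of interest are shown side by side for every matching row.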

@liammulh
Member

liammulh commented Dec 6, 2023

I initially planned on using https://github.com/ClinGen/gci-vci-aws/issues/1378 to track the work for this issue, but I think it makes more sense to track the work here.

@liammulh
Member

liammulh commented Dec 6, 2023

Okay, I've updated biothings/myvariant.info#179 (comment). Hopefully it is useful to them. I am going to move on to other tickets for now.

@cgpreston
Author

cgpreston commented Dec 14, 2023

@liammulh: when you're back next week, can we chat about a new approach to this project? Thanks.

@wrightmw
Member

wrightmw commented Apr 2, 2024

Now we're bringing in via LDH... switch this project to Bryan so he can look into these data in the LDH
