Supporting population allele frequencies #29

likhitha-surapaneni · 2024-01-17T13:52:25Z

Issue: #19
Ticket: https://www.ebi.ac.uk/panda/jira/browse/ENSVAR-6165
Also includes a bug fix for most_severe_consequence.

New endpoint for Population
Testing variants (could not find a substitution example with non-null allele frequencies)
- 1:230710048:rs699 (bi-allelic SNP)
- 13:32379902:rs202155613 (multi-allelic SNP)
- 13:57932480:rs11276267 (insertion)
- 1:10123:rs1639546401 (deletion)
- 1:10153:rs1639547929 (indel)
Update information in population_metadata.json (add gnomADg, gnomADe)

jamie-m-a

Thanks for adding in gnomAD genomes, but I think we'll need to add the version to it (perhaps in the description of ALL), as I suspect this is 3.1.2 and we will soon update to 4.0 / 4.1.

common/schemas/query.graphql

azangru · 2024-01-25T14:30:21Z

common/schemas/population.graphql

+"""
+  Population
+"""
+    name: String ## Requires to be nullable as super-population can be a null field


Does the comment refer to the correct field? It correctly points out that the super_population field is nullable; but this is the name field, which I would have thought can never be null.

Because of the way GraphQL handles nullability, a nullable type (Population, in the case of super_population) with a non-null name field results in an error: super_population can be null, but super_population->name cannot be null.
See article.
The following is the error we get if we make name non-nullable. This needs to be handled better.

{
  populations(genome_id: "a7335667-93e7-11ec-a39d-005056b38ce3") {
    name
    description
    is_global
    super_population {
      name # line causing error
    }
    sub_populations {
      name
    }
  }
}

Error: "message": "Cannot return null for non-nullable field Population.name.",

Type population in case of super_population) having a non-null field name results in an error as super_population can be null but super_population->name cannot be null
...
Error: "message": "Cannot return null for non-nullable field Population.name.",

The error looks to me like the resolver is trying to return some empty super-population object rather than a true null. GraphQL type validator seems to be saying here that it received an object, rather than a null; and when it looked into its name field, there was nothing there. Is there any way you could inspect what it is you are actually returning from the super_population resolver when there is no super-population?

Consider this api: https://swapi-graphql.eskerda.vercel.app/

Person is a nullable field, but it has a non-nullable field, id. If you request a person with an existing id, you get the data.

If you request a person with a non-existing id, there is an error about the missing person, but the person in the data is simply set to null, and there are no errors about the non-nullable field Person.id.

Thanks @azangru , you are right. This seems to be an issue with the metadata file, which should be fixed in the latest commits.
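To make the distinction in this thread concrete, here is a toy model in plain Python of how a nullable object field with a non-null sub-field is checked. It is only a sketch: it does not use any GraphQL library, and `complete_population` and `NonNullError` are hypothetical names invented for the illustration. The point it shows is that a true null is accepted for a nullable field, whereas an object whose name is missing triggers the non-null error.

```python
from typing import Any, Dict, Optional


class NonNullError(Exception):
    """Raised when a non-nullable field resolves to null (hypothetical helper)."""


def complete_population(value: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Toy check for a nullable Population value whose name field is non-null."""
    # A true null is accepted for a nullable field; its sub-fields are never inspected.
    if value is None:
        return None
    # An object was returned, so its non-null fields are checked.
    if value.get("name") is None:
        raise NonNullError("Cannot return null for non-nullable field Population.name.")
    return {"name": value["name"]}


# A true null passes through untouched.
print(complete_population(None))
# A populated object passes as well.
print(complete_population({"name": "gnomADg:afr"}))
```

Passing `{"name": None}` or `{}` to the same function raises the error quoted above, which is the situation an empty super-population object in the metadata file would produce.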

examples/population/response.json

azangru · 2024-01-29T12:04:00Z

common/schemas/population.graphql

+    is_from_genotypes: Boolean
+    display_group_name: String
+    super_population: Population
+    sub_populations: [Population]


Looks like all fields in the schema should be non-nullable (unless display_group_name is allowed to be null? I do not know what this is).

Updated the file according to the VDM: all fields except super_population are non-nullable. display_group_name seems to be a string for displaying the population group on the website. Currently it seems to be redundant with name. I can check with the team.

common/file_model/population_metadata.json

nakib103 · 2024-01-29T19:01:04Z

graphql_service/resolver/variant_model.py

+@POPULATION_TYPE.field("super_population")
+## This may still break when queried for other fields in super_population
+def resolve_super_population(population: Dict, info: GraphQLResolveInfo):
+    return population["super_population"] if population["super_population"] and population["super_population"]["name"] else None
+


The error was coming from population_metadata.json. As long as sub_population is set to null in the JSON, it would be fine to return the loaded JSON as is.

So this is redundant, but it could still be kept to guard against any typo/mishap in the metadata file.

Good spot @nakib103 , thank you. This should solve the issue.
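For reference, a guard of the kind discussed here could look like the sketch below. It is written as a plain function rather than a registered resolver so that it runs on its own; `normalise_super_population` is a hypothetical name, and the shape of the population dictionary is assumed from the diff above rather than taken from the actual metadata file.

```python
from typing import Any, Dict, Optional


def normalise_super_population(population: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the super-population, or None when it is absent or has no name."""
    # .get() avoids a KeyError if the key is missing from the metadata entry.
    super_population = population.get("super_population")
    # A JSON null, an empty object and a nameless object are all treated as "no super-population".
    if not isinstance(super_population, dict) or not super_population.get("name"):
        return None
    return super_population


# Hypothetical metadata entries, for illustration only.
print(normalise_super_population({"name": "pop_a", "super_population": None}))
print(normalise_super_population({"name": "pop_b", "super_population": {}}))
print(normalise_super_population({"name": "pop_c", "super_population": {"name": "pop_a"}}))
```

With correct metadata the guard is a no-op, so the trade-off is only whether a silent fallback to null is preferable to a loud error when the file contains a mistake.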

nakib103 · 2024-01-29T19:07:59Z

common/file_model/variant.py

+                maf_frequency, maf_allele, maf_population = by_population_sorted[-2]
+                pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True
+                hpmaf.append([maf_frequency,maf_allele,maf_population])
+        if len(hpmaf) > 0:


There can be multiple hpmaf alleles: MAFs with the same highest frequency in separate populations.

This has little effect in EV currently, as it is not part of the view.

Thanks @nakib103 , this seems valid. Similarly, there can be multiple alleles with the same MAF, and we need to mark them all. Example to test multiple alleles having the same MAF: 13:57932480:rs11276267

Thanks Likhitha, also tested for 17:63992940:rs1183731126 hpmaf.
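The tie handling discussed in this thread can be sketched as follows. This is not the code that was merged; it only assumes, based on the diff above, that each entry is a (frequency, allele, population) tuple. The function names and the sample values are hypothetical, and frequencies are compared with exact equality on the assumption that tied values are parsed from identical input.

```python
from typing import List, Tuple

Entry = Tuple[float, str, str]  # (frequency, allele, population)


def minor_alleles(by_population: List[Entry]) -> List[Entry]:
    """All entries tied at the second-highest frequency within one population."""
    if len(by_population) < 2:
        return []
    ordered = sorted(by_population, key=lambda item: item[0])
    maf_frequency = ordered[-2][0]
    # Keep every allele that shares the MAF value, not just the one at index -2.
    # Note: if the two most frequent alleles are tied, both are returned.
    return [entry for entry in ordered if entry[0] == maf_frequency]


def highest_population_mafs(mafs: List[Entry]) -> List[Entry]:
    """All MAF entries that share the highest frequency across populations."""
    if not mafs:
        return []
    top_frequency = max(entry[0] for entry in mafs)
    return [entry for entry in mafs if entry[0] == top_frequency]


# Made-up frequencies for one population with two alleles tied for MAF.
one_population = [(0.8, "A", "pop1"), (0.1, "C", "pop1"), (0.1, "G", "pop1")]
print(minor_alleles(one_population))

# Made-up per-population MAFs with two populations tied for hpmaf.
per_population_mafs = [(0.1, "C", "pop1"), (0.1, "T", "pop2"), (0.05, "G", "pop3")]
print(highest_population_mafs(per_population_mafs))
```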

nakib103 · 2024-01-29T19:16:49Z

common/file_model/variant.py

+            by_population_sorted = sorted(by_population, key=lambda item: item[0])
+            if len(by_population_sorted) >= 2:
+                maf_frequency, maf_allele, maf_population = by_population_sorted[-2]
+                pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True


We now have multiple populations, each with a corresponding MAF. We should have some way to tell which MAF we want to represent in the GB drawer. Before, we only had one population, so this was not an issue.

It does not affect the current EV design.

This may need further discussion on how we would like the API to communicate this, for example, through a field in variant or through the sorting order in population_allele_frequencies.

azangru · 2024-01-30T12:54:01Z

Thanks for adding in gnomAD genomes

@jamie-m-a @likhitha-surapaneni is there a mechanism in the population frequencies data that will allow the client to separate it by studies — e.g. 1000 genomes vs gnomAD version x, etc.?

likhitha-surapaneni · 2024-01-30T13:15:58Z

Hi @azangru, in the current data population_name is prefixed with the population study. E.g. gnomADg:afr belongs to gnomADg, but the prefix is less descriptive than a full label such as gnomAD genomes v3.1.2.

azangru · 2024-01-30T13:24:14Z

I should have mentioned that we would like something better than parsing the population name string :-) Partly because this is generally not a good idea, and partly because the design suggests that there will be some beautiful labels:

nakib103

LGTM! Tested and functional; the other improvements that I mentioned can be done later on.
Thanks Likhitha!

nakib103 · 2024-01-30T13:35:59Z

common/file_model/variant.py

+                maf_frequency, maf_allele, maf_population = by_population_sorted[-2]
+                pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True
+                hpmaf.append([maf_frequency,maf_allele,maf_population])
+        if len(hpmaf) > 0:


Thanks Likhitha, also tested for 17:63992940:rs1183731126 hpmaf.

likhitha-surapaneni · 2024-01-30T13:39:17Z

This may require querying the population object which contains the field display_group_name. This field contains information about the population group required by the website (information in the labels). population_allele_frequency->population_name can be matched with population->name to fetch population->display_group_name if that is feasible. population_metadata.json should be updated accordingly from the API end. Does that work for the client @azangru ?

azangru · 2024-01-30T14:10:46Z

population->display_group_name sounds good, if there is no more information you would like to associate with 1000 genomes, gnomAD, etc. (i.e. you do not expect the client to ever need to provide a link to them). If there is such extra information, then something ExternalDB-shaped might be more appropriate.

likhitha-surapaneni · 2024-01-30T16:04:30Z

Thank you @azangru , I discussed this with @jamie-m-a and population->display_group_name will be a string for now. If the design changes in the future, we will make the changes suggested above to the VDM.
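On the client side, the lookup agreed on in this thread (matching population_allele_frequency->population_name against population->name to obtain display_group_name) might be sketched as below. The field names come from the discussion; the function name `attach_display_group`, the sample records and the fallback to None for unmatched names are assumptions made for the illustration.

```python
from typing import Any, Dict, List


def attach_display_group(
    frequencies: List[Dict[str, Any]],
    populations: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Add display_group_name to each frequency record by matching population names."""
    # Index populations by name once, so each lookup is a dictionary access.
    by_name = {population["name"]: population for population in populations}
    labelled = []
    for frequency in frequencies:
        population = by_name.get(frequency["population_name"])
        # Unmatched names get None rather than a label parsed from the name string.
        group = population["display_group_name"] if population else None
        labelled.append({**frequency, "display_group_name": group})
    return labelled


# Made-up records shaped like the two API responses.
populations = [{"name": "gnomADg:afr", "display_group_name": "gnomAD genomes v3.1.2"}]
frequencies = [
    {"population_name": "gnomADg:afr", "allele_frequency": 0.25},
    {"population_name": "unknown:pop", "allele_frequency": 0.5},
]
for record in attach_display_group(frequencies, populations):
    print(record["population_name"], record["display_group_name"])
```

This keeps the grouping label in the population metadata, so the client never has to split population_name on the colon.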

nakib · 2024-01-30T16:06:07Z

Hello all, why am I getting these emails?

likhitha-surapaneni · 2024-01-30T16:10:31Z

Sincere apologies for the notification; it was caused by a mistyped username in a tag.

jamie-m-a

Thanks for all the work on this. I think we're ok to merge now, any subsequent issues that arise can be addressed by new PRs.

likhitha-surapaneni added 2 commits December 15, 2023 14:08

Parsing populations from VCF

fd86f09

Logic to compute ref allele frequency, maf, hpmaf

ae26a08

likhitha-surapaneni marked this pull request as draft January 17, 2024 13:55

likhitha-surapaneni marked this pull request as ready for review January 18, 2024 11:07

Added endpoint to return population object

6a384ac

likhitha-surapaneni marked this pull request as draft January 23, 2024 11:08

likhitha-surapaneni marked this pull request as ready for review January 23, 2024 14:57

likhitha-surapaneni added 4 commits January 23, 2024 14:58

Adding Population schema

eae1b9a

Fixed a bug in population frequencies

0247d13

Making allele_frequency nullable

6865f77

Reverting nullable allele_frequency for now

22308a1

jamie-m-a approved these changes Jan 24, 2024

likhitha-surapaneni added 2 commits January 24, 2024 14:17

Reverting nullable allele_frequency for now

91699ce

Added population metadata for gnomadg and gnomade

36e7723

jamie-m-a requested changes Jan 24, 2024

likhitha-surapaneni added 11 commits January 24, 2024 15:30

Added population metadata for gnomadg and gnomade

940ccf8

Added examples for population and population_allele_frequencies

f6a2d27

Renamed file

31ca0f1

Minor typo

4eb9e14

Added examples for gnomADg, gnomADe

e6a5f3e

Cleanup of files

bc8ba60

Added version metadata for gnomADg and gnomADe

1870e41

Allele_frequency as nullable

3a78f32

Not returning population_allele_frequencies with allele_frequency null

f661d72

Removed updates to example payload

5575ce0

Removed updates to example payload

4dfa7d3

likhitha-surapaneni requested a review from nakib103 January 25, 2024 12:03

likhitha-surapaneni assigned nakib103 Jan 25, 2024

azangru reviewed Jan 25, 2024

common/schemas/query.graphql (outdated, resolved)

azangru reviewed Jan 25, 2024

likhitha-surapaneni added 2 commits January 25, 2024 14:40

Modified endpoint from population to populations; Updated examples

bb6ae82

Merge branch 'main' into population_scores

d5a100c

azangru reviewed Jan 28, 2024

examples/population/response.json (resolved)

likhitha-surapaneni added 4 commits January 29, 2024 10:35

Fixing is_global values

8a7557d

Merge branch 'population_scores' of https://github.com/likhitha-surapaneni/ensembl-hypsipyle into population_scores

42b7667

Fixing the nullability logic in super_population

193ffe1

Fixing the nullability logic in super_population

a032adf

azangru reviewed Jan 29, 2024

nakib103 requested changes Jan 29, 2024

likhitha-surapaneni added 4 commits January 29, 2024 20:34

Fixed typo in population metadata file and removed the redundant workaround

518b5c8

Fixed typo in population metadata file and removed the redundant workaround

f38e487

Updated population.graphql to conform with VDM

b74af67

Handling multiple maf and hpmaf alleles

f3bfa14

nakib103 approved these changes Jan 30, 2024

Added display_group_name

4052b2c

jamie-m-a self-requested a review January 31, 2024 10:19

jamie-m-a approved these changes Jan 31, 2024

nakib103 merged commit 517296f into Ensembl:main Jan 31, 2024
