Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting population allele frequencies #29

Merged
merged 31 commits into from
Jan 31, 2024

Conversation

likhitha-surapaneni
Copy link
Contributor

@likhitha-surapaneni likhitha-surapaneni commented Jan 17, 2024

Issue: #19
Ticket: https://www.ebi.ac.uk/panda/jira/browse/ENSVAR-6165
Also includes bug fix for most_severe_consequence

  • New endpoint for Population
  • Testing variants (could not find a substitution example with non-null allele frequencies)
    • 1:230710048:rs699 (bi-allelic snp)
    • 13:32379902:rs202155613 (multi-allelic snp)
    • 13:57932480:rs11276267 (insertion)
    • 1:10123:rs1639546401 (deletion)
    • 1:10153:rs1639547929 (indel)
  • Update information in population_metadata.json (add gnomADg, gnomADe)

@likhitha-surapaneni likhitha-surapaneni marked this pull request as draft January 17, 2024 13:55
@likhitha-surapaneni likhitha-surapaneni marked this pull request as ready for review January 18, 2024 11:07
@likhitha-surapaneni likhitha-surapaneni marked this pull request as draft January 23, 2024 11:08
@likhitha-surapaneni likhitha-surapaneni marked this pull request as ready for review January 23, 2024 14:57
Copy link
Contributor

@jamie-m-a jamie-m-a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding in gnomAD genomes, but I think we'll need to add version to it (perhaps in description of ALL) as I suspect this is 3.1.2 and soon we will update to 4.0 / 4.1

"""
Population
"""
name: String ## Requires to be nullable as super-population can be a null field
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the comment refer to the correct field? It correctly points out that the super_population field is nullable; but this is the name field, which I would have thought can never be null.

Copy link
Contributor Author

@likhitha-surapaneni likhitha-surapaneni Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the way GraphQL handles nullability, a nullable Type (Type population in case of super_population) having a non-null field name results in an error as super_population can be null but super_population->name cannot be null
See article.
The following is the error if we make name non-nullable. This needs to be handled better.

populations(genome_id: "a7335667-93e7-11ec-a39d-005056b38ce3")
  {
    
    name
    description
    is_global
    super_population {
      name # line causing error
    }
    sub_populations
    {
      name
    }
    
  }
}

Error: "message": "Cannot return null for non-nullable field Population.name.",

Copy link
Collaborator

@azangru azangru Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type population in case of super_population) having a non-null field name results in an error as super_population can be null but super_population->name cannot be null
...
Error: "message": "Cannot return null for non-nullable field Population.name.",

The error looks to me like the resolver is trying to return some empty super-population object rather than a true null. GraphQL type validator seems to be saying here that it received an object, rather than a null; and when it looked into its name field, there was nothing there. Is there any way you could inspect what it is you are actually returning from the super_population resolver when there is no super-population?

Consider this api: https://swapi-graphql.eskerda.vercel.app/

Person is a nullable field. But it has a non-nullable field that is ID. If you request a person with an existing id, you get the data

image

if you request a person with a non-existing id, there is an error about the missing person; but the person in the data is just set to null, and there are no errors about the non-nullable field Person.id

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @azangru , you are right. This seems be to an issue with the metadata file which should be fixed in the latest commits

is_from_genotypes: Boolean
display_group_name: String
super_population: Population
sub_populations: [Population]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like all fields in the schema should be non-nullable (unless display_group_name is allowed to be null? I do not know what this is).

Copy link
Contributor Author

@likhitha-surapaneni likhitha-surapaneni Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the file according to VDM, all the fields except super_population are non-nullable. display_group_name seems to be a string for displaying the population group on the website. Currently it seems to be redundant to name. I can check with the team once

common/file_model/population_metadata.json Outdated Show resolved Hide resolved
common/file_model/population_metadata.json Outdated Show resolved Hide resolved
Comment on lines 174 to 178
@POPULATION_TYPE.field("super_population")
## This may still break when queried for other fields in super_population
def resolve_super_population(population: Dict, info: GraphQLResolveInfo):
return population["super_population"] if population["super_population"] and population["super_population"]["name"] else None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error was coming from population_metadata.json. As long as sub_population is set to null in JSON it would be fine return the loaded JSON as is.

So this is redundant. But still be kept to avoid any typo/mishap in meta data file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot @nakib103 , thank you. This should solve the issue.

maf_frequency, maf_allele, maf_population = by_population_sorted[-2]
pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True
hpmaf.append([maf_frequency,maf_allele,maf_population])
if len(hpmaf) > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be multiple hpmaf allele. MAF with same highest frequency level in separate populations.

Not much effect in EV currently as not part of the view.

Copy link
Contributor Author

@likhitha-surapaneni likhitha-surapaneni Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nakib103 , this seems valid. Similarly, there can be multiple alleles having MAF, we need to mark them. Example to test multiple alleles having same MAF: 13:57932480:rs11276267

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Likhitha, also tested for 17:63992940:rs1183731126 hpmaf.

by_population_sorted = sorted(by_population, key=lambda item: item[0])
if len(by_population_sorted) >= 2:
maf_frequency, maf_allele, maf_population = by_population_sorted[-2]
pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have multiple population and corresponding MAF. We should somehow have a way to tell which MAF we want to represent in the GB drawer. Before, we only have one population and it was not a issue.

It does not effect the current EV design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need further discussion for how we would like the API to communicate this, for example, through a field in variantor through the sorting order in population_allele_frequencies

@azangru
Copy link
Collaborator

azangru commented Jan 30, 2024

Thanks for adding in gnomAD genomes

@jamie-m-a @likhitha-surapaneni is there a mechanism in the population frequencies data that will allow the client to separate it by studies — e.g. 1000 genomes vs gnomAD version x, etc.?

@likhitha-surapaneni
Copy link
Contributor Author

likhitha-surapaneni commented Jan 30, 2024

Thanks for adding in gnomAD genomes

@jamie-m-a @likhitha-surapaneni is there a mechanism in the population frequencies data that will allow the client to separate it by studies — e.g. 1000 genomes vs gnomAD version x, etc.?

Hi @azangru, population_name has a prefix of the population study in the current data. Eg: gnomADg:afr belongs to gnomADg but it might not be descriptive: gnomAD genomes v3.1.2.

@azangru
Copy link
Collaborator

azangru commented Jan 30, 2024

population_name has a prefix of the population study in the current data. Eg: gnomADg:afr belongs to gnomADg but it might not be descriptive: gnomAD genomes v3.1.2.

I should have mentioned that we would like something better than parsing the population name string :-) Partly because this is generally not a good idea, and partly because the design suggests that there will be some beautiful labels:

image

Copy link
Contributor

@nakib103 nakib103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! tested and functional, other improvements that I mentioned can be done later on.
thanks Likhitha!

maf_frequency, maf_allele, maf_population = by_population_sorted[-2]
pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True
hpmaf.append([maf_frequency,maf_allele,maf_population])
if len(hpmaf) > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Likhitha, also tested for 17:63992940:rs1183731126 hpmaf.

@likhitha-surapaneni
Copy link
Contributor Author

likhitha-surapaneni commented Jan 30, 2024

population_name has a prefix of the population study in the current data. Eg: gnomADg:afr belongs to gnomADg but it might not be descriptive: gnomAD genomes v3.1.2.

I should have mentioned that we would like something better than parsing the population name string :-) Partly because this is generally not a good idea, and partly because the design suggests that there will be some beautiful labels:

image

This may require querying the population object which contains the field display_group_name. This field contains information about the population group required by the website (information in the labels). population_allele_frequency->population_name can be matched with population->name to fetch population->display_group_name if that is feasible. population_metadata.json should be updated accordingly from the API end. Does that work for the client @azangru ?

@azangru
Copy link
Collaborator

azangru commented Jan 30, 2024

This may require querying the population object which contains the field display_group_name. This field contains information about the population group required by the website (information in the labels). population_allele_frequency->population_name can be matched with population->name to fetch population->display_group_name if that is feasible. population_metadata.json should be updated accordingly from the API end. Does that work for the client @azangru ?

population->display_group_name sounds good, if there is no more information you would like to associate with 1000 genomes, gnomAD, etc. (i.e. you do not expect the client to ever need to provide a link to them). If there is such extra information, then something ExternalDB-shaped might be more appropriate.

@likhitha-surapaneni
Copy link
Contributor Author

This may require querying the population object which contains the field display_group_name. This field contains information about the population group required by the website (information in the labels). population_allele_frequency->population_name can be matched with population->name to fetch population->display_group_name if that is feasible. population_metadata.json should be updated accordingly from the API end. Does that work for the client @azangru ?

population->display_group_name sounds good, if there is no more information you would like to associate with 1000 genomes, gnomAD, etc. (i.e. you do not expect the client to ever need to provide a link to them). If there is such extra information, then something ExternalDB-shaped might be more appropriate.

Thank you @azangru , I discussed with @jamie-m-a and population->display_group_name would be string for now. If the design changes in the future, we will make the above suggested changes to VDM.

@nakib
Copy link

nakib commented Jan 30, 2024 via email

@likhitha-surapaneni
Copy link
Contributor Author

likhitha-surapaneni commented Jan 30, 2024

Hello all, why am I getting these emails?

Sincere apologies for the notification, it was due to mistyped username tagging.

@jamie-m-a jamie-m-a self-requested a review January 31, 2024 10:19
Copy link
Contributor

@jamie-m-a jamie-m-a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work on this. I think we're ok to merge now, any subsequent issues that arise can be addressed by new PRs.

@nakib103 nakib103 merged commit 517296f into Ensembl:main Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants