-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supporting population allele frequencies #29
Supporting population allele frequencies #29
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding in gnomAD genomes, but I think we'll need to add version to it (perhaps in description of ALL) as I suspect this is 3.1.2 and soon we will update to 4.0 / 4.1
common/schemas/population.graphql
Outdated
""" | ||
Population | ||
""" | ||
name: String ## Requires to be nullable as super-population can be a null field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the comment refer to the correct field? It correctly points out that the super_population field is nullable; but this is the name
field, which I would have thought can never be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the way GraphQL handles nullability, a nullable Type (Type population in case of super_population
) having a non-null field name
results in an error as super_population can be null but super_population->name
cannot be null
See article.
The following is the error if we make name
non-nullable. This needs to be handled better.
populations(genome_id: "a7335667-93e7-11ec-a39d-005056b38ce3")
{
name
description
is_global
super_population {
name # line causing error
}
sub_populations
{
name
}
}
}
Error: "message": "Cannot return null for non-nullable field Population.name.",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type population in case of super_population) having a non-null field name results in an error as super_population can be null but super_population->name cannot be null
...
Error: "message": "Cannot return null for non-nullable field Population.name.",
The error looks to me like the resolver is trying to return some empty super-population object rather than a true null. GraphQL type validator seems to be saying here that it received an object, rather than a null; and when it looked into its name
field, there was nothing there. Is there any way you could inspect what it is you are actually returning from the super_population
resolver when there is no super-population?
Consider this api: https://swapi-graphql.eskerda.vercel.app/
Person
is a nullable field. But it has a non-nullable field that is ID. If you request a person with an existing id, you get the data
if you request a person with a non-existing id, there is an error about the missing person; but the person in the data is just set to null, and there are no errors about the non-nullable field Person.id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @azangru , you are right. This seems be to an issue with the metadata file which should be fixed in the latest commits
common/schemas/population.graphql
Outdated
is_from_genotypes: Boolean | ||
display_group_name: String | ||
super_population: Population | ||
sub_populations: [Population] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all fields in the schema should be non-nullable (unless display_group_name
is allowed to be null? I do not know what this is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the file according to VDM, all the fields except super_population
are non-nullable. display_group_name
seems to be a string for displaying the population group on the website. Currently it seems to be redundant to name
. I can check with the team once
@POPULATION_TYPE.field("super_population") | ||
## This may still break when queried for other fields in super_population | ||
def resolve_super_population(population: Dict, info: GraphQLResolveInfo): | ||
return population["super_population"] if population["super_population"] and population["super_population"]["name"] else None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error was coming from population_metadata.json. As long as sub_population
is set to null
in JSON it would be fine return the loaded JSON as is.
So this is redundant. But still be kept to avoid any typo/mishap in meta data file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot @nakib103 , thank you. This should solve the issue.
maf_frequency, maf_allele, maf_population = by_population_sorted[-2] | ||
pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True | ||
hpmaf.append([maf_frequency,maf_allele,maf_population]) | ||
if len(hpmaf) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be multiple hpmaf
allele. MAF with same highest frequency level in separate populations.
Not much effect in EV currently as not part of the view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nakib103 , this seems valid. Similarly, there can be multiple alleles having MAF, we need to mark them. Example to test multiple alleles having same MAF: 13:57932480:rs11276267
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Likhitha, also tested for 17:63992940:rs1183731126
hpmaf.
common/file_model/variant.py
Outdated
by_population_sorted = sorted(by_population, key=lambda item: item[0]) | ||
if len(by_population_sorted) >= 2: | ||
maf_frequency, maf_allele, maf_population = by_population_sorted[-2] | ||
pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have multiple population and corresponding MAF. We should somehow have a way to tell which MAF we want to represent in the GB drawer. Before, we only have one population and it was not a issue.
It does not effect the current EV design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may need further discussion for how we would like the API to communicate this, for example, through a field in variant
or through the sorting order in population_allele_frequencies
@jamie-m-a @likhitha-surapaneni is there a mechanism in the population frequencies data that will allow the client to separate it by studies — e.g. 1000 genomes vs gnomAD version x, etc.? |
Hi @azangru, |
I should have mentioned that we would like something better than parsing the population name string :-) Partly because this is generally not a good idea, and partly because the design suggests that there will be some beautiful labels: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! tested and functional, other improvements that I mentioned can be done later on.
thanks Likhitha!
maf_frequency, maf_allele, maf_population = by_population_sorted[-2] | ||
pop_frequency_map[maf_allele][maf_population]["is_minor_allele"] = True | ||
hpmaf.append([maf_frequency,maf_allele,maf_population]) | ||
if len(hpmaf) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Likhitha, also tested for 17:63992940:rs1183731126
hpmaf.
This may require querying the population object which contains the field |
|
Thank you @azangru , I discussed with @jamie-m-a and |
Hello all, why am I getting these emails?
…On Tue, Jan 30, 2024, 17:04 Likhitha Surapaneni ***@***.***> wrote:
This may require querying the population object which contains the field
display_group_name. This field contains information about the population
group required by the website (information in the labels).
population_allele_frequency->population_name can be matched with
population->name to fetch population->display_group_name if that is
feasible. population_metadata.json should be updated accordingly from the
API end. Does that work for the client @azangru
<https://github.com/azangru> ?
population->display_group_name sounds good, if there is no more
information you would like to associate with 1000 genomes, gnomAD, etc.
(i.e. you do not expect the client to ever need to provide a link to them).
If there is such extra information, then something ExternalDB-shaped
<https://github.com/Ensembl/ensembl-cdm-docs/blob/main/src/docs/external_db.md>
might be more appropriate.
Thank you @azangru <https://github.com/azangru> , I discussed with
@jamie-m-a <https://github.com/jamie-m-a> and
population->display_group_name would be string for now. If the design
changes in the future, we will make the above suggested changes to VDM.
—
Reply to this email directly, view it on GitHub
<#29 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAN7Y2LRVGDYVQZSE66IF2DYREKZTAVCNFSM6AAAAABB6SRX2KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJXGM3DCNBRHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Sincere apologies for the notification, it was due to mistyped username tagging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this. I think we're ok to merge now, any subsequent issues that arise can be addressed by new PRs.
Issue: #19
Ticket: https://www.ebi.ac.uk/panda/jira/browse/ENSVAR-6165
Also includes bug fix for most_severe_consequence