Best Practice: DefinedTerm from Ontologies #652
An activity at the BH_2023 was to analyse what |
Expanding on the format of
Things to note or possible issues:
Examples:
|
Hi @ivanmicetic , I am unsure about
While there is heavy confusion about The DefinedTerm documentation says "termCode: A code that identifies this DefinedTerm within a DefinedTermSet.", which is indeed a bit vague ... If we stick with "termCode should not have the prefix of the ontology", I'd love to have a pointer to a resource that recommends this. Maybe that could be the definition from identifiers.org? Does that work for all ontologies we have in OBO and BioPortal? Yours, |
These things can never be made straight and we always have to live with them. In the KnetMiner project, we treat IDs like ECO_0005670 as accessions, usually attaching the source (GO, ECO, ENSEMBL, etc), and associating an item to the multiple accessions and accession variants it might have (ECO_XXX, ECO:XXX, etc).
We rarely need to extract the 'term code' in the sense of the numerical part. To me, it doesn't mean much, apart from rare and peculiar use cases. One case where we consider the composition is when we try to merge entities with the same or very similar accessions, e.g., if one term has ECO_0005670 as accession and another ECO:0005670, then they're very likely the same, and this can be detected with a merge/normalisation tool, using a regex like Apart from that case, we never consider the numerical part alone, and I've never felt the need to store it in cleaned/published data. There might be use cases where you actually want it, but adopting the idea that |
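The merge/normalisation check described above could be sketched roughly as follows. The pattern, the function names and the choice of `PREFIX:NUMBER` as the canonical form are assumptions made for illustration; this is not the regex or the tool KnetMiner actually uses.

```python
import re
from typing import Optional

# Hypothetical pattern: a letter-led prefix, then "_" or ":", then a numeric local part.
ACCESSION_RE = re.compile(r"^([A-Za-z][A-Za-z0-9]*)[_:](\d+)$")


def normalise_accession(accession: str) -> Optional[str]:
    """Return the accession as PREFIX:NUMBER, or None if it does not fit the pattern."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    prefix, local_id = match.groups()
    return f"{prefix.upper()}:{local_id}"


def same_term(first: str, second: str) -> bool:
    """Treat two accessions as the same term if they normalise to the same string."""
    normalised = normalise_accession(first)
    return normalised is not None and normalised == normalise_accession(second)


print(same_term("ECO_0005670", "ECO:0005670"))
```

Note that the prefix is kept throughout: the numerical part is only ever compared together with its source.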
The only place where
Note that this applies to compact identifiers or sample URLs for identifiers.org identifiers and not elsewhere, since ECO itself uses both
Here you can find a spreadsheet with the summary of proposed solutions/recommendations for DefinedTerm discussed in this issue. We could use it to see the most favored solution and to monitor the evolution/progress of this new profile (if you find it useful). Regards, |
Hi, my initial urge was to say "duh, the local identifiers without prefix are useless, since there is no context and I wouldn't know how to use 'em then", similar to @marco-brandizi's comment above. Hence, I really had hoped we'd find a way to document this to be a CURIE (https://en.wikipedia.org/wiki/CURIE). Yours, |
Hi again, as part of the discussion I hacked a jq script to reshape the response from the OLS based terminology service to return a
or the equivalent command line:
resulting in
And yes, this comment is also here so that I know where to put these code snippets to find them later :-) |
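For readers who prefer Python to jq, a reshaping step of this kind might look like the sketch below. It is not the jq script from this comment. The input field names (`iri`, `label`, `description`, `obo_id`) are an assumption about what an OLS term record looks like and should be checked against a real response, and the sample label and description are placeholders, not real ontology content. Putting the full CURIE into `termCode` is just one of the options debated in this thread.

```python
import json


def ols_term_to_defined_term(term: dict) -> dict:
    """Reshape one OLS-style term record into a schema.org DefinedTerm dict."""
    defined_term = {
        "@type": "DefinedTerm",
        "@id": term["iri"],
        "name": term["label"],
        "termCode": term["obo_id"],  # assumption: CURIE as termCode
    }
    # Assumed: "description" is a list of strings and may be missing or empty.
    descriptions = term.get("description") or []
    if descriptions:
        defined_term["description"] = " ".join(descriptions)
    return defined_term


# Hand-written sample record; label and description are placeholders.
sample_term = {
    "iri": "http://purl.obolibrary.org/obo/GO_0032571",
    "label": "placeholder label",
    "description": ["Placeholder description."],
    "obo_id": "GO:0032571",
}

print(json.dumps(ols_term_to_defined_term(sample_term), indent=2))
```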
@sneumann, I agree that the term code without prefix is quite useless and would favour the use of CURIEs. I had a quick look at the curies package and I like how they solved the standardization of CURIEs in order to use multiple synonym prefixes as well as URI prefix synonyms:

    from curies import Converter, Record

    converter = Converter([
        Record(
            prefix="GO",
            prefix_synonyms=["gomf", "gocc", "gobp", "go", ...],
            uri_prefix="http://purl.obolibrary.org/obo/GO_",
            uri_prefix_synonyms=[
                "http://amigo.geneontology.org/amigo/term/GO:",
                "https://identifiers.org/GO:",
                ...
            ],
        ),
        # And so on
        ...
    ])

    >>> converter.standardize_prefix("gomf")
    'GO'
    >>> converter.standardize_curie('gomf:0032571')
    'GO:0032571'
    >>> converter.standardize_uri('http://amigo.geneontology.org/amigo/term/GO:0032571')
    'http://purl.obolibrary.org/obo/GO_0032571'

Maybe we could translate this concept to |
Hi,

several people are representing terms from ontologies via DefinedTerm (example), but I guess there are different flavors out there how exactly to do that.

Hence I would like to call for 1) better documentation, e.g. on our Getting Started tab, and/or 2) even a profile for an ontology-backed DefinedTerm. The main rationale is that I see validators and harvesters starting to connect to terminology services, so we should make it easy for them to recognise and follow ontology terms.

So, starting towards better documentation, can we come up with examples and promises how to represent a DefinedTerm?

I am most concerned about our recommendations for @id, identifier, url, termCode, all of which somehow identify/lead to the ontology term.

Similarly, we might want recommendations for the DefinedTermSet. Above we have:

Is that enough as minimum information? Very often we have @context, @id and for profiles dct:conformsTo as marginality minimum.

How do we tell validators that there is an external controlled vocabulary/ontology behind a term, and not just a flat list of hasDefinedTerm in the set?

Do we specify the ontology lookup services as url?

Anything else we'd need for DefinedTermSet?

Yours,
Steffen
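To make the questions above concrete, here is one possible shape for an ontology-backed DefinedTerm with its DefinedTermSet, built from a CURIE. It is a sketch for discussion and not a recommendation agreed in this issue. It assumes an OBO-style ontology whose term IRIs follow the `http://purl.obolibrary.org/obo/PREFIX_NUMBER` pattern and whose ontology IRI is `prefix.owl` under the same base. The helper name and the term name are hypothetical, and using the CURIE as `termCode` is only one of the candidate conventions.

```python
import json

# Assumed base for OBO-style PURLs; other ontologies will need other rules.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def defined_term_from_curie(curie: str, name: str, set_name: str) -> dict:
    """Build a schema.org DefinedTerm (with its DefinedTermSet) from a CURIE."""
    prefix, local_id = curie.split(":", 1)
    term_iri = f"{OBO_BASE}{prefix}_{local_id}"
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "@id": term_iri,
        "url": term_iri,
        "name": name,
        "termCode": curie,  # assumption: CURIE as termCode
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "@id": f"{OBO_BASE}{prefix.lower()}.owl",
            "name": set_name,
        },
    }


# "placeholder term name" is not the real label of this term.
term = defined_term_from_curie(
    "ECO:0005670", "placeholder term name", "Evidence and Conclusion Ontology"
)
print(json.dumps(term, indent=2))
```

With this shape, a validator could recognise the ontology from `inDefinedTermSet` and resolve the term from `@id`, without parsing `termCode` at all.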