URI/CURIE disambiguation #77

Closed
Silvanoc opened this issue Sep 20, 2023 · 7 comments

Comments

@Silvanoc

Silvanoc commented Sep 20, 2023

This package provides tools to handle CURIEs. But if I'm right, it assumes that whatever it is given is a CURIE and nothing else.

In cases where both URIs and CURIEs are accepted, ambiguities can arise between URIs and CURIEs (sorry, the original description got this link wrong) unless so-called SafeCURIEs are used. This means your package might be called on something that was wrongly assumed to be a CURIE but is actually a URI.
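To make the ambiguity concrete, here is a minimal sketch using only the Python standard library. The helper names `curie_reading` and `uri_scheme` are made up for this illustration and are not part of this package; the point is only that the same string has a plausible reading under both interpretations.

```python
from urllib.parse import urlsplit


def curie_reading(value: str) -> tuple[str, str]:
    """Read the value as a CURIE: prefix before the first colon, reference after it."""
    prefix, _, reference = value.partition(":")
    return prefix, reference


def uri_scheme(value: str) -> str:
    """Read the same value as a URI and return the scheme the standard library sees."""
    return urlsplit(value).scheme


value = "schema:Thing"
print(curie_reading(value))  # prefix and reference under the CURIE reading
print(uri_scheme(value))     # scheme under the URI reading
```

Both readings succeed for `schema:Thing`, so the string alone cannot tell a consumer which one was intended.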

Now the question is whether you plan to provide some validation functionality that would "ring some bells" for users when what is being provided might be a URI wrongly assumed to be a CURIE.

See follow-up PRs in linkml-runtime:

@cthoyt
Member

cthoyt commented Sep 21, 2023

I'm not really sure I understand what you are asking for - can you try rewriting your issue to have less visual clutter (e.g., mock the function in a python block)? I'm not familiar with the formalism you're using here to describe this function, but it's very confusing for me

Also it's not clear if this is unique functionality or this would build on top of existing functionality in this package.

@Silvanoc
Author

Oops, I'm sorry. I took my time to structure it and thought it would be understandable. Obviously not 🙁

I'm first adding the reason for the question at the top of the issue description. If you agree with it, I can then take some time to elaborate the idea better. If you say "out of scope" straight away, then I'll save my time 🙂

@cthoyt
Member

cthoyt commented Sep 21, 2023

FYI, I'm leaning towards out of scope.

But, I'm curious where you would want to use such functionality in practice. Do you have a dataset where this would be appropriate to apply to it?

@Silvanoc
Author

I'm getting here from the LinkML project, which declares the type Uriorcurie.

There, the function is_curie has been implemented to decide whether an attribute of type Uriorcurie should be handled as a CURIE or as a URI. But it's been implemented in such a way that it interprets a lot of valid URIs as CURIEs (due to the URI/CURIE ambiguity mentioned in the issue description).
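As an illustration of how such misclassification can happen, here is a hypothetical colon-based heuristic. This is not LinkML's actual is_curie; both function names and the example values are assumptions made for this sketch. The second function shows one possible mitigation: only accept a value as a CURIE when its prefix is in a known prefix set.

```python
def naive_is_curie(value: str) -> bool:
    """Hypothetical heuristic: anything with a colon but no '://' is called a CURIE."""
    return ":" in value and "://" not in value


def is_curie_with_prefixes(value: str, prefixes: set[str]) -> bool:
    """Stricter variant: the text before the first colon must be a known prefix."""
    prefix, sep, _ = value.partition(":")
    return bool(sep) and prefix in prefixes


known = {"schema"}  # assumed prefix set for the example
for value in ["schema:Thing", "urn:isbn:0451450523", "https://schema.org/Thing"]:
    print(value, naive_is_curie(value), is_curie_with_prefixes(value, known))
```

The naive check accepts the `urn:` URI as a CURIE, while the prefix-aware check rejects it. The prefix-aware check still cannot help when a URI scheme and a declared prefix share the same name.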

I've started discussing in LinkML how to tackle the issue. Although I only need the fix in LinkML, I have a strong OSS mindset and a "sharing is caring" attitude. Therefore I thought that resolving it in a low-level component like curies would be more beneficial for the community.

I'm not asking you to implement it, don't get me wrong. I could try to prepare a PR for it, but I don't want to waste my time. That's why I'm first trying to find out whether you would be interested.

BTW, I've updated the top of the description to provide a better context description.

@Silvanoc
Author

Silvanoc commented Sep 21, 2023

NOTE: The function interface draft was originally in the description. I moved it to a separate comment so that the description covers the problem space and this comment focuses on the solution space.

Something like a validation/sanitization functionality would be really useful. Let me illustrate it with a couple of examples, assuming a function called curie_validate(curie, prefixes) that returns valid (boolean), issue_level (integer), and hints (list of strings):

  • curie_validate("[schema]:Thing", ["schema"]) could return True, 0, ["no URI ambiguity"]
  • curie_validate("@schema@:Thing", ["schema"]) could return True, 0, ["no URI ambiguity"] (see footnote 1)
  • curie_validate("schema:Thing", ["schema"]) could return True, 1, ["potential URI ambiguity", "matching prefix"]
  • curie_validate("schema:Thing") could return True, 2, ["potential URI ambiguity", "no prefix matching check possible"]
  • curie_validate("mailto:[email protected]") could return True, 3, ["potential URI ambiguity", "no matching prefix", "unexpected format/characters"] (see footnote 2)
  • curie_validate("my_prefix:Thing", ["schema"]) could return False, 0, ["unknown prefix"]
  • curie_validate("http://example.org") could return False, 1, ["invalid CURIE"]
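The draft interface above could be mocked in Python roughly as follows. This is only a sketch under several assumptions that the list does not spell out: a reference starting with `//` is treated as "invalid CURIE", an `@` in the reference counts as "unexpected format/characters", and the `CurieCheck` result type is a name invented here. It reproduces the listed outcomes except the `mailto:` one, where it reports "no prefix matching check possible" as the middle hint because no prefixes were passed.

```python
from typing import NamedTuple, Optional, Sequence


class CurieCheck(NamedTuple):
    valid: bool
    issue_level: int
    hints: list[str]


def curie_validate(curie: str, prefixes: Optional[Sequence[str]] = None) -> CurieCheck:
    prefix, sep, reference = curie.partition(":")
    # Assumption: no colon, or a reference like "//example.org", is not a CURIE.
    if not sep or reference.startswith("//"):
        return CurieCheck(False, 1, ["invalid CURIE"])
    # Delimited prefix forms from the draft: [prefix]:ref or @prefix@:ref
    delimited = len(prefix) > 2 and (
        (prefix[0] == "[" and prefix[-1] == "]")
        or (prefix[0] == "@" and prefix[-1] == "@")
    )
    bare = prefix[1:-1] if delimited else prefix
    if prefixes is not None and bare not in prefixes:
        return CurieCheck(False, 0, ["unknown prefix"])
    if delimited:
        return CurieCheck(True, 0, ["no URI ambiguity"])
    if prefixes is not None:
        return CurieCheck(True, 1, ["potential URI ambiguity", "matching prefix"])
    hints = ["potential URI ambiguity", "no prefix matching check possible"]
    # Assumption: "@" in the reference is the "unexpected character" signal.
    if "@" in reference:
        return CurieCheck(True, 3, hints + ["unexpected format/characters"])
    return CurieCheck(True, 2, hints)


print(curie_validate("schema:Thing", ["schema"]))
print(curie_validate("http://example.org"))
```

Returning a named tuple keeps the three-value shape of the draft while letting callers read `result.valid` or `result.hints` by name.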

Footnotes

  1. If I read the URI specification right, @ should be fine for URI/CURIE disambiguation.

  2. According to the CURIE specification, "mailto:[email protected]" is a valid CURIE! This is the CURIE grammar: curie := [ [ prefix ] ':' ] irelative-ref, with irelative-ref defined in the IRI specification. Look here for some examples of valid irelative-refs.

@cthoyt
Member

cthoyt commented Sep 21, 2023

Okay, I have a few opinions:

  1. We shouldn't need something like this, since we should either have fields that are only for URIs, CURIEs, or if they must co-exist, we can use the official solution which is Safe CURIEs
  2. We don't live in a world where people use official solutions, LinkML being one example
  3. There are reasonable concerns about the quality and correctness of code that's hacked into LinkML, so if you are looking for a home for generally reusable code that is easy to understand, high quality, well documented, and fully tested, I would be happy to accept a contribution into this package such that others can reuse it
  4. I don't typically implement solutions in my open source software to problems that I fundamentally disagree with, so I will probably not work on or support this myself (besides code review!)
  5. If you want to make a contribution, you should be somewhat available for maintenance and discussions regarding such code
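For reference, the W3C CURIE Syntax defines a Safe CURIE as the whole CURIE wrapped in square brackets, for example `[schema:Thing]`. A field that accepts both forms can then be resolved without guessing. The following is a minimal sketch in plain Python, not this package's API; the function name `resolve` and the prefix map are assumptions for the example.

```python
def resolve(value: str, prefix_map: dict[str, str]) -> str:
    """Expand a Safe CURIE to a URI; return anything else unchanged as a URI."""
    if value.startswith("[") and value.endswith("]"):
        prefix, sep, reference = value[1:-1].partition(":")
        if not sep or prefix not in prefix_map:
            raise ValueError(f"cannot expand safe CURIE: {value!r}")
        return prefix_map[prefix] + reference
    # Not bracketed, so by convention it is already a URI.
    return value


prefix_map = {"schema": "https://schema.org/"}  # assumed mapping for the example
print(resolve("[schema:Thing]", prefix_map))
print(resolve("urn:isbn:0451450523", prefix_map))
```

With this convention the brackets carry the decision, so no heuristic on the colon is needed.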

@Silvanoc
Author

Silvanoc commented Sep 21, 2023

TL;DR: Since I don't think I would have time to maintain the code and I don't see much interest from your side, I won't try to contribute a solution for a problem that you disagree with. Thanks anyway for the discussion.

> We shouldn't need something like this, since we should either have fields that are only for URIs, CURIEs, or if they must co-exist, we can use the official solution which is Safe CURIEs

I agree, but migrating from non-SafeCURIEs to SafeCURIEs is not so easy, so I don't count on convincing them to do so. In any case, a migration strategy would be needed so that not all existing LinkML schemas are forced to migrate.

> We don't live in a world where people use official solutions, LinkML being one example

I don't know what you consider an official solution. Sticking to the specs? Using well established components?

> There are reasonable concerns about the quality and correctness of code that's hacked into LinkML, so if you are looking for a home for generally reusable code that is easy to understand, high quality, well documented, and fully tested, I would be happy to accept a contribution into this package such that others can reuse it

As said before, that's exactly my intention in contacting you: finding out whether a central, reusable solution here is an option for you. Got the message, you're open to PRs, but they must provide "reusable code that is easy to understand, high quality, well documented, and fully tested" 🙂

> I don't typically implement solutions in my open source software to problems that I fundamentally disagree with, so I will probably not work on or support this myself (besides code review!)
> If you want to make a contribution, you should be somewhat available for maintenance and discussions regarding such code

Your fundamental disagreement with the problem is a major drawback for me. IMO this issue won't only exist in LinkML, otherwise the CURIE spec wouldn't mention it. Since I don't think I would have time to maintain the code and I don't see much interest from your side, I won't even try and therefore close this issue. Thanks anyway for the discussion.
