Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid uri shadowing #280

Merged
merged 8 commits into from
Oct 16, 2023
Merged

fix: avoid uri shadowing #280

merged 8 commits into from
Oct 16, 2023

Conversation

Silvanoc
Copy link
Contributor

@Silvanoc Silvanoc commented Oct 6, 2023

URIorCURIE validation function is wrongly identifying as CURIEs certain URIs. This patch first tries to validate if a URIorCURIE is a valid URI, otherwise it tries to validate if it's a valid CURIE.

Fixes linkml/linkml#1662

@Silvanoc
Copy link
Contributor Author

Following invalid URIs or CURIEs which weren't catched by the previous validation are not allowed anymore:

Following invalid URIs which weren't catched by the previous validation are not allowed anymore:

  • Those starting with ":", though they remain being valid CURIEs.

Furthermore following valid URIs that were being rejected are allowed:

  • Empty ones, like "''". It's a so-called same-document reference.
  • Those that look like a CURIE, like "rdf:type". Though they remain being interpreted as CURIES within the scope of a Uriorcurie range.
  • URNs are now also allowed. Like above, within the scope of a Uriorcurie range they remain being interpreted as CURIEs.

Footnotes

  1. Support for Safe-CURIEs, which might have '[' and ']' only surrounding the whole CURIE will be added on a later PR

@Silvanoc
Copy link
Contributor Author

Assuming the support for empty URIs (same-document reference) makes any difficulties, I don't think that deviating from the standard would harm much there.

@Silvanoc
Copy link
Contributor Author

Silvanoc commented Oct 11, 2023

@cthoyt the module providing validation functions here goes in the direction of what I was wishing to see in the curies package. You might be interested on it or even review it.

It doesn't resolve the URI - CURIE ambiguity where both are accepted yet though.

@Silvanoc
Copy link
Contributor Author

Silvanoc commented Oct 13, 2023

Small hint for easy manual testing:

poetry run python -c 'from linkml_runtime.utils.uri_validator import validate_uri ; \
    print(validate_uri("urn:namespace:resource"))'

will return a regex match object, since it succeeds.

Improve URI and CURIE validation using regular expressions based on the
corresponding official specifications.

URI and CURIE validation mechanisms where reporting valid URIs (like a
URN 'urn:abc:123') as invalid. This patch fixes this issue.

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
'meaning' is declared to be Uriorcurie according [1], but at least one
example being used for testing wasn't compliant (neither '[' nor ']' are
allowed in the reference part of a curie) and therefore the introduced
stricter URI and CURIE validation was failing.
This patch fixes the issue.

[1]: https://linkml.io/linkml-model/latest/docs/meaning/

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
'source_nodes' is declared to be Uriorcurie according [1], but at least one
example being used for testing wasn't compliant (' ! unit' is not
allowed in the reference part of a curie) and therefore the introduced
stricter URI and CURIE validation was failing.
This patch fixes the issue.

[1]: https://linkml.io/linkml-model/latest/docs/source_nodes/

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
Empty URIs are so-called same-document references and are legal URIs,
therefore fixing the wrong text expectation (derived from the previous
non-standard conform implementation).

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
Rename test for URIorCURIE types and also focus only on testing that
class, moving tests applying to the URI class to the corresponding test.

Also add/modify following tests:
* 'abc:[123]' is an invalid CURIE.
* ':123' is an invalid URI (but a valid CURIE)
* '' is a valid URI (a same-document reference)
* 'rdf:type' is a valid URI (also a valid CURIE)
* 'urn:abc:123' is a valid URI (since it's a valid URN)

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
Adding all URIs shown in the URI Wikipedia page [1] to the tests.

[1]: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Example_URIs

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
@Silvanoc
Copy link
Contributor Author

I've added named groups for better unterstanding of how the validator is parsing the passed strings.

poetry run python -c 'from linkml_runtime.utils.uri_validator import validate_uri ; \
    from pprint import pprint ; \
    pprint(validate_uri("ldap://[2001:db8::7]/c=GB?objectClass?one").groupdict())'

returns

{'authority': '[2001:db8::7]',
 'fragment': None,
 'hier_part': '//[2001:db8::7]/c=GB',
 'host': '[2001:db8::7]',
 'port': None,
 'query': 'objectClass?one',
 'scheme': 'ldap',
 'uri': 'ldap://[2001:db8::7]/c=GB?objectClass?one',
 'userinfo': None}

@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (f25abe8) 61.81% compared to head (b86c1fa) 62.11%.
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #280      +/-   ##
==========================================
+ Coverage   61.81%   62.11%   +0.29%     
==========================================
  Files          62       63       +1     
  Lines        8393     8459      +66     
  Branches     2164     2169       +5     
==========================================
+ Hits         5188     5254      +66     
  Misses       2599     2599              
  Partials      606      606              
Files Coverage Δ
linkml_runtime/utils/metamodelcore.py 78.43% <100.00%> (+1.06%) ⬆️
linkml_runtime/utils/uri_validator.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
Add new tests to improve test coverage.

Remove also unused (and therefore not tested) functionality.

Signed-off-by: Silvano Cirujano Cuesta <[email protected]>
@Silvanoc
Copy link
Contributor Author

@sierra-moxon are you taking care of assigning the PR or are you expecting me to pick someone?

@sierra-moxon sierra-moxon merged commit 687fc53 into linkml:main Oct 16, 2023
8 checks passed
@sierra-moxon
Copy link
Member

Thanks @Silvanoc! I reviewed and merged; this is terrific! please feel free to assign one of us as a reviewer if needed. :). I also upped your permissions on the repo so you should be able to branch instead of fork if that is easier.

@Silvanoc
Copy link
Contributor Author

feel free to assign one of us

I'm not sure I know exactly who do you mean by "us". I probably have a rough idea, but I cannot find any place stating something like "this is the LinkML" team or "these are the maintainers".

@cthoyt
Copy link
Contributor

cthoyt commented Oct 16, 2023

@Silvanoc I'm glad you found a place for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

URI schemes being rejected as IDs
3 participants