Put 0 as explicit lower bound in regex patterns #342
Comments
@mhrimaz There is no single regex language, unfortunately. We correctly transpile the patterns in Java: In most regex pattern languages (or better said, dialects?), zero is indeed optional. It all depends on which regex language and which regex engine you use. Can you please elaborate a bit more on the issue?
Well, if you want to validate the SHACL shape (https://github.com/admin-shell-io/aas-specs/blob/9a118059bea5e9cf12405db7c871b5d5869acaa4/schemas/rdf/shacl-schema.ttl#L39) using a Java SHACL validator like Apache Jena, the issue appears again. So I think having 0 is indeed better: more explicit and probably less troublesome. Does adding zero cause any issues with other engines?
Not as far as I know. @sebbader-sap @sebbader @s-heppner can you please double-check and let me know how to change the schemas? XSD and JSON Schema probably suffer from this issue in Java as well? @mhrimaz please also note that the Java standard regex library runs in exponential time. For some patterns, you have to use another library. This is fixed in aas-core-java (and aas-core-cpp).
I'd be very careful with any regex changes; they often seem to make the problem worse rather than better.
In the official repo (https://github.com/admin-shell-io/aas-specs/blob/9a118059bea5e9cf12405db7c871b5d5869acaa4/schemas/json/aas.json#L39) and in the code-gen repo (https://github.com/aas-core-works/aas-core-codegen/blob/e1d8c19a08afaa18485c78fe1ff90934daa5decd/test_data/jsonschema/test_main/aas_core_meta.v3/expected_output/schema.json#L39C66-L40C11), the language regex has a zero in the regex quantifier. Same in XSD. Apparently, the regexes are not the same in SHACL.
@mhrimaz do I understand correctly: this issue affects only SHACL?
@mristin it seems so. SHACL is using different regex patterns, including the UTF-related patterns (https://github.com/aas-core-works/aas-core-codegen/blob/e1d8c19a08afaa18485c78fe1ff90934daa5decd/test_data/rdf_shacl/test_main/expected/aas_core_meta.v3/expected_output/shacl-schema.ttl#L48). But also in meta (https://github.com/aas-core-works/aas-core-meta/blob/31d6afd348a86f43cc658cc5bffab49f1e47bd24/aas_core_meta/v3_1.py#L286C6-L286C46), 0 is not included, so I don't know whether a change is required there or not.
@mhrimaz thanks for digging into this! aas-core-meta is fine, as it is parsed and Python requires no zeros. I'll fix SHACL soon, and @s-heppner can then put it into the aas-specs repo.
We included the regex pattern as-is from the input. Instead, with this patch, we parse it from the input and re-render it into the canonical form so that many more regex engines can work with it. For example, in the input, we omit the minimum bound 0 (*e.g.*, ``{,4}``), which breaks with the Java regex engine beneath the SHACL validator. Now, the pattern is correctly rendered with an explicit 0 (``{0,4}``). Discovered in [aas-core-meta issue 342]. [aas-core-meta issue 342]: aas-core-works/aas-core-meta#342
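The rewrite from ``{,4}`` to ``{0,4}`` described above can be sketched in a few lines of Python. This is only an illustration, not the renderer used in aas-core-codegen (which parses the pattern properly): the helper name `with_explicit_zero` is hypothetical, and the plain substitution assumes the pattern contains no literal `{,n}` inside a character class or after an escaping backslash.

```python
import re

# Matches a quantifier whose lower bound is omitted, e.g. "{,4}".
# Assumption: no literal "{,n}" occurs inside a character class or
# directly after an escaping backslash; a real renderer would parse
# the pattern instead of substituting on the raw text.
_OMITTED_LOWER_BOUND = re.compile(r"\{,(\d+)\}")


def with_explicit_zero(pattern: str) -> str:
    """Rewrite every ``{,n}`` quantifier as ``{0,n}``."""
    return _OMITTED_LOWER_BOUND.sub(r"{0,\1}", pattern)


original = "[a-z]{,4}"
canonical = with_explicit_zero(original)

# Python treats both spellings the same way, so the rewrite does not
# change which strings match on the Python side.
for text in ["", "abc", "abcd", "abcde"]:
    assert bool(re.fullmatch(original, text)) == bool(re.fullmatch(canonical, text))
```

Since Python reads an omitted lower bound as zero, both spellings accept the same strings there; the explicit form is simply the one that stricter engines, such as Java's, also accept.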
@mhrimaz I made the fix in aas-core-codegen so that we use the canonical regex renderer. Can you please test on your side that the canonical patterns work as expected? Here's the new SHACL: Once you give me the go-ahead, I'll merge the pull request in aas-core-codegen, and then we can update the aas-specs repository.
We included the regex pattern as-is from the input. Instead, with this patch, we parse it from the input and re-render it into the form that we also use in Java. We pick Java regex dialect as most SHACL validators in the wild rely on Java as platform, so we decide to serve this user base with priority. For example, in the input meta-model specification, we omit the minimum bound 0 (*e.g.*, ``{,4}``), which breaks with the Java regex engine beneath the SHACL validator. Now, the pattern is correctly rendered with an explicit 0 (``{0,4}``). Discovered in [aas-core-meta issue 342]. [aas-core-meta issue 342]: aas-core-works/aas-core-meta#342
We included the regex pattern as-is from the input which caused problems with the regex engines as the patterns in the meta-model are written in a Python dialect (and assuming that the regex engine works on UTF-32 characters). However, most regex engines in the wild operating on SHACL (*e.g.*, Java SHACL validators) use UTF-16 to represent the text and do not support some parts of the Python regex dialect. For example, in the input meta-model specification, we omit the minimum bound 0 (*e.g.*, ``{,4}``), which breaks with the Java regex engine beneath the SHACL validator. Instead, with this patch, we parse the pattern from the specification and re-render it into the form that we also use in JSON Schema. We pick JSON Schema regex dialect as most SHACL validators in the wild can deal with it, in particular those based on Java as a platform. Hence, we decide to serve this user base with priority. Discovered in [aas-core-meta issue 342]. [aas-core-meta issue 342]: aas-core-works/aas-core-meta#342
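To make the UTF-32 versus UTF-16 point concrete, here is a small Python sketch. Python strings are sequences of code points, so a character above U+FFFF counts as one element, while a UTF-16 based platform such as Java stores it as two code units (a surrogate pair). The helper `utf16_code_units` is a hypothetical name introduced only for this illustration.

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the UTF-16 code units that encode ``text``."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+10000 is the first code point above the Basic Multilingual Plane.
character = "\U00010000"

# One code point from Python's point of view ...
code_point_count = len(character)

# ... but two code units (a high and a low surrogate) in UTF-16.
units = utf16_code_units(character)
```

This difference is why a pattern written with code points in mind may need a different rendering for an engine that works on UTF-16 code units.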
We included the regex pattern as-is from the input which caused problems with the regex engines as the patterns in the meta-model are written in a Python dialect (and assuming that the regex engine works on UTF-32 characters). However, most regex engines in the wild operating on SHACL (*e.g.*, Java SHACL validators) use UTF-16 to represent the text and do not support some parts of the Python regex dialect. For example, in the input meta-model specification, we omit the minimum bound 0 (*e.g.*, ``{,4}``), which breaks with the Java regex engine beneath the SHACL validator. Instead, with this patch, we parse the pattern from the specification and re-render it into the form that we also use in JSON Schema. We pick JSON Schema regex dialect as most SHACL validators in the wild can deal with it, in particular those based on Java as a platform. Hence, we decide to serve this user base with priority. Discovered in [aas-core-meta issue 342]. Fixed in [aas-core-codegen commit e22cc]. [aas-core-meta issue 342]: aas-core-works/aas-core-meta#342 [aas-core-codegen commit e22cc]: aas-core-works/aas-core-codegen@e22ccae
@s-heppner mentioned in admin-shell-io/aas-specs#426 (comment) that zero is optional in regex quantifiers. I don't think that is true; at least in Java, this doesn't work.
This occurs in language tags and probably other places: https://github.com/aas-core-works/aas-core-meta/blob/31d6afd348a86f43cc658cc5bffab49f1e47bd24/aas_core_meta/v3_1.py#L286C6-L286C46