Add Explanation of RegEx and Reason for AASd-130 #381
base: IDTA-01001-3-1_working
Please see the comments and suggestions.
documentation/IDTA-01001/modules/ROOT/pages/Spec/IDTA-01001_Metamodel_Constraints.adoc
The string can include common characters like tabs, newlines, carriage returns, and spaces.
It allows a broad range of Unicode characters, including those beyond the Basic Multilingual Plane (BMP) which are represented using surrogate pairs in UTF-16 encoding.
It ensures that the entire string adheres to the rules of UTF-16 encoding, which is a standard way of representing a wide range of characters from different languages.
Suggested change (remove these lines):
The string can include common characters like tabs, newlines, carriage returns, and spaces.
It allows a broad range of Unicode characters, including those beyond the Basic Multilingual Plane (BMP) which are represented using surrogate pairs in UTF-16 encoding.
It ensures that the entire string adheres to the rules of UTF-16 encoding, which is a standard way of representing a wide range of characters from different languages.
I think someone will need this
The decision was to support what XML Schema 1.0 supports. Marko suggests restricting it further, correct?
No, I don't think that removing these three lines would restrict anything further as they are only explanatory for the reader.
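To make the three explanatory lines concrete, here is a minimal sketch in Python of which characters the AASd-130 character class accepts. It is only an illustration, not the schema's implementation: the name `is_allowed` is hypothetical, and the pattern is a translation into Python's `re` syntax, where code points above U+FFFF are written as `\UXXXXXXXX` instead of the `\u00010000-\u0010FFFF` notation used in the specification text. It also assumes the `*` quantifier form of the pattern.

```python
import re

# Assumption: Python's re module needs \UXXXXXXXX for code points above U+FFFF,
# so the spec's \u00010000-\u0010FFFF is written as \U00010000-\U0010FFFF here.
AASD_130 = re.compile(
    r"^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*$"
)


def is_allowed(text: str) -> bool:
    """Return True if every character of text is in the allowed class."""
    return AASD_130.match(text) is not None


if __name__ == "__main__":
    samples = {
        "tab, newline, CR, space": "a\tb\nc\rd e",
        "non-BMP character (U+1F600)": "\U0001F600",
        "NUL (U+0000)": "\x00",
        "vertical tab (U+000B)": "\x0b",
        "noncharacter U+FFFE": "\ufffe",
    }
    for label, sample in samples.items():
        print(f"{label}: {is_allowed(sample)}")
```

The first two samples are accepted; the control characters and U+FFFE fall outside every range of the class and are rejected.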
It ensures that the entire string adheres to the rules of UTF-16 encoding, which is a standard way of representing a wide range of characters from different languages.
Remembering our discussion, we could reformulate this one a bit to make it more specific:
"It assumes that the entire string adheres to the rules of UTF-16 encoding, which is the current standard way of representing a wide range of characters from different languages."
As far as I understand the context, a UTF-32-enabled application would represent a file slightly differently, with no surrogate pairs needed, and therefore the regex pattern representing this constraint would need to look different for it. But the whole UTF-16 vs. UTF-32 separation does not affect the constraint itself but its representation in the schemas.
But the whole UTF-16 vs. UTF-32 separation does not affect the constraint itself but its representation in the schemas.
So how about we replace the above sentence with something like a design decision: "For the current versions of the specification, this constraint is represented as a regex pattern expecting UTF-16 compliant applications"?
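The representation issue can be sketched in Python, whose strings are sequences of code points (similar to a UTF-32 view). The code-unit pattern below is a hypothetical construction for illustration, not the pattern used in any AAS schema, and the helper names `utf16_code_units` and `as_code_unit_string` are assumptions. It models how an engine that matches on UTF-16 code units would have to spell out surrogate pairs, while a code-point engine can use a single range for the supplementary planes.

```python
import re

# Code-point view (UTF-32-like): one range covers U+10000..U+10FFFF.
CODE_POINT_PATTERN = re.compile(
    r"^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*$"
)

# Code-unit view (UTF-16-like), hypothetical: supplementary characters appear
# as a high surrogate followed by a low surrogate.
CODE_UNIT_PATTERN = re.compile(
    r"^(?:[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$"
)


def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def as_code_unit_string(text: str) -> str:
    """Model what a UTF-16 based engine sees: one character per code unit."""
    return "".join(chr(unit) for unit in utf16_code_units(text))


if __name__ == "__main__":
    smiley = "\U0001F600"
    # One code point, but two UTF-16 code units (a surrogate pair).
    print(len(smiley), [hex(u) for u in utf16_code_units(smiley)])
    print(bool(CODE_POINT_PATTERN.match(smiley)))
    print(bool(CODE_UNIT_PATTERN.match(as_code_unit_string(smiley))))
```

Both patterns accept the same set of well-formed strings; only their spelling differs, which matches the point that the encoding question affects the representation of the constraint and not the constraint itself.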
"For the current versions"? What does this mean? It is not clear to me what we really request and expect (in the future and today).
Let me try another formulation:
"Note: The constraint AASd-130 is represented as a regex pattern expecting UTF-16 compliant applications. It might be necessary to adjust this pattern for UTF-32 compliant applications in future versions of this specification."
Co-authored-by: Marko Ristin <[email protected]>
Is it V3.0.1 or V3.1? What is the related issue?
[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]: Defines a character class that allows various Unicode characters.
\x09: ASCII horizontal tab.
The following list seems to be redundant to the list above? Remove one of them?
@sebbader-sap @sebbader Can you please review this issue?
Note that the lists are not properly rendered in the GitHub preview. It could be that the Antora magic kicks in and solves it; I'm not sure.
This leads to the following regular expression:
^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]$
Suggested change:
- ^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]$
+ ^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]*$
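The effect of the added `*` can be checked with a short sketch. This assumes Python's `re` syntax, where the supplementary range is written as `\U00010000-\U0010FFFF`; the names `SINGLE` and `ANY_LENGTH` are only illustrative.

```python
import re

# Assumption: Python translation of the character class from AASd-130.
CHAR_CLASS = r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"

# Without a quantifier, the pattern requires exactly one character.
SINGLE = re.compile("^" + CHAR_CLASS + "$")

# With "*", the pattern accepts any number of allowed characters, including none.
ANY_LENGTH = re.compile("^" + CHAR_CLASS + "*$")

if __name__ == "__main__":
    for sample in ["a", "ab", ""]:
        print(repr(sample), bool(SINGLE.match(sample)), bool(ANY_LENGTH.match(sample)))
```

Without the quantifier, a two-character string such as "ab" is rejected, so the pattern would only ever validate single-character strings.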
@sebbader-sap : https://github.com/admin-shell-io/aas-specs/pull/381/files#r1559508365 this is still open, the rest seems fine to me
The problem is that we can "serialise" AASd-130 into different regex patterns due to the fact that regex itself is underspecified. I tried to explain our decision for the UTF-16 representation a bit better, and that this representation might change when UTF-32-enabled regex libraries become more common.
@g1zzm0 can you please have a look at the suggested changes of Sebastian and the merge?
Co-authored-by: sebbader-sap <[email protected]>
@s-heppner could you please check what we have implemented in the schema? #381 (comment) Thank you.
In the v3.1 schema, the regex from the comment is implemented: aas-core-meta v3.1 L419. Is that as expected?
This was necessary because there was broad disagreement among different
committees and individuals about what the AASd-130 constraint says, what
the RegEx in AASd-130 means, and why the constraint was defined.