Add Explanation of RegEx and Reason for AASd-130 #381

g1zzm0 · 2024-03-08T14:34:53Z

This was necessary because there was broad disagreement among committees
and individuals about what the AASd-130 constraint says, what the RegEx
in AASd-130 means, and why the constraint was defined in the first
place.

mristin

Please see the comments and suggestions.

mristin · 2024-03-08T14:43:59Z

documentation/IDTA-01001/modules/ROOT/pages/Spec/IDTA-01001_Metamodel_Constraints.adoc

+The string can include common characters like tabs, newlines, carriage returns, and spaces.
+It allows a broad range of Unicode characters, including those beyond the Basic Multilingual Plane (BMP) which are represented using surrogate pairs in UTF-16 encoding.
+It ensures that the entire string adheres to the rules of UTF-16 encoding, which is a standard way of representing a wide range of characters from different languages.


Suggested change (remove these three lines):

-The string can include common characters like tabs, newlines, carriage returns, and spaces.
-It allows a broad range of Unicode characters, including those beyond the Basic Multilingual Plane (BMP) which are represented using surrogate pairs in UTF-16 encoding.
-It ensures that the entire string adheres to the rules of UTF-16 encoding, which is a standard way of representing a wide range of characters from different languages.

g1zzm0 · 2024-03-08

I think someone will need this.

BirgitBoss · 2024-03-27

The decision was to support what XML Schema 1.0 supports. Marko suggests restricting it further, correct?

sebbader-sap · 2024-04-10T13:55:58Z

No, I don't think that removing these three lines would restrict anything further, as they are only explanatory for the reader.

sebbader-sap · 2024-04-10T14:03:35Z

> It ensures that the entire string adheres to the rules of UTF-16 encoding, which is a standard way of representing a wide range of characters from different languages.

Recalling our discussion, we could reformulate this sentence to make it more specific:
"It assumes that the entire string adheres to the rules of UTF-16 encoding, which is the current standard way of representing a wide range of characters from different languages."

As far as I understand the context, a UTF-32-enabled application would represent a file slightly differently, with no surrogate pairs needed, and therefore the regex pattern representing this constraint would need to look different for it. But the whole UTF-16 vs. UTF-32 separation does not affect the constraint itself, only its representation in the schemas.

sebbader-sap · 2024-04-10T14:05:00Z

> But the whole UTF-16 vs. UTF-32 separation does not affect the constraint itself, only its representation in the schemas.

So how about we replace the above sentence with something like a design decision: "For the current versions of the specification, this constraint is represented as a regex pattern expecting UTF-16 compliant applications"?

BirgitBoss · 2024-04-12

"For the current versions"? What does this mean? It is not clear to me what we really request and expect (today and in the future).

sebbader-sap · 2024-05-29

Let me try another formulation:

"Note: The constraint AASd-130 is represented as a regex pattern expecting UTF-16 compliant applications. It might be necessary to adjust this pattern for UTF-32 compliant applications in future versions of this specification."

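To make the UTF-16 vs. UTF-32 point in this thread concrete, here is a minimal Python sketch of how one character beyond the BMP is represented in each encoding. The helper names (`code_units`, `utf16_units`, `utf32_units`) are made up for this illustration and are not part of the specification or of any AAS tooling.

```python
def code_units(text, encoding, width):
    """Encode text and split the bytes into fixed-width big-endian code units."""
    data = text.encode(encoding)
    return [int.from_bytes(data[i:i + width], "big") for i in range(0, len(data), width)]


def utf16_units(text):
    # UTF-16 uses 16-bit code units; characters beyond the BMP need two of them.
    return code_units(text, "utf-16-be", 2)


def utf32_units(text):
    # UTF-32 uses 32-bit code units; every character needs exactly one.
    return code_units(text, "utf-32-be", 4)


sample = "\U0001F600"  # a character outside the Basic Multilingual Plane
print([hex(u) for u in utf16_units(sample)])  # ['0xd83d', '0xde00'] (surrogate pair)
print([hex(u) for u in utf32_units(sample)])  # ['0x1f600'] (single code unit)
```

A regex engine that matches UTF-16 code units sees two items for this character, while an engine that matches code points sees one, which is why the same constraint can end up as differently written patterns.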
documentation/IDTA-01001/modules/ROOT/pages/Spec/IDTA-01001_Metamodel_Constraints.adoc

Co-authored-by: Marko Ristin <[email protected]>

BirgitBoss · 2024-03-27T14:14:27Z

Is this for V3.0.1 or V3.1? What is the related issue?

BirgitBoss · 2024-03-27T14:40:37Z

documentation/IDTA-01001/modules/ROOT/pages/Spec/IDTA-01001_Metamodel_Constraints.adoc

+[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]: Defines a character class that allows various Unicode characters.
+
+\x09: ASCII horizontal tab.


The following list seems to be redundant to the list above? Remove one of them?
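As a rough illustration of what the character class in the hunk above accepts and rejects, the following sketch checks single characters with Python's `re` module. The rendering of the class is an assumption for Python only: Python reads `\u` with exactly four hex digits and `\U` with exactly eight, so the supplementary range is written here as `\U00010000-\U0010FFFF`. The name `is_allowed_char` is hypothetical.

```python
import re

# Assumed Python rendering of the AASd-130 character class (not taken from the schemas).
AASD_130_CHAR = re.compile(
    r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_allowed_char(ch):
    """Return True if the single character ch falls inside the character class."""
    return AASD_130_CHAR.fullmatch(ch) is not None


print(is_allowed_char("\t"))          # True: horizontal tab (\x09)
print(is_allowed_char("\x00"))        # False: NUL is outside every listed range
print(is_allowed_char("\x0B"))        # False: vertical tab is not listed
print(is_allowed_char("\uFFFE"))      # False: the BMP range stops at \uFFFD
print(is_allowed_char("\U0001F600"))  # True: character beyond the BMP
```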

JoergNeidig · 2024-04-10T13:48:22Z

@sebbader-sap @sebbader Can you please review this issue?

sebbader-sap

Note that the lists are not rendered properly in the GitHub preview. It could be that the Antora magic kicks in and solves it, but I am not sure.


sebbader-sap · 2024-04-10T14:07:26Z

documentation/IDTA-01001/modules/ROOT/pages/Spec/IDTA-01001_Metamodel_Constraints.adoc

+
+This leads to the following regular expression:
+^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]$


Suggested change

-^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]$
+^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u00010000-\u0010FFFF]*$
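The effect of the `*` quantifier in this suggestion can be sketched in Python as follows. The pattern strings are an assumed Python rendering (with `\U` plus eight hex digits for the supplementary range), and the names `SINGLE` and `ANY_LENGTH` exist only for this example.

```python
import re

# Assumed Python rendering of the character class from the diff.
CHAR_CLASS = r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"

SINGLE = re.compile("^" + CHAR_CLASS + "$")       # pattern without the quantifier
ANY_LENGTH = re.compile("^" + CHAR_CLASS + "*$")  # pattern with the quantifier

for sample in ["", "a", "Hello, world!", "bad\x00"]:
    print(
        repr(sample),
        SINGLE.fullmatch(sample) is not None,
        ANY_LENGTH.fullmatch(sample) is not None,
    )
```

Without the quantifier only strings of exactly one allowed character match; with it, strings of any length (including the empty string) match as long as every character is allowed.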

documentation/IDTA-01001/modules/ROOT/pages/Spec/IDTA-01001_Metamodel_Constraints.adoc

BirgitBoss

@sebbader-sap: https://github.com/admin-shell-io/aas-specs/pull/381/files#r1559508365 is still open; the rest seems fine to me.

sebbader-sap · 2024-05-29T10:57:42Z

The problem is that we can "serialise" AASd-130 into different regex patterns due to the fact that regex itself is underspecified. I tried to explain our decision for the UTF-16 representation a bit better, and that this representation might change when UTF-32-enabled regex libraries become more common.
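A hedged sketch of what "different regex patterns for the same constraint" can look like: one pattern written over Unicode code points and one written over UTF-16 code units with explicit surrogate pairs. Both patterns and all names below are assumptions made for illustration; neither is claimed to be the pattern used in the schemas. Python's `re` matches code points, so the code-unit view is simulated by turning each UTF-16 code unit into its own character.

```python
import re

# Code-point view: supplementary characters are one item each.
CODE_POINT_PATTERN = re.compile(
    r"^[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*$"
)

# Code-unit view (assumed form): supplementary characters appear as a
# high surrogate followed by a low surrogate.
CODE_UNIT_PATTERN = re.compile(
    r"^(?:[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$"
)


def as_utf16_code_units(text):
    """Return a string in which every character is one UTF-16 code unit of text."""
    data = text.encode("utf-16-be", "surrogatepass")
    return "".join(
        chr(int.from_bytes(data[i:i + 2], "big")) for i in range(0, len(data), 2)
    )


def matches_code_points(text):
    return CODE_POINT_PATTERN.fullmatch(text) is not None


def matches_code_units(text):
    return CODE_UNIT_PATTERN.fullmatch(as_utf16_code_units(text)) is not None


for sample in ["plain text", "tab\there", "\U0001F600", "\x00", "\uFFFE"]:
    print(repr(sample), matches_code_points(sample), matches_code_units(sample))
```

For these samples the two patterns agree, which matches the point made in the comment: the constraint stays the same and only its serialisation as a pattern depends on what the regex engine treats as a character.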

BirgitBoss · 2024-06-14T16:26:44Z

@g1zzm0 could you please have a look at Sebastian's suggested changes and the merge?

Co-authored-by: sebbader-sap <[email protected]>

BirgitBoss · 2024-11-27T14:50:51Z

@s-heppner could you please check what we have implemented in the schema? #381 (comment) Thank you.

s-heppner · 2024-11-28T11:37:36Z

In the v3.1 schema, the regex from the comment is implemented: aas-core-meta v3.1 L419

Is that as expected?

Add Explenation of RegEx and Reson for AASd 130 in natural language

fe5b595

g1zzm0 requested review from mristin and sebbader-sap March 8, 2024 14:34

g1zzm0 added the labels "documentation" (Improvements or additions to documentation) and "requires workstream approval" (strategic decision in spec team needed) on Mar 8, 2024

mristin approved these changes Mar 8, 2024

g1zzm0 and others added 5 commits March 8, 2024 15:59

Short Definiton

08bf571

Co-authored-by: Marko Ristin <[email protected]>

Reformation

99f3078

Co-authored-by: Marko Ristin <[email protected]>

Remove Whitespace

a114a6c

Co-authored-by: Marko Ristin <[email protected]>

Spellcheck and grammar improvement

5cb5ff3

Co-authored-by: Marko Ristin <[email protected]>

Update IDTA-01001_Metamodel_Constraints.adoc

9cea09a

BirgitBoss added this to the V3.0.1 milestone Mar 27, 2024

BirgitBoss reviewed Mar 27, 2024

BirgitBoss mentioned this pull request Mar 27, 2024

No UTF32 Characters in the Regex for Strings #362

Closed

g1zzm0 requested a review from BirgitBoss April 10, 2024 13:48

sebbader-sap requested changes Apr 10, 2024

BirgitBoss modified the milestones: V3.0.1, V3.1 Apr 12, 2024

BirgitBoss requested changes May 29, 2024

BirgitBoss and others added 2 commits November 27, 2024 15:47

formulation

1617b53

Co-authored-by: sebbader-sap <[email protected]>

formulation

1a3037f

Co-authored-by: sebbader-sap <[email protected]>
