Editorial: Clarify operations related to merging locale data #804

Merged
merged 3 commits into from
May 29, 2024
70 changes: 34 additions & 36 deletions spec/locale.html
@@ -30,7 +30,9 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
 1. Else,
 1. Let _tag_ be ? ToString(_tag_).
 1. Set _options_ to ? CoerceOptionsToObject(_options_).
-1. Set _tag_ to ? ApplyOptionsToTag(_tag_, _options_).
+1. If IsStructurallyValidLanguageTag(_tag_) is *false*, throw a *RangeError* exception.
+1. Set _tag_ to CanonicalizeUnicodeLocaleId(_tag_).
+1. Set _tag_ to ? UpdateLanguageId(_tag_, _options_).
 1. Let _opt_ be a new Record.
 1. Let _calendar_ be ? GetOption(_options_, *"calendar"*, ~string~, ~empty~, *undefined*).
 1. If _calendar_ is not *undefined*, then
@@ -51,7 +53,7 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
 1. If _numberingSystem_ is not *undefined*, then
 1. If _numberingSystem_ cannot be matched by the <code>type</code> Unicode locale nonterminal, throw a *RangeError* exception.
 1. Set _opt_.[[nu]] to _numberingSystem_.
-1. Let _r_ be ApplyUnicodeExtensionToTag(_tag_, _opt_, _relevantExtensionKeys_).
+1. Let _r_ be MakeLocaleRecord(_tag_, _opt_, _relevantExtensionKeys_).
 1. Set _locale_.[[Locale]] to _r_.[[locale]].
 1. Set _locale_.[[Calendar]] to _r_.[[ca]].
 1. Set _locale_.[[Collation]] to _r_.[[co]].
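
The constructor steps in the two hunks above are editorial, so the observable behaviour of `Intl.Locale` should be unchanged. The following TypeScript sketch is one way to spot-check that in a runtime that ships `Intl.Locale`. The `LocaleOptionsSketch` and `LocaleCtorSketch` types and the cast are assumptions added only so the snippet compiles regardless of the compiler's `lib` setting; they are not part of the specification.

```typescript
// Hypothetical minimal typing for the parts of Intl.Locale used below.
interface LocaleOptionsSketch {
  language?: string;
  script?: string;
  region?: string;
  calendar?: string;
  numberingSystem?: string;
}
type LocaleCtorSketch = new (
  tag: string,
  options?: LocaleOptionsSketch
) => { toString(): string };
const LocaleSketch = (Intl as unknown as { Locale: LocaleCtorSketch }).Locale;

// `region` replaces the region subtag (the UpdateLanguageId step), and
// `numberingSystem` overrides the -u-nu- keyword already in the tag
// (the MakeLocaleRecord step).
const updated = new LocaleSketch("en-Latn-US-u-nu-latn", {
  region: "GB",
  numberingSystem: "arab",
}).toString();
console.log(updated); // "en-Latn-GB-u-nu-arab"

// A structurally invalid tag throws a RangeError.
let threw = false;
try {
  new LocaleSketch("not a tag", { language: "fr" });
} catch (e) {
  threw = e instanceof RangeError;
}
console.log(threw); // true
```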
@@ -68,54 +70,50 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
 </emu-alg>
 </emu-clause>

-<emu-clause id="sec-apply-options-to-tag" type="abstract operation">
+<emu-clause id="sec-updatelanguageid" type="abstract operation" oldids="sec-apply-options-to-tag">
 <h1>
-ApplyOptionsToTag (
-_tag_: a String,
+UpdateLanguageId (
Contributor

I'm a big fan of getting LanguageId into the name of this AO, given that the unicode_language_id is what it works with

+_tag_: a Unicode canonicalized locale identifier,
 _options_: an Object,
-): either a normal completion containing a Unicode canonicalized locale identifier or a throw completion
+): either a normal completion containing a language tag or a throw completion
 </h1>
 <dl class="header">
+<dt>description</dt>
+<dd>It updates the <code>unicode_language_id</code> subtags in _tag_ from the corresponding properties of _options_ and returns the <emu-xref href="#sec-isstructurallyvalidlanguagetag">structurally valid</emu-xref> but non-canonicalized result.</dd>
 </dl>
 <emu-alg>
-1. If IsStructurallyValidLanguageTag(_tag_) is *false*, throw a *RangeError* exception.
-1. Let _language_ be ? GetOption(_options_, *"language"*, ~string~, ~empty~, *undefined*).
-1. If _language_ is not *undefined*, then
-1. If _language_ cannot be matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal, throw a *RangeError* exception.
-1. Let _script_ be ? GetOption(_options_, *"script"*, ~string~, ~empty~, *undefined*).
+1. Let _languageId_ be the longest prefix of _tag_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
+1. Let _language_ be ? GetOption(_options_, *"language"*, ~string~, ~empty~, GetLocaleLanguage(_languageId_)).
+1. If _language_ cannot be matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal, throw a *RangeError* exception.
+1. Let _script_ be ? GetOption(_options_, *"script"*, ~string~, ~empty~, GetLocaleScript(_languageId_)).
 1. If _script_ is not *undefined*, then
 1. If _script_ cannot be matched by the <code>unicode_script_subtag</code> Unicode locale nonterminal, throw a *RangeError* exception.
-1. Let _region_ be ? GetOption(_options_, *"region"*, ~string~, ~empty~, *undefined*).
+1. Let _region_ be ? GetOption(_options_, *"region"*, ~string~, ~empty~, GetLocaleRegion(_languageId_)).
 1. If _region_ is not *undefined*, then
 1. If _region_ cannot be matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal, throw a *RangeError* exception.
-1. Set _tag_ to CanonicalizeUnicodeLocaleId(_tag_).
ryzokuken marked this conversation as resolved.
-1. Assert: _tag_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
-1. Let _languageId_ be the longest prefix of _tag_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
-1. If _language_ is *undefined*, set _language_ to GetLocaleLanguage(_languageId_).
-1. If _script_ is *undefined*, set _script_ to GetLocaleScript(_languageId_).
-1. If _region_ is *undefined*, set _region_ to GetLocaleRegion(_languageId_).
 1. Let _variants_ be GetLocaleVariants(_languageId_).
-1. Set _languageId_ to _language_.
-1. If _script_ is not *undefined*, set _languageId_ to the string-concatenation of _languageId_, *"-"*, and _script_.
-1. If _region_ is not *undefined*, set _languageId_ to the string-concatenation of _languageId_, *"-"*, and _region_.
-1. If _variants_ is not *undefined*, set _languageId_ to the string-concatenation of _languageId_, *"-"*, and _variants_.
-1. Set _tag_ to _tag_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_language_id</code> Unicode locale nonterminal replaced by the string _languageId_.
-1. Return CanonicalizeUnicodeLocaleId(_tag_).
+1. Let _newLanguageId_ be _language_.
+1. If _script_ is not *undefined*, set _newLanguageId_ to the string-concatenation of _newLanguageId_, *"-"*, and _script_.
+1. If _region_ is not *undefined*, set _newLanguageId_ to the string-concatenation of _newLanguageId_, *"-"*, and _region_.
+1. If _variants_ is not *undefined*, set _newLanguageId_ to the string-concatenation of _newLanguageId_, *"-"*, and _variants_.
+1. Let _newTag_ be _tag_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_language_id</code> Unicode locale nonterminal replaced by the string _newLanguageId_.
+1. Return _newTag_.
 </emu-alg>
 </emu-clause>
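
As a reading aid, the new UpdateLanguageId steps can be approximated in TypeScript. This is a minimal sketch under stated assumptions, not the spec's definition: the function name `updateLanguageId`, the `LanguageIdOptions` type, and the regular expressions are illustrative, the grammar is a simplified `unicode_language_id` (no `root` or script-first forms), and option values are taken as already-converted strings rather than read through GetOption.

```typescript
// Hypothetical option bag; property names mirror the GetOption calls in the steps.
interface LanguageIdOptions {
  language?: string;
  script?: string;
  region?: string;
}

// Simplified subtag patterns (assumption: "root" and script-first forms omitted).
const LANGUAGE = "[a-z]{2,3}|[a-z]{5,8}";
const SCRIPT = "[a-z]{4}";
const REGION = "[a-z]{2}|[0-9]{3}";
const VARIANT = "[a-z0-9]{5,8}|[0-9][a-z0-9]{3}";
const LANGUAGE_ID = new RegExp(
  "^(" + LANGUAGE + ")(?:-(" + SCRIPT + "))?(?:-(" + REGION + "))?" +
    "((?:-(?:" + VARIANT + "))*)(?=-|$)",
  "i"
);

function matchesSubtag(pattern: string, subtag: string): boolean {
  return new RegExp("^(?:" + pattern + ")$", "i").test(subtag);
}

function updateLanguageId(tag: string, options: LanguageIdOptions): string {
  // Longest prefix of the tag matched by unicode_language_id.
  const match = LANGUAGE_ID.exec(tag);
  if (match === null) {
    throw new RangeError("tag does not start with a language identifier");
  }
  const languageId = match[0];

  // Each option defaults to the subtag already present in the tag.
  const language = options.language !== undefined ? options.language : match[1];
  if (!matchesSubtag(LANGUAGE, language)) throw new RangeError("invalid language");

  const script: string | undefined =
    options.script !== undefined ? options.script : match[2];
  if (script !== undefined && !matchesSubtag(SCRIPT, script)) {
    throw new RangeError("invalid script");
  }

  const region: string | undefined =
    options.region !== undefined ? options.region : match[3];
  if (region !== undefined && !matchesSubtag(REGION, region)) {
    throw new RangeError("invalid region");
  }

  // Variants are carried over unchanged (group 4 keeps its leading "-").
  const variants = match[4] === "" ? undefined : match[4].slice(1);

  let newLanguageId = language;
  if (script !== undefined) newLanguageId += "-" + script;
  if (region !== undefined) newLanguageId += "-" + region;
  if (variants !== undefined) newLanguageId += "-" + variants;

  // Replace only the language-identifier prefix; no canonicalization here.
  return newLanguageId + tag.slice(languageId.length);
}

console.log(updateLanguageId("en-Latn-US-u-nu-latn", { region: "GB" })); // "en-Latn-GB-u-nu-latn"
console.log(updateLanguageId("de-1996", { region: "AT" })); // "de-AT-1996"
```

Note that, as in the revised description, the result is not canonicalized: `updateLanguageId("en-US", { script: "latn" })` keeps the lowercase `latn`, and the caller is expected to canonicalize later.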

-<emu-clause id="sec-apply-unicode-extension-to-tag" type="abstract operation">
+<emu-clause id="sec-makelocalerecord" type="abstract operation" oldids="sec-apply-unicode-extension-to-tag">
 <h1>
-ApplyUnicodeExtensionToTag (
-_tag_: a String,
+MakeLocaleRecord (
+_tag_: a language tag,
 _options_: a Record,
 _relevantExtensionKeys_: a List of Strings,
 ): a Record
 </h1>
 <dl class="header">
+<dt>description</dt>
+<dd>It constructs and returns a Record in which each element of _relevantExtensionKeys_ defines a corresponding field with data from any Unicode locale extension sequence of _tag_ as overridden by a corresponding field of _options_, and which additionally includes a [[locale]] field containing a Unicode canonicalized locale identifier resulting from incorporating those fields into _tag_.</dd>
 </dl>
 <emu-alg>
-1. Assert: _tag_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
 1. If _tag_ contains a <emu-not-ref>substring</emu-not-ref> that is a Unicode locale extension sequence, then
 1. Let _extension_ be the String value consisting of the <emu-not-ref>substring</emu-not-ref> of the Unicode locale extension sequence within _tag_.
 1. Let _components_ be UnicodeExtensionComponents(_extension_).
@@ -126,12 +124,12 @@ <h1>
 1. Let _keywords_ be a new empty List.
 1. Let _result_ be a new Record.
 1. For each element _key_ of _relevantExtensionKeys_, do
-1. Let _value_ be *undefined*.
-1. If _keywords_ contains an element whose [[Key]] is the same as _key_, then
-1. Let _entry_ be the element of _keywords_ whose [[Key]] is the same as _key_.
-1. Set _value_ to _entry_.[[Value]].
+1. If _keywords_ contains an element whose [[Key]] is _key_, then
+1. Let _entry_ be the element of _keywords_ whose [[Key]] is _key_.
+1. Let _value_ be _entry_.[[Value]].
 1. Else,
 1. Let _entry_ be ~empty~.
+1. Let _value_ be *undefined*.
 1. Assert: _options_ has a field [[&lt;_key_&gt;]].
 1. Let _optionsValue_ be _options_.[[&lt;_key_&gt;]].
 1. If _optionsValue_ is not *undefined*, then
@@ -143,10 +141,10 @@ <h1>
 1. Append the Record { [[Key]]: _key_, [[Value]]: _value_ } to _keywords_.
 1. Set _result_.[[&lt;_key_&gt;]] to _value_.
 1. Let _locale_ be the String value that is _tag_ with any Unicode locale extension sequences removed.
-1. Let _newExtension_ be a Unicode BCP 47 U Extension based on _attributes_ and _keywords_.
-1. If _newExtension_ is not the empty String, then
-1. Set _locale_ to InsertUnicodeExtensionAndCanonicalize(_locale_, _newExtension_).
-1. Set _result_.[[locale]] to _locale_.
+1. If _attributes_ is not empty or _keywords_ is not empty, then
+1. Set _result_.[[locale]] to InsertUnicodeExtensionAndCanonicalize(_locale_, _attributes_, _keywords_).
+1. Else,
+1. Set _result_.[[locale]] to CanonicalizeUnicodeLocaleId(_locale_).
 1. Return _result_.
 </emu-alg>
 </emu-clause>
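
The per-key loop of MakeLocaleRecord shown above can be sketched in TypeScript as follows. This is an illustration under assumptions, not the normative algorithm: `mergeKeywords`, `Keyword`, and `KeyedValues` are hypothetical names, the keyword list is taken as already parsed (the UnicodeExtensionComponents step is not reproduced), and the steps collapsed between hunks are filled in with the simplest reading of the visible ones. Building the final [[locale]] string is left out.

```typescript
// Hypothetical stand-in for a { [[Key]], [[Value]] } keyword Record.
interface Keyword {
  key: string;
  value: string;
}
type KeyedValues = { [key: string]: string | undefined };

function mergeKeywords(
  keywords: Keyword[],
  options: KeyedValues,
  relevantExtensionKeys: string[]
): { result: KeyedValues; keywords: Keyword[] } {
  // Copy so the caller's list is not mutated (a choice of this sketch).
  const merged: Keyword[] = keywords.map((k) => ({ key: k.key, value: k.value }));
  const result: KeyedValues = {};

  relevantExtensionKeys.forEach((key) => {
    // Find the keyword already present in the tag, if any.
    let entry: Keyword | undefined;
    for (let i = 0; i < merged.length; i++) {
      if (merged[i].key === key) {
        entry = merged[i];
        break;
      }
    }
    let value = entry !== undefined ? entry.value : undefined;

    // An option, when given, overrides the value taken from the tag.
    const optionsValue = options[key];
    if (optionsValue !== undefined) {
      value = optionsValue;
      if (entry !== undefined) {
        entry.value = value;
      } else {
        merged.push({ key: key, value: value });
      }
    }
    result[key] = value;
  });

  return { result: result, keywords: merged };
}

const mergedExample = mergeKeywords(
  [
    { key: "ca", value: "gregory" },
    { key: "nu", value: "latn" },
  ],
  { ca: undefined, hc: "h23", nu: "arab" },
  ["ca", "hc", "nu"]
);
console.log(mergedExample.result.nu); // "arab"
console.log(mergedExample.keywords.length); // 3
```

Keywords that come only from options are appended in the order the keys are visited; any reordering is left to the canonicalization step.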
6 changes: 3 additions & 3 deletions spec/locales-currencies-tz.html
@@ -83,12 +83,12 @@ <h1>
 </h1>
 <dl class="header">
 <dt>description</dt>
-<dd>It returns the canonical and case-regularized form of the _locale_.</dd>
+<dd>It returns the canonical and case-regularized form of _locale_.</dd>
 </dl>
 <emu-alg>
 1. Let _localeId_ be the String value resulting from performing the algorithm to transform _locale_ to canonical form per <a href="https://unicode.org/reports/tr35/#LocaleId_Canonicalization">Unicode Technical Standard #35 Part 1 Core, Annex C LocaleId Canonicalization</a> (note that the algorithm begins with canonicalizing syntax only).
-1. [id="step-canonicalizeunicodelocaleid-u-extension"] If _localeId_ contains a substring that is a Unicode locale extension sequence, then
-1. Let _extension_ be the String value consisting of the substring of the Unicode locale extension sequence within _localeId_.
+1. [id="step-canonicalizeunicodelocaleid-u-extension"] If _localeId_ contains a <emu-not-ref>substring</emu-not-ref> that is a Unicode locale extension sequence, then
+1. Let _extension_ be the String value consisting of the <emu-not-ref>substring</emu-not-ref> of the Unicode locale extension sequence within _localeId_.
 1. Let _newExtension_ be *"-u"*.
 1. Let _components_ be UnicodeExtensionComponents(_extension_).
 1. For each element _attr_ of _components_.[[Attributes]], do
53 changes: 31 additions & 22 deletions spec/negotiation.html
@@ -170,16 +170,24 @@ <h1>
 <emu-clause id="sec-insert-unicode-extension-and-canonicalize" type="abstract operation">
 <h1>
 InsertUnicodeExtensionAndCanonicalize (
-_locale_: a Unicode canonicalized locale identifier,
-_extension_: a Unicode locale extension sequence,
+_locale_: a language tag,
+_attributes_: a List of Strings,
Contributor

The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?

8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.

If all the changes in ResolveLocale were just made to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize I feel like all else being equal leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.

Contributor Author

> The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?
>
> 8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.

That is a goal, but so is isolating construction of Unicode locale extension sequences to one place—and both calling algorithms already have a list of attributes (trivially in the case of ResolveLocale) and handle keywords one by one. I also foresee similar convergence between CanonicalizeUnicodeLocaleId and InsertUnicodeExtensionAndCanonicalize, although I'm not yet taking that step here.

> If all the changes in ResolveLocale were just made to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize I feel like all else being equal leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.

No, I do also find that replacing iterative string concatenation in ResolveLocale with use of keyword Records increases its comprehensibility. For example, compare:

-1. Let _supportedExtensionAddition_ be the string-concatenation of *"-"*, _key_, *"-"*, and _value_.
+1. Set _supportedKeyword_ to the Record { [[Key]]: _key_, [[Value]]: _value_ }.

+_keywords_: a List of Records,
 ): a Unicode canonicalized locale identifier
 </h1>
 <dl class="header">
 <dt>description</dt>
-<dd>It incorporates _extension_ into _locale_ and returns the canonicalized result.</dd>
+<dd>It incorporates _attributes_ and _keywords_ into _locale_ as a Unicode locale extension sequence and returns the canonicalized result.</dd>
 </dl>
 <emu-alg>
 1. Assert: _locale_ does not contain a Unicode locale extension sequence.
+1. Let _extension_ be *"-u"*.
+1. For each element _attr_ of _attributes_, do
+1. Set _extension_ to the string-concatenation of _extension_, *"-"*, and _attr_.
+1. For each Record { [[Key]], [[Value]] } _keyword_ of _keywords_, do
+1. Set _extension_ to the string-concatenation of _extension_, *"-"*, and _keyword_.[[Key]].
+1. If _keyword_.[[Value]] is not the empty String, set _extension_ to the string-concatenation of _extension_, *"-"*, and _keyword_.[[Value]].
+1. If _extension_ is *"-u"*, return CanonicalizeUnicodeLocaleId(_locale_).
 1. Let _privateIndex_ be StringIndexOf(_locale_, *"-x-"*, 0).
 1. If _privateIndex_ is ~not-found~, then
 1. Let _newLocale_ be the string-concatenation of _locale_ and _extension_.
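
To make the new parameter shape concrete, here is a hedged TypeScript sketch of how the extension sequence is assembled from `attributes` and `keywords` and spliced into the tag. The names `insertUnicodeExtension` and `Keyword` are illustrative. Two parts are assumptions: the hunk ends before the branch for tags that contain `-x-`, so placing the extension ahead of the private-use sequence is inferred rather than shown, and the CanonicalizeUnicodeLocaleId calls are omitted entirely.

```typescript
// Hypothetical stand-in for a { [[Key]], [[Value]] } keyword Record.
interface Keyword {
  key: string;
  value: string;
}

function insertUnicodeExtension(
  locale: string,
  attributes: string[],
  keywords: Keyword[]
): string {
  // Build "-u", then attributes, then key/value pairs.
  let extension = "-u";
  attributes.forEach((attr) => {
    extension += "-" + attr;
  });
  keywords.forEach((keyword) => {
    extension += "-" + keyword.key;
    // An empty value means the keyword is emitted as a bare key.
    if (keyword.value !== "") extension += "-" + keyword.value;
  });

  // Nothing to insert (the spec canonicalizes the locale at this point).
  if (extension === "-u") return locale;

  const privateIndex = locale.indexOf("-x-");
  if (privateIndex === -1) return locale + extension;

  // Assumption: this branch is not visible in the hunk. The extension is
  // placed before the private-use sequence.
  return locale.slice(0, privateIndex) + extension + locale.slice(privateIndex);
}

console.log(
  insertUnicodeExtension("en-US", [], [
    { key: "kn", value: "" },
    { key: "nu", value: "arab" },
  ])
); // "en-US-u-kn-nu-arab"
console.log(insertUnicodeExtension("en-x-private", ["attr"], [])); // "en-u-attr-x-private"
```

Keeping the string building in this one place is the consolidation argued for in the review reply above: callers pass lists and Records and no longer concatenate `-key-value` fragments themselves.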
@@ -221,24 +229,25 @@ <h1>
 1. If _r_.[[extension]] is not ~empty~, then
 1. Let _components_ be UnicodeExtensionComponents(_r_.[[extension]]).
 1. Let _keywords_ be _components_.[[Keywords]].
-1. Let _supportedExtension_ be *"-u"*.
+1. Else,
+1. Let _keywords_ be a new empty List.
+1. Let _supportedKeywords_ be a new empty List.
 1. For each element _key_ of _relevantExtensionKeys_, do
 1. Let _keyLocaleData_ be _foundLocaleData_.[[&lt;_key_&gt;]].
 1. Assert: _keyLocaleData_ is a List.
 1. Let _value_ be _keyLocaleData_[0].
 1. Assert: _value_ is a String or _value_ is *null*.
-1. Let _supportedExtensionAddition_ be *""*.
-1. If _r_.[[extension]] is not ~empty~, then
-1. If _keywords_ contains an element whose [[Key]] is the same as _key_, then
-1. Let _entry_ be the element of _keywords_ whose [[Key]] is the same as _key_.
-1. Let _requestedValue_ be _entry_.[[Value]].
-1. If _requestedValue_ is not the empty String, then
-1. If _keyLocaleData_ contains _requestedValue_, then
-1. Set _value_ to _requestedValue_.
-1. Set _supportedExtensionAddition_ to the string-concatenation of *"-"*, _key_, *"-"*, and _value_.
-1. Else if _keyLocaleData_ contains *"true"*, then
-1. Set _value_ to *"true"*.
-1. Set _supportedExtensionAddition_ to the string-concatenation of *"-"* and _key_.
+1. Let _supportedKeyword_ be ~empty~.
+1. If _keywords_ contains an element whose [[Key]] is _key_, then
+1. Let _entry_ be the element of _keywords_ whose [[Key]] is _key_.
+1. Let _requestedValue_ be _entry_.[[Value]].
+1. If _requestedValue_ is not the empty String, then
+1. If _keyLocaleData_ contains _requestedValue_, then
+1. Set _value_ to _requestedValue_.
+1. Set _supportedKeyword_ to the Record { [[Key]]: _key_, [[Value]]: _value_ }.
+1. Else if _keyLocaleData_ contains *"true"*, then
+1. Set _value_ to *"true"*.
+1. Set _supportedKeyword_ to the Record { [[Key]]: _key_, [[Value]]: *""* }.
 1. Assert: _options_ has a field [[&lt;_key_&gt;]].
 1. Let _optionsValue_ be _options_.[[&lt;_key_&gt;]].
 1. Assert: _optionsValue_ is a String, or _optionsValue_ is either *undefined* or *null*.
@@ -251,13 +260,13 @@ <h1>
 1. Set _optionsValue_ to *"true"*.
 1. If SameValue(_optionsValue_, _value_) is *false* and _keyLocaleData_ contains _optionsValue_, then
 1. Set _value_ to _optionsValue_.
-1. Set _supportedExtensionAddition_ to *""*.
+1. Set _supportedKeyword_ to ~empty~.
+1. If _supportedKeyword_ is not ~empty~, append _supportedKeyword_ to _supportedKeywords_.
 1. Set _result_.[[&lt;_key_&gt;]] to _value_.
-1. Set _supportedExtension_ to the string-concatenation of _supportedExtension_ and _supportedExtensionAddition_.
-1. If _supportedExtension_ is *"-u"*, then
-1. Set _result_.[[Locale]] to _foundLocale_.
-1. Else,
-1. Set _result_.[[Locale]] to InsertUnicodeExtensionAndCanonicalize(_foundLocale_, _supportedExtension_).
+1. If _supportedKeywords_ is not empty, then
+1. Let _supportedAttributes_ be a new empty List.
+1. Set _foundLocale_ to InsertUnicodeExtensionAndCanonicalize(_foundLocale_, _supportedAttributes_, _supportedKeywords_).
+1. Set _result_.[[locale]] to _foundLocale_.
 1. Return _result_.
 </emu-alg>
 </emu-clause>
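
The keyword bookkeeping that ResolveLocale now performs with Records can be sketched in TypeScript as below. It is a simplified illustration: `resolveKeywords`, `KeywordEntry`, and the data shapes are hypothetical names, and the option-handling steps collapsed between the two hunks are reduced to the one visible rule (an empty option value is treated as `"true"`), which is an assumption about the hidden steps. Locale matching and the final call to InsertUnicodeExtensionAndCanonicalize are not reproduced.

```typescript
// Hypothetical stand-in for a { [[Key]], [[Value]] } keyword Record.
interface KeywordEntry {
  key: string;
  value: string;
}
type LocaleData = { [key: string]: Array<string | null> };
type ResolveOptions = { [key: string]: string | null | undefined };

function resolveKeywords(
  keywords: KeywordEntry[],
  foundLocaleData: LocaleData,
  options: ResolveOptions,
  relevantExtensionKeys: string[]
): { result: { [key: string]: string | null }; supportedKeywords: KeywordEntry[] } {
  const supportedKeywords: KeywordEntry[] = [];
  const result: { [key: string]: string | null } = {};

  relevantExtensionKeys.forEach((key) => {
    const keyLocaleData = foundLocaleData[key];
    let value: string | null = keyLocaleData[0]; // locale default
    let supportedKeyword: KeywordEntry | undefined;

    // Value requested through the tag's -u- extension, if any.
    let requestedValue: string | undefined;
    for (let i = 0; i < keywords.length; i++) {
      if (keywords[i].key === key) {
        requestedValue = keywords[i].value;
        break;
      }
    }
    if (requestedValue !== undefined) {
      if (requestedValue !== "") {
        if (keyLocaleData.indexOf(requestedValue) !== -1) {
          value = requestedValue;
          supportedKeyword = { key: key, value: requestedValue };
        }
      } else if (keyLocaleData.indexOf("true") !== -1) {
        value = "true";
        supportedKeyword = { key: key, value: "" }; // bare key in the tag
      }
    }

    // Assumption: option canonicalization is elided in the diff and skipped here.
    let optionsValue = options[key];
    if (optionsValue === "") optionsValue = "true";
    if (
      optionsValue !== undefined &&
      optionsValue !== value &&
      keyLocaleData.indexOf(optionsValue) !== -1
    ) {
      // An option wins over the tag, so the keyword is dropped from the locale.
      value = optionsValue;
      supportedKeyword = undefined;
    }

    if (supportedKeyword !== undefined) supportedKeywords.push(supportedKeyword);
    result[key] = value;
  });

  return { result: result, supportedKeywords: supportedKeywords };
}

const resolved = resolveKeywords(
  [
    { key: "nu", value: "arab" },
    { key: "kn", value: "" },
    { key: "ca", value: "islamic" },
  ],
  { ca: ["gregory", "buddhist"], kn: ["false", "true"], nu: ["latn", "arab"] },
  { ca: "buddhist", kn: undefined, nu: undefined },
  ["ca", "kn", "nu"]
);
console.log(resolved.result.ca); // "buddhist"
console.log(resolved.supportedKeywords.length); // 2
```

In this run the unsupported `ca-islamic` request is ignored and the `ca` option takes over without contributing a keyword, while `kn` and `nu` are kept as supported keywords.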