Editorial: Clarify operations related to merging locale data #804

gibson042 · 2023-07-11T02:48:15Z

Rename ApplyOptionsToTag to UpdateLanguageId, move its locale id syntax validation out to the (sole) call site, and handle each overriding datum in its own single block.
Rename ApplyUnicodeExtensionToTag to MergeLocaleData.
Refactor InsertUnicodeExtensionAndCanonicalize(locale, extension) into InsertUnicodeExtensionAndCanonicalize(locale, attributes, keywords) (constructing the new -u- extension inline).
Fix a spec bug in CanonicalizeUnicodeLocaleId.

ben-allen

This is excellent -- I've got some questions about the refactored InsertUnicodeExtensionAndCanonicalize, but that may be because I'm missing what's going on there.

ben-allen · 2023-08-16T06:47:22Z

spec/locales-currencies-tz.html

          1. Let _components_ be UnicodeExtensionComponents(_extension_).
          1. For each element _attr_ of _components_.[[Attributes]], do
            1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _attr_.
          1. For each Record { [[Key]], [[Value]] } _keyword_ in _components_.[[Keywords]], do
            1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Key]].
            1. If _keyword_.[[Value]] is not the empty String, then
              1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Value]].
-          1. Assert: _newExtension_ is not equal to *"u"*.
+          1. Assert: _newExtension_ is not equal to *"-u"*.


dang, nice catch!
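
For readers following along, here is a minimal TypeScript sketch of the concatenation steps quoted above. It assumes the accumulator starts as "-u" (which is what the corrected assertion implies). The names `UnicodeKeyword` and `buildUnicodeExtension` are hypothetical and do not appear in the spec.

```typescript
// Hypothetical shape for the spec's Record { [[Key]], [[Value]] }.
interface UnicodeKeyword {
  key: string;
  value: string;
}

// Sketch of the quoted steps: append attributes, then keywords, to an
// accumulator that is assumed to start as "-u" (leading hyphen included).
function buildUnicodeExtension(attributes: string[], keywords: UnicodeKeyword[]): string {
  let newExtension = "-u";
  for (const attr of attributes) {
    newExtension += "-" + attr;
  }
  for (const keyword of keywords) {
    newExtension += "-" + keyword.key;
    // An empty value contributes no subtag of its own.
    if (keyword.value !== "") {
      newExtension += "-" + keyword.value;
    }
  }
  // Counterpart of the corrected assertion: comparing against "u" could never
  // detect an empty extension here, because the accumulator begins with "-u".
  if (newExtension === "-u") {
    throw new Error("extension must contain at least one attribute or keyword");
  }
  return newExtension;
}
```

With one attribute and two keywords, the second having an empty value, the sketch yields a single `-u-` sequence with no trailing hyphen; with no attributes and no keywords, it fails the assertion.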

ben-allen · 2023-08-16T06:53:36Z

spec/locale.html

        1. Let _language_ be ? GetOption(_options_, *"language"*, ~string~, ~empty~, *undefined*).
        1. If _language_ is not *undefined*, then
          1. If _language_ cannot be matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal, throw a *RangeError* exception.
+          1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal replaced with _language_.


this set of changes greatly improves readability. thank you!
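
A rough TypeScript sketch of the validate-then-replace pattern in the steps above, for illustration only. It assumes the UTS #35 grammar for `unicode_language_subtag` (2 to 3 or 5 to 8 ASCII letters) and assumes that the language id begins with its language subtag. The function name `updateLanguage` is hypothetical.

```typescript
// unicode_language_subtag per UTS #35: alpha{2,3} | alpha{5,8}.
const LANGUAGE_SUBTAG = /^(?:[a-z]{2,3}|[a-z]{5,8})$/i;

// Sketch: validate the option, then replace the language subtag in place.
// Assumes languageId starts with a language subtag (e.g. "en-Latn-US").
function updateLanguage(languageId: string, language: string | undefined): string {
  if (language === undefined) {
    return languageId; // option absent: nothing to override
  }
  if (!LANGUAGE_SUBTAG.test(language)) {
    throw new RangeError("invalid language subtag: " + language);
  }
  const subtags = languageId.split("-");
  subtags[0] = language; // the language subtag is always the first subtag here
  return subtags.join("-");
}
```

The point of the restructuring is visible in the sketch: validation and replacement for one datum sit together in a single block.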

ben-allen · 2023-08-16T07:01:23Z

spec/locale.html

-
+    <emu-clause id="sec-updatelanguageid" type="abstract operation" oldids="sec-apply-options-to-tag">
+      <h1>
+        UpdateLanguageId (


I'm a big fan of getting LanguageId into the name of this AO, given that the unicode_language_id is what it works with

ben-allen · 2023-08-16T07:08:53Z

spec/locale.html

@@ -51,7 +52,7 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
        1. If _numberingSystem_ is not *undefined*, then
          1. If _numberingSystem_ cannot be matched by the <code>type</code> Unicode locale nonterminal, throw a *RangeError* exception.
        1. Set _opt_.[[nu]] to _numberingSystem_.
-        1. Let _r_ be ! ApplyUnicodeExtensionToTag(_tag_, _opt_, _relevantExtensionKeys_).
+        1. Let _r_ be MergeLocaleData(_tag_, _opt_, _relevantExtensionKeys_).


I feel like I don't want to like this name, but that I nevertheless like it because it's such a big improvement over the old name and because I can't think of anything that seems better.

ben-allen · 2023-08-16T07:16:06Z

spec/negotiation.html

-          _locale_: a Unicode canonicalized locale identifier,
-          _extension_: a Unicode locale extension sequence,
+          _locale_: a language tag,
+          _attributes_: a List of Strings,


The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?

8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.

If all the changes in ResolveLocale were made just to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize, I feel like, all else being equal, leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.

gibson042 · 2023-08-17

> The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?
>
> 8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.

That is a goal, but so is isolating construction of Unicode locale extension sequences to one place, and both calling algorithms already have a list of attributes (trivially in the case of ResolveLocale) and handle keywords one by one. I also foresee similar convergence between CanonicalizeUnicodeLocaleId and InsertUnicodeExtensionAndCanonicalize, although I'm not yet taking that step here.

> If all the changes in ResolveLocale were made just to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize, I feel like, all else being equal, leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.

No, I do also find that replacing iterative string concatenation in ResolveLocale with use of keyword Records increases its comprehensibility. For example, compare:

-1. Let _supportedExtensionAddition_ be the string-concatenation of *"-"*, _key_, *"-"*, and _value_.
+1. Set _supportedKeyword_ to the Record { [[Key]]: _key_, [[Value]]: _value_ }.
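
To make the comparison concrete, here is a small TypeScript sketch of the two caller styles. All function names are hypothetical, the "-u" starting value is an assumption, and keyword values are assumed to be non-empty for brevity.

```typescript
type KeywordRecord = { key: string; value: string };

// Before: the caller assembles extension text itself, fragment by fragment.
function concatInCaller(pairs: Array<[string, string]>): string {
  let supportedExtension = "-u";
  for (const [key, value] of pairs) {
    supportedExtension += "-" + key + "-" + value;
  }
  return supportedExtension;
}

// After: the caller only collects keyword Records...
function collectKeywordRecords(pairs: Array<[string, string]>): KeywordRecord[] {
  return pairs.map(([key, value]) => ({ key, value }));
}

// ...and one serializer owns the string format of the -u- sequence.
function serializeKeywordRecords(keywords: KeywordRecord[]): string {
  let extension = "-u";
  for (const keyword of keywords) {
    extension += "-" + keyword.key + "-" + keyword.value;
  }
  return extension;
}
```

Both paths produce the same string for the same input; the difference is that only `serializeKeywordRecords` needs to know how a `-u-` sequence is spelled.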

ben-allen · 2023-08-16T07:20:18Z

spec/locale.html

-        1. Return CanonicalizeUnicodeLocaleId(_tag_).
+            1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal replaced with _region_.
+        1. Set _tag_ to _tag_ with the substring matched by the <code>unicode_language_id</code> Unicode locale nonterminal replaced with _languageId_.
+        1. Return _tag_.


Making sure I'm getting this one right: it's not necessary to canonicalize here, because the new version of MergeLocaleData is guaranteed to canonicalize it anyway.
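
As an illustration of the final replacement step in the diff above, here is a TypeScript sketch that swaps the `unicode_language_id` portion of a tag and leaves any extensions untouched. It relies on the UTS #35 grammar, in which no subtag of a language id is a single character, so the first one-character subtag marks the start of the extensions. The name `replaceLanguageId` is hypothetical, and the sketch performs no canonicalization.

```typescript
// Sketch: replace the leading unicode_language_id of a well-formed tag.
function replaceLanguageId(tag: string, newLanguageId: string): string {
  const subtags = tag.split("-");
  // Find the first singleton (extension or private-use marker), if any.
  let end = subtags.length;
  for (let i = 1; i < subtags.length; i++) {
    if (subtags[i].length === 1) {
      end = i;
      break;
    }
  }
  // Keep everything from the singleton onward and prepend the new language id.
  return [newLanguageId].concat(subtags.slice(end)).join("-");
}
```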

sffc · 2023-08-23T22:46:12Z

For possible inspiration: unicode-org/icu4x#1833

sffc · 2023-08-23T22:46:48Z

@gibson042 Is this ready for @anba to re-review?

gibson042 · 2023-08-24T22:14:37Z

Yes.

sffc · 2024-05-02T22:29:08Z

I suggest that @ben-allen verify that @gibson042's latest changes address @anba's comments.

ryzokuken

LGTM overall, confirmed that @anba's two comments were appropriately addressed, marking them as resolved.

ryzokuken · 2024-05-23T16:37:12Z

@gibson042 please rebase this whenever you can, but this looks ready to merge.

gibson042 requested a review from ryzokuken July 11, 2023 02:48

ben-allen reviewed Aug 16, 2023

anba requested changes Aug 16, 2023

gibson042 requested a review from anba August 24, 2023 22:14

ryzokuken added the "editorial" and "needs review" labels Feb 22, 2024

ryzokuken approved these changes May 23, 2024

gibson042 added 3 commits May 29, 2024 13:38

Editorial: Clarify operations related to merging locale data

6efc2aa

Editorial: Rename MergeLocaleData to MakeLocaleRecord to avoid overloading "LocaleData"

a1e0756

Editorial: Clean up MakeLocaleRecord

0de405d

gibson042 force-pushed the 2023-07-merge-locale-data branch from 31768c1 to 0de405d Compare May 29, 2024 17:40

gibson042 merged commit 53c382c into tc39:main May 29, 2024
2 checks passed
