More updates from code review
* Removed _limit_ parameter from StringSplitToList and adjusted
  its callsites.
* Added editorial note explaining Link resolution complexity.
* Minor updates to AvailableNamedTimeZoneIdentifiers.
justingrant committed May 27, 2024
1 parent 16e86a8 commit b177cec
Showing 1 changed file with 36 additions and 22 deletions.
58 changes: 36 additions & 22 deletions spec/locales-currencies-tz.html
@@ -258,7 +258,6 @@ <h1>AvailableNamedTimeZoneIdentifiers ( ): a List of Time Zone Identifier Record
1. Let _result_ be a new empty List.
1. For each element _identifier_ of _identifiers_, do
1. Let _primary_ be _identifier_.
1. NOTE: The algorithm steps below are intended to correspond to the behaviour of <code>icu::TimeZone::getIanaID()</code> in the International Components for Unicode (<a href="https://icu.unicode.org/">ICU</a>) and the processes for maintaining time zone identifier data in the Unicode Common Locale Data Repository (<a href="https://cldr.unicode.org">CLDR</a>).
1. If _identifier_ is a Link name in the IANA Time Zone Database and _identifier_ is not present in the “TZ” column of <code>zone.tab</code> of the IANA Time Zone Database, then
1. Let _zone_ be the Zone name that _identifier_ resolves to, according to the rules for resolving Link names in the IANA Time Zone Database.
1. If _zone_ starts with *"Etc/"*, then
@@ -277,15 +276,15 @@ <h1>AvailableNamedTimeZoneIdentifiers ( ): a List of Time Zone Identifier Record
1. Let _backzone_ be *undefined*.
1. Let _backzoneLinkLines_ be the List of lines in the file <code>backzone</code> of the IANA Time Zone Database that start with either *"Link "* or *"#PACKRATLIST zone.tab Link "*.
1. For each element _line_ of _backzoneLinkLines_, do
1. If _line_ starts with *"#PACKRATLIST zone.tab "*, set _line_ to the substring of _line_ from 22.
1. Assert: _line_ starts with *"Link "*.
1. Set _line_ to the substring of _line_ from 5.
1. Let _backzoneAndLink_ be StringSplitToList(_line_, *" "*, 2).
1. Assert: _backzoneAndLink_ has exactly two elements.
1. Let _i_ be StringIndexOf(_line_, *"Link "*, 0).
1. Set _line_ to the substring of _line_ from _i_ + 5.
1. Let _backzoneAndLink_ be StringSplitToList(_line_, *" "*).
1. Assert: _backzoneAndLink_ has at least two elements, and both _backzoneAndLink_[0] and _backzoneAndLink_[1] are available named time zone identifiers.
1. If _backzoneAndLink_[1] is _identifier_, then
1. Assert: _backzone_ is *undefined*.
1. Set _primary_ to _backzoneAndLink_[0].
1. Set _backzone_ to _backzoneAndLink_[0].
1. Assert: _backzone_ is not *undefined*.
1. Set _primary_ to _backzone_.
1. If _primary_ is one of *"Etc/UTC"*, *"Etc/GMT"*, or *"GMT"*, set _primary_ to *"UTC"*.
1. If _primary_ is a replacement time zone identifier and its rename waiting period has not concluded, then
1. Let _renamedIdentifier_ be the renamed time zone identifier that _primary_ replaced.
@@ -296,6 +295,27 @@ <h1>AvailableNamedTimeZoneIdentifiers ( ): a List of Time Zone Identifier Record
1. Return _result_.
</emu-alg>

<emu-note>
<p>
The algorithm above for resolving Links to primary time zone identifiers is intended to correspond to the behaviour of <code>icu::TimeZone::getIanaID()</code> in the International Components for Unicode (<a href="https://icu.unicode.org/">ICU</a>) and the processes for maintaining time zone identifier data in the Unicode Common Locale Data Repository (<a href="https://cldr.unicode.org">CLDR</a>).
</p>
<p>
This algorithm resolves Links to primary time zone identifiers without crossing the boundaries of ISO 3166-1 Alpha-2 country codes, using data from files <code>zone.tab</code> and <code>backzone</code> of the IANA Time Zone Database.
If the country code of a Link has only one line in <code>zone.tab</code>, then that line will determine the primary time zone identifier.
However, if that country code has multiple lines in <code>zone.tab</code>, then historical mappings in <code>backzone</code> must be used to identify the correct primary time zone identifier.
</p>
<p>
For example, if "Pacific/Truk" (in country code "FM") is a Link to "Pacific/Port_Moresby" (in country code "PG") with the default build options of the IANA Time Zone Database, then the “country-code” column of <code>zone.tab</code> will be checked for lines corresponding with "FM".
If there were only one such line, then the “TZ” column of that line would determine the primary time zone identifier associated with "Pacific/Truk".
But "FM" has multiple lines in <code>zone.tab</code>, so <code>backzone</code> must be consulted to find the line "Link Pacific/Chuuk Pacific/Truk", resulting in "Pacific/Chuuk" as the primary time zone identifier.
</p>
<p>
Note that <code>zone.tab</code> is the preferred source of mapping data because <code>backzone</code> mappings may, in rare cases, cross the boundaries of ISO 3166-1 Alpha-2 country codes.
For example, "Atlantic/Jan_Mayen" (in country code "SJ") is mapped in <code>backzone</code> to "Europe/Oslo" (in country code "NO").
As of the 2024a release of the IANA Time Zone Database, "Atlantic/Jan_Mayen" is the only case where this happens.
</p>
</emu-note>
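The <code>backzone</code> lookup steps in the algorithm above can be sketched in TypeScript as follows. This is only an illustration, not text from the spec or from ICU: the function name `findBackzoneTarget` is hypothetical, the `#PACKRATLIST` sample line uses made-up identifiers, and the sketch assumes single-space separators exactly as the algorithm text does. Where the spec asserts, the sketch throws or returns `undefined` so the caller can decide what to do.

```typescript
// Hypothetical helper: finds the backzone target that `identifier` links to.
// Mirrors the revised steps: select Link lines, skip past "Link ", split on " ".
function findBackzoneTarget(
  backzoneText: string,
  identifier: string,
): string | undefined {
  const prefixes = ["Link ", "#PACKRATLIST zone.tab Link "];
  let backzone: string | undefined;
  for (const line of backzoneText.split("\n")) {
    // Only lines that start with one of the two prefixes are considered.
    if (!prefixes.some((prefix) => line.startsWith(prefix))) continue;
    // Both prefixes contain "Link ", so indexOf cannot return -1 here.
    const i = line.indexOf("Link ");
    const parts = line.slice(i + 5).split(" ");
    if (parts.length < 2) {
      throw new Error(`Malformed Link line: ${line}`);
    }
    if (parts[1] === identifier) {
      // The spec asserts that at most one line matches a given identifier.
      if (backzone !== undefined) {
        throw new Error(`Multiple backzone Links for ${identifier}`);
      }
      backzone = parts[0];
    }
  }
  return backzone;
}

// The first Link line is the example from the note; the second is made up.
const sampleBackzone = [
  "# a comment line that is ignored",
  "Link Pacific/Chuuk Pacific/Truk",
  "#PACKRATLIST zone.tab Link Example/Target Example/Old",
].join("\n");

console.log(findBackzoneTarget(sampleBackzone, "Pacific/Truk"));
```

Searching for *"Link "* rather than stripping a fixed-length prefix is what lets one code path handle both kinds of line, which appears to be the motivation for the StringIndexOf step in this commit.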

<emu-note>
Time zone identifiers in the IANA Time Zone Database can change over time.
At a minimum, it is recommended that implementations limit changes to the result of AvailableNamedTimeZoneIdentifiers to the changes allowed by GetAvailableNamedTimeZoneIdentifier, for the lifetime of the surrounding agent.
@@ -352,41 +372,33 @@ <h1>
<h1>
StringSplitToList (
_S_: a String,
_separator_: a String that is not the empty String,
_limit_: a mathematical value in the range 1 to 2<sup>32</sup> - 1
_separator_: a String,
): a List of Strings
</h1>
<dl class="header">
<dt>description</dt>
<dd>
The returned List contains substrings of _S_.
These substrings are determined by searching from left to right for occurrences of _separator_; these occurrences are not part of any String in the returned List, but serve to divide _S_ into substrings.
The output List will contain no more than _limit_ elements; any additional separators and/or substrings present in _S_ will be ignored.
The returned List contains all disjoint substrings of _S_ that do not contain _separator_ but are immediately preceded and/or immediately followed by an occurrence of _separator_.
Each such <emu-not-ref>substring</emu-not-ref> will be the empty String between adjacent occurrences of _separator_, before a _separator_ at the very start of _S_, or after a _separator_ at the very end of _S_, but otherwise will not be empty.
</dd>
</dl>
<emu-alg>
1. If _S_ is the empty String, return « ».
1. Assert: _S_ is not the empty String.
1. Assert: _separator_ is not the empty String.
1. Let _separatorLength_ be the length of _separator_.
1. Assert: _separatorLength_ is not 0.
1. Let _substrings_ be a new empty List.
1. Let _i_ be 0.
1. Let _j_ be StringIndexOf(_S_, _separator_, 0).
1. Repeat, while _j_ ≠ -1,
1. Repeat, while _j_ is not ~not-found~,
1. Let _T_ be the substring of _S_ from _i_ to _j_.
1. Append _T_ to _substrings_.
1. If the number of elements in _substrings_ is _limit_, return _substrings_.
1. Set _i_ to _j_ + _separatorLength_.
1. Set _j_ to StringIndexOf(_S_, _separator_, _i_).
1. Let _T_ be the substring of _S_ from _i_.
1. Append _T_ to _substrings_.
1. Return _substrings_.
</emu-alg>
</emu-clause>
<emu-note>
<p>
Substrings in the returned List may be the empty String if _S_ starts or ends with _separator_ or if _S_ contains adjacent occurrences of _separator_.
</p>
</emu-note>
</emu-clause>
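For readers who want to check the revised StringSplitToList against concrete inputs, here is a minimal TypeScript sketch of the algorithm without the _limit_ parameter. The function name `stringSplitToList` is an illustrative choice, and the spec's assertions are modelled as thrown errors, which is an assumption of this sketch rather than spec behaviour.

```typescript
// Sketch of StringSplitToList(S, separator) after removal of _limit_.
function stringSplitToList(s: string, separator: string): string[] {
  // The spec asserts these preconditions; the sketch throws instead.
  if (s.length === 0) {
    throw new RangeError("S must not be the empty String");
  }
  if (separator.length === 0) {
    throw new RangeError("separator must not be the empty String");
  }
  const separatorLength = separator.length;
  const substrings: string[] = [];
  let i = 0;
  // String.prototype.indexOf returns -1 where the spec uses ~not-found~.
  let j = s.indexOf(separator, 0);
  while (j !== -1) {
    // Substring of S from i (inclusive) to j (exclusive).
    substrings.push(s.slice(i, j));
    i = j + separatorLength;
    j = s.indexOf(separator, i);
  }
  // The remainder after the last separator, possibly the empty String.
  substrings.push(s.slice(i));
  return substrings;
}

console.log(stringSplitToList("kilometer-per-hour", "-per-"));
```

Because both arguments must be non-empty, callers such as IsWellFormedUnitIdentifier now have to rule out the empty String themselves before calling.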

<emu-clause id="sec-measurement-unit-identifiers">
@@ -414,7 +426,9 @@ <h1>
<emu-alg>
1. If IsSanctionedSingleUnitIdentifier(_unitIdentifier_) is *true*, then
1. Return *true*.
1. Let _units_ be StringSplitToList(_unitIdentifier_, *"-per-"*, 3).
1. If _unitIdentifier_ is the empty String, then
1. Return *false*.
1. Let _units_ be StringSplitToList(_unitIdentifier_, *"-per-"*).
1. If the number of elements in _units_ is not 2, then
1. Return *false*.
1. If IsSanctionedSingleUnitIdentifier(_units_[0]) and IsSanctionedSingleUnitIdentifier(_units_[1]) are both *true*, then
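The visible steps of this hunk can be sketched in TypeScript as below, under several assumptions. The unit set is a small illustrative subset, not the full sanctioned list. The built-in `split` stands in for StringSplitToList, which it matches when both the string and the separator are non-empty. The final `return false` is a guess, since the diff hunk ends before the algorithm's last step.

```typescript
// Illustrative subset only; the real list of sanctioned units is longer.
const SAMPLE_SANCTIONED_UNITS = new Set([
  "kilometer",
  "hour",
  "liter",
  "meter",
  "second",
]);

function isSanctionedSingleUnitIdentifier(unit: string): boolean {
  return SAMPLE_SANCTIONED_UNITS.has(unit);
}

function isWellFormedUnitIdentifier(unitIdentifier: string): boolean {
  if (isSanctionedSingleUnitIdentifier(unitIdentifier)) return true;
  // New in this commit: reject the empty String before splitting, because
  // StringSplitToList now asserts that its input is not empty.
  if (unitIdentifier === "") return false;
  // Stand-in for StringSplitToList(unitIdentifier, "-per-").
  const units = unitIdentifier.split("-per-");
  if (units.length !== 2) return false;
  if (units.every((unit) => isSanctionedSingleUnitIdentifier(unit))) {
    return true;
  }
  // Assumption: the elided remainder of the algorithm returns false.
  return false;
}

console.log(isWellFormedUnitIdentifier("kilometer-per-hour"));
```

Without _limit_, an input such as "kilometer-per-hour-per-second" splits into three elements and is rejected by the length check, so the old limit of 3 is no longer needed for that purpose.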
