Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22724 CLDR release-46 to main #3260

Conversation

DraganBesevic
Copy link
Contributor

@DraganBesevic DraganBesevic commented Oct 31, 2024

Checklist

  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22724
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

Final round of integration, hopefully!

It covers the changes for likelySubtags test data made after beta3, for this ticket:
CLDR-18002 Fix likely subtag inconsistency (unicode-org/cldr#4105)

@markusicu
Copy link
Member

Thanks!
I fixed up the PR description.
Please generate & add the binary data files as usual.
Then please trigger the exhaustive tests.

@markusicu
Copy link
Member

By the way, I assume that this integrates CLDR tag release-46 -- since there was not beta4 tag: https://github.com/unicode-org/cldr/tags
We should rename the PR title so that this is not misleading.

@DraganBesevic
Copy link
Contributor Author

Thanks! I fixed up the PR description. Please generate & add the binary data files as usual. Then please trigger the exhaustive tests.

There were no changes for binary files for this integration. Exhaustive tests are already running, there are under "Action" tab

@DraganBesevic DraganBesevic changed the title ICU-22724 CLDR 46 beta4 to main ICU-22724 CLDR release-46 to main Oct 31, 2024
@markusicu
Copy link
Member

There were no changes for binary files for this integration.

At a minimum, the likelySubtags data feeds into icu4c/source/data/misc/langInfo.txt and its .res file.

@DraganBesevic
Copy link
Contributor Author

By the way, I assume that this integrates CLDR tag release-46 -- since there was not beta4 tag: https://github.com/unicode-org/cldr/tags We should rename the PR title so that this is not misleading.

Updated

@DraganBesevic
Copy link
Contributor Author

DraganBesevic commented Oct 31, 2024

There were no changes for binary files for this integration.

At a minimum, the likelySubtags data feeds into icu4c/source/data/misc/langInfo.txt and its .res file.

Well, not sure what happened. I ran all steps on the latest CLDR maint/maint-46 and ICU maint/maint-76

(HEAD -> CLDR-18079-changes-for-46beta4-icu-integration, upstream/maint/maint-46, maint/maint-46)

commit 927d24438de6a3af8e147e61ea22d624b367dd38 (HEAD -> ICU-22724-CLDR-46-beta4-to-main, origin/ICU-22724-CLDR-46-beta4-to-main)
Author: DraganBesevic <[email protected]>
Date:   Thu Oct 31 10:02:51 2024 -0700

    ICU-22724 Integrate CLDR 46 release beta4

commit 8eca245c7484ac6cc179e3e5f7c1ea7680810f39 (upstream/maint/maint-76, maint/maint-76)

Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOKTM.

Copy link
Contributor

@mihnita mihnita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
M

@markusicu
Copy link
Member

At a minimum, the likelySubtags data feeds into icu4c/source/data/misc/langInfo.txt and its .res file.

Well, not sure what happened. I ran all steps on the latest CLDR maint/maint-46 and ICU maint/maint-76

I finally figured out part of my confusion: You are only changing the likelySubtags test data here, not the runtime data. Hence no impact on langInfo.txt.

However, unicode-org/cldr#4105 did change both the runtime data and the test data. And so integrating that change should also change the ICU runtime data.

In other words, this change here seems ok, but it still only reflects half of what ICU needs from that CLDR PR.


FYI: CLDR has likelySubtags.xml in supplemental data, and likelySubtags.txt in test data. Different extensions, different paths, but still easy to mix up when looking at a PR

@sffc sffc mentioned this pull request Nov 4, 2024
7 tasks
@markusicu
Copy link
Member

@DraganBesevic any news? Any luck figuring out why the changes in common/supplemental/likelySubtags.xml from unicode-org/cldr#4105 are not showing up in ICU here?

The last time https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/source/data/misc/langInfo.txt was updated was in your #3174 integrating beta1. Well before Mark's oct02 change, but recent enough to think that the CLDR-to-ICU converter is not totally broken.

Maybe you and @mihnita and @pedberg-icu could stick your heads together on this?

Or, if the tests all pass, can we assume that the data change was a no-op for ICU's data structure?? Despite the test data changes?? Makes me nervous for future updates...

@richgillam help coordinate?

@DraganBesevic
Copy link
Contributor Author

DraganBesevic commented Nov 5, 2024 via email

@mihnita
Copy link
Contributor

mihnita commented Nov 13, 2024

TLDR: I think the tooling is fine, and we can (and should) merge this.


Any luck figuring out why the changes in common/supplemental/likelySubtags.xml from unicode-org/cldr#4105 are not showing up in ICU here?

From all I could determine, there is no tooling problem.
My best guess is that some manual step was accidentally skipped.

I tried various things, and the icu files gets updated.


I did a run the cldr conversion tool run with the cldr / icu release git tags
(icu at release-76-1 and cldr / cldr-staging at release-46 )

And the generated text files were the same to the one in icu at release-76-1.
EXCEPT for icu4c/source/data/brkitr/root.txt
Which was missing this part:

    lstm{
        Thai{"Thai_graphclust_model4_heavy.res"}
        Mymr{"Burmese_graphclust_model5_heavy.res"}
    }

That is expected, has to be copy-pasted by hand (documented in docs/processes/cldr-icu.md)

Then I changed likelySubtags.xml and re-generated.
These files changed:

icu4c/source/common/localefallback_data.h
icu4c/source/data/brkitr/root.txt
icu4c/source/data/misc/langInfo.txt
icu4j/main/core/src/main/java/com/ibm/icu/impl/LocaleFallbackData.java

So I think that the tooling is fine.

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thank you very much @mihnita !!

@DraganBesevic please merge.

@sffc after this PR is merged, what's the best way to get this onto the main branch as well, after PR #3258 is already in? Do the same thing once more which hopefully automagically picks up only this one new commit?

@DraganBesevic DraganBesevic merged commit 1a52a13 into unicode-org:maint/maint-76 Nov 13, 2024
98 checks passed
@DraganBesevic
Copy link
Contributor Author

DraganBesevic commented Nov 13, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants