Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer US English for system_category() messages, fall back to system locale, then ID 0 #5104

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

StephanTLavavej
Copy link
Member

@StephanTLavavej StephanTLavavej commented Nov 20, 2024

Fixes #3254, a regression introduced by #2669 in VS 2022 17.3.

😻 Thanks

Thanks to:

📜 Overview

The original (pre-#2669) strategy of using language ID 0 to request FormatMessageA's multi-step behavior was problematic because it tries the user's locale before the system locale, but the user's locale might not result in an intelligible char*, so we could get strings filled with question marks.

The status quo (post-#2669) strategy of using the system locale was problematic because the system locale might not have error message strings. For example, this is the case when the system locale is English (United Kingdom).

We also have to worry about single-language installations of Windows, which don't have US English error message strings.

After considering all of these scenarios and talking to contributors, I believe that the best strategy is to attempt 3 things in this order:

  1. Prefer US English.
    • This should succeed on multi-language installations of Windows, which is the common case.
    • This is consistent with generic_category(), as my new comment explains.
    • Instead of dynamically querying for what "en-US" is, or using the (hilariously mega-deprecated in winnt.h) macro MAKELANGID to construct it, I'm just hardcoding the language ID.
  2. Fall back to the system locale, if we can obtain its language ID.
  3. As an ultimate fallback, if we can't get the system locale or it doesn't have an error message string, then use language ID 0.
    • This is what main does if we can't get the system locale, and what we always did before <system_error>: explicitly pass the system locale ID for system_category messages #2669, so we have many years of experience with this codepath.
    • If this doesn't result in something intelligible (e.g. US English isn't available, the system locale doesn't have an error message string for us, and language ID 0 behavior picks up the user's Windows display language, but its encoding is radically different from what the programmer-user expects for a char*), well, we tried. At this time, I don't want to explore novel logic like calling FormatMessageW and converting to UTF-8. If this scenario turns out to occur in practice, we can explore changes in the future.

💻 Test Code

C:\temp>type meow.cpp
#include <limits>
#include <print>
#include <string>
#include <system_error>
#include <Windows.h>
using namespace std;

void print_system_locale_name() {
    wchar_t buf[LOCALE_NAME_MAX_LENGTH]{};
    if (GetLocaleInfoEx(
        LOCALE_NAME_SYSTEM_DEFAULT,
        LOCALE_SNAME,
        buf,
        LOCALE_NAME_MAX_LENGTH
    ) == 0) {
        println("GetLocaleInfoEx failed!");
    }
    string narrow;
    for (const auto& w : buf) {
        if (w == L'\0') {
            break;
        } else if (w > (numeric_limits<char>::max)()) {
            narrow.push_back('#');
        } else {
            narrow.push_back(static_cast<char>(w));
        }
    }
    println(R"(  System locale name: "{}")", narrow);
}

void print_system_language_id() {
    DWORD lang_id = 0;
    if (GetLocaleInfoEx(
        LOCALE_NAME_SYSTEM_DEFAULT,
        LOCALE_ILANGUAGE | LOCALE_RETURN_NUMBER,
        reinterpret_cast<LPWSTR>(&lang_id),
        sizeof(lang_id) / sizeof(wchar_t)
    ) == 0) {
        println("GetLocaleInfoEx failed!");
    }
    println("  System language ID: {0} decimal, 0x{0:04x} hex", lang_id);
}

int main() {
    print_system_locale_name();
    print_system_language_id();

    // Repro derived from @davemcincork's example.
    println(R"(ERROR_FILE_NOT_FOUND: "{}")", system_category().message(2));
}

🧑‍🔬 Initial Testing

On a Dev Box (Win11 24H2, en-US), I left the "Windows display language" unchanged:

Settings > Time & language > Language & region > Windows display language > English (United States)

But I changed the system locale to en-GB and rebooted:

Settings > Time & language > Language & region > Administrative language settings > Change system locale... > English (United Kingdom)

Before this PR:

C:\temp>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od meow.cpp && meow
meow.cpp
  System locale name: "en-GB"
  System language ID: 2057 decimal, 0x0809 hex
ERROR_FILE_NOT_FOUND: "unknown error"

After this PR:

C:\temp>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od meow.cpp && meow
meow.cpp
  System locale name: "en-GB"
  System language ID: 2057 decimal, 0x0809 hex
ERROR_FILE_NOT_FOUND: "The system cannot find the file specified."

Retested with the 3-step loop successfully.

🧪 More Testing

Then, I verified that this PR isn't causing #2451 to reappear (this was the original bug that #2669 attempted to fix).

Following #2451 (comment), I set my Dev Box's Windows display language to Chinese (installing all of the language pack stuff) and the system locale back to en-US. Then, after slightly altering the repro to use format() instead of print(), I verified that the side-by-side install of VS 2019 16.11 reproed the original bug, VS 2022 17.13 Preview 2 (internal) worked with #2669's changes (the status quo of main), and this PR worked as well.

Retested with the 3-step loop successfully.

1️⃣ Single-Language OS Testing

In #5104 (review), @muellerj2 verified that

  • de-DE single-language system with de-DE system locale
  • de-DE single-language system with de-CH system locale

succeed with German strings. These are exercising the system locale and ID 0 fallbacks, respectively.

@TheStormN

This comment was marked as resolved.

@muellerj2

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

@TheStormN

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej changed the title Always use US English for system_category() messages Prefer US English for system_category() messages, fall back to system locale Nov 21, 2024
@jovibor

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

muellerj2

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej changed the title Prefer US English for system_category() messages, fall back to system locale Prefer US English for system_category() messages, fall back to system locale, then ID 0 Nov 21, 2024
Copy link
Contributor

@muellerj2 muellerj2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

de-DE single-language system with de-DE system locale:

  System locale name: "de-DE"
  System language ID: 1031 decimal, 0x0407 hex
ERROR_FILE_NOT_FOUND: "Das System kann die angegebene Datei nicht finden."

de-DE single-language system with de-CH system locale:

  System locale name: "de-CH"
  System language ID: 2055 decimal, 0x0807 hex
ERROR_FILE_NOT_FOUND: "Das System kann die angegebene Datei nicht finden."

@sylveon

This comment was marked as resolved.

@StephanTLavavej
Copy link
Member Author

As I mentioned in the PR description, we can consider pathological scenarios in the future, if they turn out to happen with any significant frequency. I think I would actually recommend manually LocalAllocing a buffer and stringizing the error code value to produce "system error code 1729" as a fallback.

@CaseyCarter CaseyCarter removed their assignment Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Ready To Merge
Development

Successfully merging this pull request may close these issues.

Visual Studio 2022 std::system_category returns "unknown error" if system locale is not en-US
6 participants