
src: use wide string for findPackageJson on Windows #55861

Open: wants to merge 8 commits into main


islandryu (Contributor):

Fix an error when searching for package.json in paths that contain non-ASCII characters.

Fixes: #55773

nodejs-github-bot added labels on Nov 15, 2024:
c++: Issues and PRs that require attention from people who are familiar with C++.
lib / src: Issues and PRs related to general changes in the lib or src directory.
needs-ci: PRs that need a full CI run.
aduh95 (Contributor) commented Nov 15, 2024:

Can you add a test?

Comment on lines 11 to 15
fs.mkdirSync(tmpdir.resolve('12月'), { recursive: true });
fs.writeFileSync(
tmpdir.resolve('12月/index.js'),
"console.log('12月');",
);
Member:

This can be a fixture, right?

Contributor Author:

Thank you. I should do that.

test/parallel/test-non-ascii.js: four more review threads (outdated, resolved).
'use strict';

const fs = require('node:fs');
const common = require('../common');
RedYetiDev (Member) commented Nov 15, 2024:

'../common' must be the first import. FWIW, running make lint will catch this.

Contributor Author:

Sorry for the lint error. I've fixed it.

islandryu marked this pull request as ready for review on November 15, 2024, 14:54.
codecov bot commented Nov 15, 2024:

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.93%. Comparing base (b02cd41) to head (c295818).
Report is 20 commits behind head on main.

Files with missing lines   Patch %   Lines
src/node_modules.cc        83.33%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #55861      +/-   ##
==========================================
- Coverage   88.42%   87.93%   -0.49%     
==========================================
  Files         654      654              
  Lines      187852   188038     +186     
  Branches    36134    35871     -263     
==========================================
- Hits       166102   165350     -752     
- Misses      14989    15868     +879     
- Partials     6761     6820      +59     
Files with missing lines   Coverage Δ
src/node_file.cc           77.39% <100.00%> (+0.04%) ⬆️
src/util.cc                87.39% <ø> (ø)
src/util.h                 91.52% <100.00%> (+0.29%) ⬆️
src/node_modules.cc        78.64% <83.33%> (-0.14%) ⬇️

... and 95 files with indirect coverage changes

Review comment on src/util.h (outdated):
std::wstring ConvertToWideString(const std::string& str);

#define BufferValueToPath(str) \
std::filesystem::path(ConvertToWideString(str.ToString()))
Member:

These macros are unnecessary. Also, str.ToString() makes an unnecessary copy, even though the function takes a const std::string&. Passing str.ToStringView() (with the function accepting a std::string_view) would avoid that copy.

Contributor Author:

I modified the code to avoid using macros and to use std::string_view instead.
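For illustration, a minimal sketch of a string_view-based helper, assuming the goal is to decode the path bytes explicitly as UTF-8 rather than through the ANSI code page. The name ConvertToWideString mirrors the declaration in src/util.h above, but the body is hypothetical and is not the code in this PR. The Windows branch uses the Win32 MultiByteToWideChar function with CP_UTF8; the non-Windows branch is a simplified hand-written UTF-8 decoder (it does not reject overlong encodings or surrogates), included only so the sketch compiles and can be exercised anywhere.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical sketch (not this PR's code): decode UTF-8 into a wide string.
// Taking std::string_view lets callers pass str.ToStringView() without a copy.
std::wstring ConvertToWideString(std::string_view utf8) {
  if (utf8.empty()) return std::wstring();
#ifdef _WIN32
  // Decode explicitly as UTF-8 (CP_UTF8) instead of the ANSI code page (CP_ACP).
  const int len = static_cast<int>(utf8.size());
  const int wide_len =
      MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
  if (wide_len <= 0) return std::wstring();  // conversion failed
  std::wstring result(static_cast<std::size_t>(wide_len), L'\0');
  MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, result.data(), wide_len);
  return result;
#else
  // Simplified portable fallback: hand-decode UTF-8 into code points
  // (wchar_t is 32 bits on Linux and macOS). Malformed bytes become U+FFFD.
  std::wstring result;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    std::size_t extra = 0;  // continuation bytes expected after the lead byte
    char32_t cp = 0;        // code point being assembled
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      extra = 1;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      extra = 2;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      extra = 3;
      cp = lead & 0x07;
    } else {
      result.push_back(static_cast<wchar_t>(0xFFFD));  // not a valid lead byte
      ++i;
      continue;
    }
    bool ok = i + extra < utf8.size();  // enough bytes left for the sequence?
    for (std::size_t k = 1; ok && k <= extra; ++k) {
      const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
      ok = (c & 0xC0) == 0x80;  // continuation bytes look like 10xxxxxx
      cp = (cp << 6) | (c & 0x3F);
    }
    if (!ok) {
      result.push_back(static_cast<wchar_t>(0xFFFD));
      ++i;  // resynchronize at the next byte
      continue;
    }
    result.push_back(static_cast<wchar_t>(cp));
    i += extra + 1;
  }
  return result;
#endif
}
```

With UTF-8 input such as "12\xE6\x9C\x88" (the bytes of 12月), both branches should produce the wide string L"12\u6708", independent of the system code page.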

@@ -0,0 +1 @@
console.log('check non-ascii');
Contributor Author:

During testing, it became evident that whether the error described in issue #55773 occurs depends on the character encoding used by the user's system.

For instance, users on CP1252, a single-byte encoding, are unlikely to encounter the error regardless of the characters in their paths. In contrast, users on multi-byte encodings such as CP932 are more likely to hit it.

Take the character "月" as an example, which is represented in UTF-8 as:
E6 9C 88

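In CP932, E6 and 9C together form one two-byte character, which leaves the byte 88 to be interpreted as the lead byte of another two-byte character. When it is followed by the end of the string, or by a byte such as '/' (0x2F) that is not a valid trail byte, the conversion can fail.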
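A minimal sketch of this failure mode, assuming the commonly documented CP932 byte ranges (lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC). The hypothetical FindCp932DecodeError walks a byte string as if it were CP932 text and reports where a lead byte is left without a valid trail byte. It does not call any Windows API and treats every other byte as a single-byte character, so it only approximates what the real decoder does.

```cpp
#include <cstddef>
#include <string_view>

// CP932 (Shift_JIS) byte classes, using the commonly documented ranges.
constexpr bool IsCp932LeadByte(unsigned char b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

constexpr bool IsCp932TrailByte(unsigned char b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Hypothetical helper: walk `bytes` as if it were CP932 text and return the
// offset of the first lead byte with no valid trail byte after it, or npos.
// Simplification: every non-lead byte is treated as a single-byte character.
std::size_t FindCp932DecodeError(std::string_view bytes) {
  std::size_t i = 0;
  while (i < bytes.size()) {
    const auto b = static_cast<unsigned char>(bytes[i]);
    if (!IsCp932LeadByte(b)) {
      ++i;  // single-byte character
      continue;
    }
    if (i + 1 < bytes.size() &&
        IsCp932TrailByte(static_cast<unsigned char>(bytes[i + 1]))) {
      i += 2;  // lead + trail form one two-byte character
      continue;
    }
    return i;  // dangling lead byte: decoding breaks down here
  }
  return std::string_view::npos;
}
```

For the UTF-8 bytes of "12月" (31 32 E6 9C 88), E6 9C is consumed as one character and the function returns 4, the offset of the dangling 88; the same happens for "12月/index.js", because '/' is not a trail byte. A pure-ASCII name such as "index.js" returns npos.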

The key takeaway is that an accurate regression test may need to run in an environment that uses a character encoding other than CP1252.

I also looked for other tests that might include regression testing against Windows-specific character encodings but couldn’t find any. If you have any good ideas, I’d appreciate your input.

Contributor:

From what I understand, you’re looking for something along these lines:

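const assert = require('node:assert');
const { execSync } = require('node:child_process');
// Switch the console code page to 932, then run the fixture at nonAsciiPath.
const stdoutExec = execSync(`@chcp 932 >nul & "${process.execPath}" "${nonAsciiPath}"`,
                            { encoding: 'utf8' });
assert.strictEqual(stdoutExec, 'check non-ascii\n');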

However, this code does not fail on the main branch on my local machine, so it might not be ideal to use this approach as-is without further investigation.

Contributor Author:

Thank you for your response.
I tried it as well, and it seems that the behavior was not affected.
It appears that we need to change the system's character encoding rather than just the console's encoding.

For instance, the following minimal C++ code reproduces the same error:

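#include <filesystem>
#include <string>
int main() {
  // On a CP932 system, 0x88 is a dangling lead byte, so this conversion fails.
  std::string path = "\x88";
  std::filesystem::path file(path);
}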

On a system using CP932, this code alone triggers the same issue.
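For comparison, here is one standard-library alternative. It is sketched here and is not what this PR does (the PR converts to a wide string first). The idea is to state the encoding explicitly when constructing the path, so the bytes are not decoded through CP_ACP. std::filesystem::u8path (C++17, deprecated in C++20) and the C++20 char8_t constructor both interpret their input as UTF-8. PathFromUtf8 is a hypothetical helper name.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not this PR's approach): build a path from UTF-8 bytes
// with the encoding stated explicitly, so the narrow string is not decoded
// through the ANSI code page on Windows.
std::filesystem::path PathFromUtf8(const std::string& utf8) {
#if defined(__cpp_char8_t)
  // C++20: construct from a char8_t string, which is always treated as UTF-8.
  return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
  // C++17: u8path interprets its argument as UTF-8 (deprecated in C++20).
  return std::filesystem::u8path(utf8);
#endif
}
```

On Windows, PathFromUtf8("12\xE6\x9C\x88") should yield a path whose native wide string is 12月 regardless of the ANSI code page. On POSIX systems the bytes are passed through unchanged.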

Contributor Author:

I am currently investigating whether it is possible to change the system's character encoding within a cctest.

islandryu (Contributor Author):

The key takeaway is that to ensure accurate regression testing, it might be necessary to test the runtime environment with character encodings other than CP1252.

Regarding the content quoted above:
The root cause of the error is that std::filesystem::path internally converts narrow (char) file paths to wide strings using the CP_ACP code page.
Based on my investigation, CP_ACP cannot be changed either through command-line arguments or programmatically.
Therefore, an accurate regression test would require adjusting the CI system environment itself.
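If the CI environment were adjusted, a test could first check which ANSI code page it is actually running under before asserting on the regression. Below is a minimal sketch of a cctest-style helper. AnsiCodePageIs is a hypothetical name; GetACP is the Win32 function that returns the current ANSI code page identifier.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical test guard: the regression only reproduces when the ANSI code
// page (CP_ACP) is a multi-byte code page such as 932, so check it first.
bool AnsiCodePageIs([[maybe_unused]] unsigned int code_page) {
#ifdef _WIN32
  return GetACP() == code_page;  // GetACP() returns the current ANSI code page
#else
  return false;  // there is no ANSI code page outside Windows
#endif
}
```

For example, a test could skip itself unless AnsiCodePageIs(932) returns true. That way, a pass on a CP1252 machine is not mistaken for a verified fix.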

6 participants