-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Rust]C++ Windows向けのexampleを修正します #208
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,11 @@ | ||
#pragma once | ||
#include <iostream> | ||
#include "core.h" | ||
|
||
std::string GetOpenJTalkDict(); | ||
std::wstring GetWaveFileName(); | ||
std::wstring GetExePath(); | ||
std::wstring GetExeDirectory(); | ||
std::string wide_to_multi_capi(std::wstring const& src); | ||
std::string wide_to_utf8_cppapi(std::wstring const& src); | ||
void OutErrorMessage(VoicevoxResultCode messageCode); | ||
std::string wide_to_utf8_cppapi(std::wstring const& src); | ||
std::wstring utf8_to_wide_cppapi(std::string const& src); |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
とのことでしたが、ここの
GetOpenJTalkDict
の戻り値の文字コードで問題が起こっていそうですね……。GetOpenJTalkDict
の実装を見ると文字コードを Shift-JIS としているみたいですが、以前の C++ 版コアのvoicevox_load_openjtalk_dict
関数は引数のパス文字列の文字コードが Shift-JIS だったのでしょうか?また、現在の Rust 版の
voicevox_load_openjtalk_dict
の実装は以下なのですが、UTF-8 の文字列としてデコードできない入力ははじくようになっています(Rust の文字列の内部エンコーディングが環境によらず UTF-8 なので):voicevox_core/crates/voicevox_core/src/c_export.rs
Lines 245 to 255 in 1284e26
もし UTF-8 の文字列を渡した場合も試していた場合、それでも日本語を含むパスをうまく扱えなかった感じでしょうか?(もしそうであれば、引数の文字コードは UTF-8 のままとしたいですが、内部でOpenJTalk の API に渡す際は Windows 版だけ扱う文字コードを変更するように実装する必要がありそうですね……!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UTF-8でも試しましたが、文字化けします。
きれいに、UTF-8をShift-JISで見たときの化け方です。
C++版コアのときも、パスはShift-JISで渡していました。
rust側で文字コードの変換を行うということですね。
GetOpenJTalkDict
の戻り値をUTF-8に変更します。There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
文字化けしてるのRustで実装されてるcore側の実装バグっぽそうなのでなおしたらutf-8で行けそうなきがします
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
そうですね! 個人的にはおそらく以下の open_jtalk-rs 内の
open_jtalk_sys::Mecab_load
を呼ぶ部分で Windows 版だけパス文字列を Shift-JIS に変換するのが良い気がしています(Rust コード内では一貫して UTF-8 で扱った方が都合が良いので)。https://github.com/VOICEVOX/open_jtalk-rs/blob/4cd7ff11943f2c0b976f9c8e9593f47339a60444/crates/open_jtalk/src/mecab/mod.rs#L40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OS別に分けるのではなくto_strをas_os_strにするだけでいいはずです
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
言われてみれば、確かに元々は文字コードを変換するようにしていたのをコメントアウトしたようですね……。
https://github.com/VOICEVOX/open_jtalk/blob/dc5c4fadcb0d281f5ca19a58c9df94292b1358fc/src/mecab/src/common.h#L147-L154
経緯は不明です……(コミットログを見ると、
CreateFileA
関数を使うようにした変更は 2012 年、WPATH
マクロをコメントアウトしたのは 2016 年のようでした。コミットは大雑把にまとめられており、有用な情報は得られませんでした。おそらく、元の MeCab のソースを OpenJTalk 開発時の都合で変更したものと思われます。ユーザーが Shift-JIS の文字列を入力した時に変換なしでそれを扱いたかった?)。そうですね……! OpenJTalk に渡す文字列を全て UTF-8 に統一できそうなのでスッキリしそうです。
WPATH
はファイルパスだけに関わるマクロだと思うので、これを変更したところで不具合は出ない、と思いたいですね……。There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
あ、ただ、open_jtalk のコードをいじると、それに依存している pyopenjtalk にも影響が出ますね!
https://github.com/VOICEVOX/pyopenjtalk/blob/master/.gitmodules
pyopenjtalk は、今回の
Mecab_load
関数に関わるところで言えば、たとえば以下のような箇所でシステムロケールの文字コードを要求しているようです(他の箇所にもある)。open_jtalk を直すなら、pyopenjtalk もそれに合わせて修正しなければなりませんね……。https://github.com/VOICEVOX/pyopenjtalk/blob/50b0296a9e1b666e5a09a41ec9e9284a2a9b608f/pyopenjtalk/__init__.py#L103
これを考えると、今はとりあえず open_jtalk-rs 内で対応するのが無難のような気がしてきました……
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
記憶を呼び覚ましていました。
結論から言うと、ボイボのpyopenjtalkのことは無視してopenjtalk/mecabの中のコードを変えちゃうのが良いのかなと思いました!
理由は単純に、本家のpyopenjtalk(とopenjtalk)は全部UTF8で扱う実装になっているのに対して、VOICEVOXはpyopenjtalkだけがsystem localの文字コードを使うようになっているためです。
本家に合わせるのがいろいろ楽そうなので、pyopenjtalk側を合わせちゃうのが良いのかな~と。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2転しちゃってすみません 🙇♂️
が、たぶんですが、system localの文字コードを扱い始めるのはなかなかにしんどさを連れてくる可能性があるので、ゴリッと全部UTF8にしちゃいたいな~という気持ちです・・・!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PRつくりました
VOICEVOX/open_jtalk#7