fix: break when the character is multibyte #8
#7
https://github.com/golang/text/blob/6c97a165dd661335ff7bce6104a008558123c353/encoding/encoding.go#L183
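I wrote a proposed fix using the code linked above as a reference.
When a string of 4096 bytes or more is passed in, the transformer appears to fetch it 4096 bytes at a time (the value passed as the second argument, src, is at most 4096 bytes).
The fix checks whether the last few bytes of src end in the middle of a character. If they do, processing breaks at that point and the remaining bytes are carried over to the next Transform call.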
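The boundary check described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `splitComplete` and the 4-byte `chunkSize` (standing in for the 4096-byte buffer) are hypothetical names chosen for the example. In a real `transform.Transformer`, the usual way to hand the leftover bytes to the next call is to return `transform.ErrShortSrc` when `atEOF` is false.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitComplete (hypothetical helper) returns the longest prefix of src that
// ends on a rune boundary, plus the bytes of a trailing, still-incomplete rune.
func splitComplete(src []byte) (complete, pending []byte) {
	// A UTF-8 sequence is at most utf8.UTFMax (4) bytes long, so only the
	// last few bytes need to be inspected.
	for i := len(src) - 1; i >= 0 && i >= len(src)-utf8.UTFMax; i-- {
		if utf8.RuneStart(src[i]) {
			if !utf8.FullRune(src[i:]) {
				// The last rune is cut off: defer it to the next chunk.
				return src[:i], src[i:]
			}
			break
		}
	}
	return src, nil
}

func main() {
	input := []byte("あいうえお")
	const chunkSize = 4 // assumption: stands in for the 4096-byte buffer

	var pending, out []byte
	for len(input) > 0 {
		n := chunkSize
		if n > len(input) {
			n = len(input)
		}
		// Prepend the bytes carried over from the previous chunk.
		chunk := append(append([]byte(nil), pending...), input[:n]...)
		input = input[n:]

		complete, rest := splitComplete(chunk)
		pending = append([]byte(nil), rest...)
		out = append(out, complete...)
		fmt.Printf("processed %d bytes, carried over %d bytes\n", len(complete), len(pending))
	}
	out = append(out, pending...) // at EOF, flush whatever is left
	fmt.Println(string(out))
}
```

Each `complete` slice ends on a rune boundary, so it can be converted safely even though the raw chunks do not.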
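I also concluded that the utf8.Valid check is unnecessary.
When a long UTF-8 string is fetched 4096 bytes at a time, a fetched byte slice may be cut in the middle of a character and therefore cannot always be read as valid UTF-8.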
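To illustrate why a per-chunk `utf8.Valid` check can reject well-formed input, here is a small sketch. `validPrefix` is a hypothetical helper for this example only, and the cut positions are chosen by hand rather than taken from the actual 4096-byte buffer.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validPrefix (hypothetical helper) reports whether the first n bytes of s
// are valid UTF-8 on their own.
func validPrefix(s string, n int) bool {
	return utf8.Valid([]byte(s)[:n])
}

func main() {
	s := "あい" // two 3-byte characters, 6 bytes in total

	fmt.Println(utf8.ValidString(s)) // true: the whole string is valid
	fmt.Println(validPrefix(s, 3))   // true: cut on a character boundary
	fmt.Println(validPrefix(s, 4))   // false: cut inside the second character
}
```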