Add :invalid and :replace options to String#encode #2131

Merged
seven1m merged 11 commits into master from string-encode-invalid-option on Jun 22, 2024

seven1m (Member) commented on Jun 22, 2024

What a yak shave. 😅 Adding EncodingObject::next_codepoint() is a big win though.

#217

seven1m added 11 commits June 21, 2024 22:52
...and implement Utf8EncodingObject::next_char using it.

Besides returning a numeric codepoint instead of a StringView,
next_codepoint is greedy: it consumes as many bytes as make sense for
the encoding, even if the result is an incomplete character.

This is hard to demonstrate in pure Ruby, but this kinda shows it:

    # a valid four-byte character
    "\xF0\x9F\x98\x81".chars # => ["😁"]

    # last byte is invalid, four one-byte chars returned
    "\xF0\x9F\x98_".chars # => ["\xF0", "\x9F", "\x98", "_"]

    # changing encoding needs greedy consumption of
    # the first three bytes
    "\xF0\x9F\x98_".encode('utf-16le', invalid: :replace).chars
      # => ["\uFFFD", "_"]
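
To make the greedy behavior concrete, here is a rough pure-Ruby sketch of the idea. It is not the C++ code from this PR: the `next_codepoint` signature, the `codepoints_with_replacement` helper, and the `REPLACEMENT` constant are hypothetical names chosen for illustration, and the sketch skips the overlong-form and surrogate checks a real UTF-8 decoder would need.

```ruby
# Hypothetical sketch of greedy UTF-8 decoding; NOT the actual
# EncodingObject::next_codepoint implementation.
REPLACEMENT = 0xFFFD

# Returns [codepoint, bytes_consumed]. The codepoint is nil when the
# sequence starting at `index` is invalid or truncated.
# Assumption: overlong forms and surrogates are not rejected here.
def next_codepoint(bytes, index)
  lead = bytes[index]
  case lead
  when 0x00..0x7F
    return [lead, 1] # plain ASCII
  when 0xC2..0xDF
    expected = 1
    value = lead & 0x1F
  when 0xE0..0xEF
    expected = 2
    value = lead & 0x0F
  when 0xF0..0xF4
    expected = 3
    value = lead & 0x07
  else
    return [nil, 1] # stray continuation byte or invalid lead byte
  end

  consumed = 1
  expected.times do
    byte = bytes[index + consumed]
    # Stop at the first byte that is not a continuation byte, but still
    # report every byte consumed so far. This is the greedy part.
    return [nil, consumed] unless byte && (byte & 0xC0) == 0x80
    value = (value << 6) | (byte & 0x3F)
    consumed += 1
  end
  [value, consumed]
end

# Walks the string and substitutes U+FFFD for each invalid run,
# roughly what `invalid: :replace` needs from the decoder.
def codepoints_with_replacement(string)
  bytes = string.bytes
  index = 0
  result = []
  while index < bytes.size
    codepoint, consumed = next_codepoint(bytes, index)
    result << (codepoint || REPLACEMENT)
    index += consumed
  end
  result
end
```

With this sketch, `codepoints_with_replacement("\xF0\x9F\x98_")` yields `[0xFFFD, 0x5F]`: the three bytes of the truncated sequence collapse into a single replacement character instead of three, which mirrors the `encode` example above.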
seven1m merged commit 4f41caa into master on Jun 22, 2024
15 checks passed
seven1m deleted the string-encode-invalid-option branch on June 22, 2024 05:03