utf8n_to_uvchr(): Simplify and fix some overlongs edge cases #22757
Commits on Nov 24, 2024
utf8.c: Replace macros by more compact equivalents
There are shortcuts available that cut these 8 names to 2.
Commit 9be0c1f

utf8.c: Move most important conditional to be first
It turns out that the information generated in this block is only needed if the final conditional in this complicated group of them is true, which checks if the caller wants anything special for certain classes of code points. Because that final condition is subsidiary, the block was getting executed just to be thrown away.
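The reordering described in this commit message can be illustrated with a minimal sketch. It is not the code from utf8.c: the flag names, `classify()`, and the call counter are hypothetical, chosen only to show how testing the caller's flags first avoids computing a classification that would be discarded.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration; not Perl's real flag values. */
#define WANT_SURROGATE 0x1u
#define WANT_NONCHAR   0x2u
#define WANT_SUPER     0x4u
#define WANT_ANY       (WANT_SURROGATE | WANT_NONCHAR | WANT_SUPER)

/* Counts how often the classification work is actually performed. */
unsigned classify_calls = 0;

/* Stand-in for the block that gathers information about a code point. */
unsigned classify(uint32_t cp)
{
    unsigned cls = 0;

    classify_calls++;
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        cls |= WANT_SURROGATE;
    }
    if (cp > 0x10FFFF) {
        cls |= WANT_SUPER;              /* above the Unicode maximum */
    }
    else if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE) {
        cls |= WANT_NONCHAR;
    }
    return cls;
}

/* Before: the work is done first, and thrown away if no flag is set. */
unsigned problems_before(uint32_t cp, unsigned flags)
{
    unsigned cls = classify(cp);

    if (flags & WANT_ANY) {
        return cls & flags;
    }
    return 0;
}

/* After: the deciding condition comes first, so the work is skipped. */
unsigned problems_after(uint32_t cp, unsigned flags)
{
    if (!(flags & WANT_ANY)) {
        return 0;
    }
    return classify(cp) & flags;
}
```

Both functions return the same result for every input; only the number of `classify()` calls differs when the caller passes no flags.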
Commit a51ae5e

As a first step in simplifying this overly complicated series of conditionals, pull out the first one into a separate 'if'. The next commits will do more.
Commit 12285fd

utf8.c: Further simplify a complex conditional
This hoists a clause in a complex conditional to the 'if' statement above it, converting that to two conditionals from one, while decreasing the number in the much larger interior 'if' by 1. This is in preparation for further simplifications in the next few commits.
Commit 3d86fdf

utf8.c: Further simplify complex conditional
This splits these into an 'if' clause and an 'else' clause.
Commit 857fe56

This makes things a bit simpler, but mainly leads to further simplifications in the next commits.
Commit 8a4d5f9

utf8.c: Check specially for perl-extended UTF-8
More rigorous testing of the overlong malformation, yet to be committed, showed that this needs to be handled specially. This commit does part of that. Perl extended UTF-8 means you are using a start byte not recognized by any UTF-8 standard. Suppose it is an overlong sequence that reduces down to something representable using standard UTF-8. The string still used non-standard UTF-8 to get there, so should still be called out when the input parameters to this function ask for that. This commit is a first step towards that.
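The case described above can be sketched as follows. This is an illustration, not the utf8.c code. It assumes an ASCII platform, treats start bytes 0xFE and 0xFF as the ones outside every UTF-8 standard, and assumes the 7-byte layout for 0xFE in which the start byte carries no payload bits. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: on ASCII platforms, only 0xFE and 0xFF are start bytes that
 * no UTF-8 standard recognizes. */
bool starts_perl_extended(const uint8_t *s, size_t len)
{
    assert(s != NULL && len > 0);
    return s[0] >= 0xFE;
}

/* Decode a 7-byte sequence that starts with 0xFE. Assumed layout: the start
 * byte has no payload bits, and each of the 6 continuation bytes (10xxxxxx)
 * contributes 6 bits. Returns false if the input does not have that shape. */
bool decode_fe_sequence(const uint8_t *s, size_t len, uint64_t *out)
{
    uint64_t v = 0;
    size_t i;

    if (s == NULL || out == NULL || len != 7 || s[0] != 0xFE) {
        return false;
    }
    for (i = 1; i < 7; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return false;               /* not a continuation byte */
        }
        v = (v << 6) | (uint64_t) (s[i] & 0x3F);
    }
    *out = v;
    return true;
}

/* The decision looks at how the string was encoded, not at the value it
 * reduces to, so an overlong that decodes to a small code point is still
 * reported when the caller asked for that. */
bool should_flag_extended(const uint8_t *s, size_t len, bool caller_wants_flag)
{
    return caller_wants_flag && starts_perl_extended(s, len);
}
```

Under these assumptions, the bytes `FE 80 80 80 80 81 81` reduce to 0x41, which fits in one byte of standard UTF-8, yet `should_flag_extended()` still reports the sequence because of its start byte.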
Commit 8a3b341

utf8.c: Remove intermediate value
By not overriding the computed value of malformed input until later in the function, we can eliminate this temporary variable. This paves the way to a much bigger simplification in the next commit.
Commit 88bb717

It turns out that the work being done in the first block is only used in the second block. If that second block doesn't get executed, the first block's effort is thrown away. So fold the first block into the second. This allows removing a bunch of temporaries that were used only to communicate between the blocks. More detailed comments are added.
Commit 2286cf0

Don't execute this loop if it would be pointless.
Commit 953bbd9

utf8n_to_uvchr_msgs_helper: Add assertion
Make sure it isn't being called with unexpected input --
Commit 7881d75

utf8n_to_uvchr_msgs_helper: Don't throw away work
Admittedly not much work, but I realized in code reading that there are function exits that ignore this initialization. Instead, move the initialization to later, where it is actually needed.
Commit 7f8a862

Commit c9a6111

utf8n_to_uvchr_msgs_helper(): Refactor expression
More rigorous testing of the overlong malformation, yet to be committed, showed that this didn't work as intended. The IS_UTF8_START_BYTE() excludes start bytes that always lead to overlong sequences. Fortunately the logic caused that to be mostly bypassed. But this commit fixes it all.
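A minimal sketch of why a strict start-byte test gets in the way of overlong detection. It assumes an ASCII platform, where the start bytes that always produce an overlong 2-byte sequence are 0xC0 and 0xC1. The function names are hypothetical and do not reproduce the macro's actual definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Strict test: rejects 0xC0 and 0xC1, which can only begin overlongs. */
bool is_start_byte_strict(uint8_t b)
{
    return b >= 0xC2;
}

/* Loose test: accepts any byte of the form 11xxxxxx. */
bool is_start_byte_loose(uint8_t b)
{
    return b >= 0xC0;
}

/* Decode a 2-byte sequence 110xxxxx 10xxxxxx without validating range. */
uint32_t decode_2byte(const uint8_t *s, size_t len)
{
    assert(s != NULL && len >= 2);
    return ((uint32_t) (s[0] & 0x1F) << 6) | (uint32_t) (s[1] & 0x3F);
}

/* A 2-byte sequence is overlong when its value would fit in one byte. */
bool is_overlong_2byte(const uint8_t *s, size_t len)
{
    assert(s != NULL && len >= 2);
    return is_start_byte_loose(s[0])
        && (s[0] & 0xE0) == 0xC0        /* 2-byte start byte */
        && (s[1] & 0xC0) == 0x80        /* followed by a continuation byte */
        && decode_2byte(s, len) < 0x80;
}
```

Here `C1 81` reduces to 0x41 and is reported as overlong, but a check that first required `is_start_byte_strict()` to pass would never reach the overlong test for it.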
Commit bfac0e3