linkshortener/linkshortener/shortener.py
Lines 32 to 34 in 1f633f4
Sanitize currently works by simply deleting any characters that do not match the RFC 3986 specification for URL components. While this causes no unexpected behaviour for ASCII-only URLs, it breaks for internationalized domain names, which are common in languages whose scripts are not representable in ASCII and which are resolved through their punycode form.
Take the following domain as an example:
https://💩.la
Clicking this link actually navigates to
https://xn--ls8h.la/
In certain browsers, the navigation bar will actually display the Unicode form rather than the punycode, so it is understandable for users to expect Unicode to work.
This also has security implications: simply deleting characters considered invalid (which, as explained above, certain users may well expect to be valid) could produce an unexpected domain name and redirect users somewhere they did not intend to go.
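To make the risk concrete, here is a minimal sketch of what deletion does to a hostname. Note that strip_invalid and its regex are a hypothetical stand-in I wrote for illustration, not the actual code in shortener.py; the character class is my reading of the reserved and unreserved sets in RFC 3986, plus "%" for percent-escapes.

```python
import re

# Hypothetical stand-in for the current behaviour (NOT the real shortener.py
# code): drop every character outside the RFC 3986 reserved/unreserved sets.
_INVALID = re.compile(r"[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")


def strip_invalid(url: str) -> str:
    """Silently delete characters that are not valid in an RFC 3986 URL."""
    return _INVALID.sub("", url)


# The emoji label vanishes entirely, leaving a different (empty) host label.
print(strip_invalid("https://\U0001F4A9.la"))
# A single non-ASCII letter is dropped, yielding a different registrable domain.
print(strip_invalid("https://ex\u00e4mple.com"))
```

In the second case the result is a syntactically valid domain that somebody else could register, which is exactly the kind of silent redirect described above.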
I'd therefore suggest either making sanitize simply fail on invalid characters (thereby ensuring that no unexpected alteration occurs), or having it convert the domain to punycode (note that the codecs module of the standard library already supports punycode as a text encoding).
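Both options could look roughly like the sketch below. The function names (sanitize_strict, host_to_ascii, url_to_ascii) are hypothetical and not part of the project. The conversion uses the standard library "punycode" codec per label and adds the "xn--" prefix by hand, so it skips the normalization step that full IDNA processing requires; the built-in "idna" codec performs that step, though it implements the older IDNA 2003 rules. Treat this as a starting point under those assumptions rather than a finished implementation.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed reading of the RFC 3986 reserved/unreserved sets, plus "%".
_ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def sanitize_strict(url: str) -> str:
    """Option 1: reject the URL instead of silently altering it."""
    if _ALLOWED.fullmatch(url) is None:
        raise ValueError(f"URL contains characters outside RFC 3986: {url!r}")
    return url


def host_to_ascii(host: str) -> str:
    """Option 2: punycode-encode each non-ASCII label of a hostname."""
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # The codec emits the bare punycode; IDNs carry an "xn--" prefix.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def url_to_ascii(url: str) -> str:
    """Rewrite only the host of a URL; userinfo is dropped in this sketch."""
    parts = urlsplit(url)
    netloc = host_to_ascii(parts.hostname or "")
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(url_to_ascii("https://\U0001F4A9.la"))
```

Converting first and validating second, e.g. sanitize_strict(url_to_ascii(url)), would accept the example domain above while still rejecting anything that remains invalid after conversion.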