Mask control characters in filenames #118
Single-byte control character masking isn't enough. At least Konsole and Xfce Terminal (but not uxterm) interpret C1 control codes and CSI sequences in
A proper masking method must decode multibyte characters. It must tolerate invalid multibyte sequences and restart decoding from the next byte.
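The decode-and-restart approach described above could look roughly like the following. This is only a sketch under stated assumptions, not the code from this pull request: the name mask_nonprint_sketch and the caller-provided output buffer are hypothetical, and it relies on the standard mbrtowc() and iswprint(), so the result depends on the current LC_CTYPE locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

// Hypothetical helper, not the function from the pull request.
// Copies src to dst, replacing every non-printable character and every
// undecodable byte with '?'. dst must hold at least strlen(src) + 1
// bytes; the output is never longer than the input.
static void
mask_nonprint_sketch(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);

	mbstate_t state;
	memset(&state, 0, sizeof(state));
	size_t left = strlen(src);

	while (left > 0) {
		wchar_t wc;
		const size_t n = mbrtowc(&wc, src, left, &state);

		if (n == (size_t)-1 || n == (size_t)-2) {
			// Invalid or truncated sequence: mask one byte,
			// reset the state, and restart from the next byte.
			memset(&state, 0, sizeof(state));
			*dst++ = '?';
			++src;
			--left;
		} else if (n == 0) {
			// A null wide character was decoded; stop here.
			break;
		} else {
			if (iswprint((wint_t)wc)) {
				// Printable: copy the whole character.
				memcpy(dst, src, n);
				dst += n;
			} else {
				// Non-printable: one '?' per character.
				*dst++ = '?';
			}
			src += n;
			left -= n;
		}
	}

	*dst = '\0';
}
```

A caller would run setlocale(LC_CTYPE, "") first so that the user's multibyte encoding is used. Without that call the program stays in the "C" locale, where ASCII control characters such as ESC are still masked.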
src/common/tuklib_cntrl_chars.c
Outdated
extern const char *
tuklib_mask_cntrl(const char *str)
{
	static char *mem = NULL;
The codebase uses threads, so this src/common function should probably declare static thread_local char *mem to avoid future issues.
thread_local needs C11 but the codebase is C99 for now. The function is documented as not thread-safe. It's called from the main thread only.
The function is really primitive as it doesn't even provide more than one memory slot. That is, one cannot print a message that needs two masked strings. It's easy to expand but the first version doesn't need to do more than is required right now.
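The single-slot limitation can be illustrated with a small sketch. The name mask_one_slot_sketch is hypothetical and the masking loop is simplified to single-byte iscntrl(); the point is only that each call reuses the same static buffer, so a pointer returned by an earlier call must not be used after the next call.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical illustration of a one-slot, non-thread-safe helper.
// Returns a pointer to a static buffer that the next call overwrites
// (and possibly reallocates), or NULL if memory allocation fails.
static const char *
mask_one_slot_sketch(const char *str)
{
	assert(str != NULL);

	// The only memory slot; shared by all callers and all threads.
	static char *mem = NULL;

	const size_t len = strlen(str);
	char *p = realloc(mem, len + 1);
	if (p == NULL)
		return NULL;

	mem = p;

	for (size_t i = 0; i < len; ++i) {
		const unsigned char c = (unsigned char)str[i];
		mem[i] = iscntrl(c) ? '?' : str[i];
	}

	mem[len] = '\0';
	return mem;
}
```

With this shape, a message containing two masked strings would need the first result copied elsewhere before the second call, which is the restriction mentioned above.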
src/common/tuklib_cntrl_chars.h
Outdated
/// \file tuklib_cntrl_chars.h
/// \brief Find and replace single-byte control characters
///
/// This is a small and very simple implementation that uses isnctrl(3),
isnctrl => iscntrl
The new version should handle all relevant multibyte character sets, not just UTF-8. Instead of looking for control characters, it now looks for non-printable characters which is a much stricter check. A possible downside is that an old C library might not recognize newer printable Unicode characters even though the user might be using them already. I suppose it's not a real problem. :-) Gnulib's
I hope I didn't miss any string that should be masked.
In multibyte locales, some control characters are multibyte too, for example, terminals interpret C1 control characters (U+0080 to U+009F) that are two bytes as UTF-8. Thus, multibyte character sets have to be handled.

Instead of checking for control characters with iswcntrl(), this uses iswprint() to detect printable characters. This is much stricter. Gnulib's quotearg would do a lot more but I hope such a thing isn't needed here.

Thanks to Ryan Colyer for the discussion about the problems of the earlier single-byte-only method.

Thanks to Christian Weisgerber for reporting a bug in an earlier version of this code.

Thanks to Jeroen Roovers for a typo fix.
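To make the C1 point concrete, here is a small sketch of how those code points look as UTF-8 bytes. The helper names are hypothetical and only cover code points below U+0800. CSI (U+009B) becomes the two bytes 0xC2 0x9B, so a filter that examines one byte at a time has to recognize the pair rather than a single control byte.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: encodes a code point below U+0800 as UTF-8.
// Returns the number of bytes written to out (1 or 2).
static size_t
utf8_encode_small(unsigned int cp, unsigned char out[2])
{
	assert(cp < 0x800);

	if (cp < 0x80) {
		// ASCII range: one byte, same value.
		out[0] = (unsigned char)cp;
		return 1;
	}

	// Two-byte form: 110xxxxx 10xxxxxx
	out[0] = (unsigned char)(0xC0 | (cp >> 6));
	out[1] = (unsigned char)(0x80 | (cp & 0x3F));
	return 2;
}

// Hypothetical helper: returns 1 if the two bytes at p are the UTF-8
// encoding of a C1 control character (U+0080 to U+009F).
static int
utf8_is_c1_control(const unsigned char *p)
{
	return p[0] == 0xC2 && p[1] >= 0x80 && p[1] <= 0x9F;
}
```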
Call tuklib_mask_nonprint() on filenames and on a few other strings from the command line. The lack of this feature has been listed in TODO since 2009: 5f6dddc
This prepares for tuklib_mask_nonprint() from tuklib_mbstr_nonprint.c. It has locale-specific behavior (LC_CTYPE).
I merged this with a few changes:
The command line tools print filenames and some other user-specified strings to standard output or standard error. Malicious strings, for example, from filenames could contain control characters that affect the state of the terminal.
These commits add a function to replace the single-byte control characters with question marks. This is simple but hopefully good enough in practice.