Windows input improvements #651

Merged
merged 4 commits into from
Apr 23, 2022

matkaas
Contributor

@matkaas matkaas commented Apr 10, 2022

Hi

I was playing around with tui and crossterm and ran into issue #561, so I thought I'd dig in and fix it. While I was there, I then also decided to fix some other issues around missing or inaccurate events for certain key combinations (#536, #643).

The unicode issue turns out to be fairly straightforward; we need to parse surrogate pairs (for Windows Terminal) and handle alt codes (for Conhost).

The missing/inaccurate keys issue turns out to be much tougher. See commit f52347c for details. I don't think it's possible to solve correctly in general -- the available data and APIs are simply insufficient. The solution I propose makes a best-effort attempt at providing an accurate event for all key combinations.
This works as one would want in Windows Terminal, but is unreliable in Conhost if users change their keyboard layout. I could not find any reliable way to detect the active keyboard layout under a Conhost terminal. I decided to settle for a partial solution (works when not changing keyboard layouts) rather than hacking together an unreliable solution.
Regarding a hacky, unreliable solution: It is actually possible to find a window handle for a Conhost terminal that can be queried for the active keyboard layout, but no API directly offers such a handle. I found this out by looking through process/window/thread handles in Spy++ for a Conhost terminal session. Enumerating and digging through attached console processes and their window handles seemed very brittle to me, so I didn't pursue it further. We could dig more in that direction if you think it's worthwhile to have that for Conhost terminals, even if it could break for any future change to the OS/Conhost infrastructure.

matkaas added 3 commits April 10, 2022 13:14
This enables us to deliver more key combinations triggered with tab, namely:
 * Ctrl+Tab
 * Ctrl+Shift+Tab

This entails proper decoding of UTF-16 surrogate pairs.

With this change, we deliver KeyEvent::Char(...) events for code points
outside of the BMP, such as many CJK code points as well as all emojis.

This works with both pasting and IME input in Windows Terminal.

This currently only works with IME input in Conhost terminal. Pasting doesn't
work because Conhost synthesizes Alt codes for the higher unicode scalar
values, rather than delivering a pair of surrogate code points. Some special
handling will be required to interpret unicode scalar values from Alt codes.

In addition to handling manual user input of Alt codes, this also handles
pasting of unicode from the supplemental planes into a Conhost terminal, as
the Conhost terminal encodes such input by synthesizing key sequences for an
Alt code.
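
To make the Alt code case a bit more concrete, here is a rough Rust sketch of how such a key sequence might be recognized. It is not the code from these commits. The `KeyRecord` struct and the `alt_code_unit` function are hypothetical stand-ins, and the sketch assumes (this is not stated in the thread) that the resulting code unit arrives in `u_char` on the key-up record for the Alt key.

```rust
/// Simplified, hypothetical stand-in for the fields of a Windows key event
/// record that matter here. This is not the crossterm_winapi type.
struct KeyRecord {
    key_down: bool,
    virtual_key_code: u16,
    u_char: u16,
}

/// Win32 virtual-key code for the Alt key (VK_MENU).
const VK_MENU: u16 = 0x12;

/// Assumption: the character produced by an Alt code is reported in
/// `u_char` on the key-up record for Alt. Returns that UTF-16 code unit,
/// or `None` for any other record.
fn alt_code_unit(record: &KeyRecord) -> Option<u16> {
    let is_alt_release = record.virtual_key_code == VK_MENU && !record.key_down;
    if is_alt_release && record.u_char != 0 {
        Some(record.u_char)
    } else {
        None
    }
}
```

The caller still has to turn the returned code unit into a `char`, so this only covers the detection step.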
@matkaas matkaas requested a review from TimonPost as a code owner April 10, 2022 13:01
@matkaas
Contributor Author

matkaas commented Apr 10, 2022

Whoops, commit 90d11bb breaks the alt code parsing introduced by 0fcce0d. I'll get that fixed.

Many key combinations produce key events which have u_char == 0, and these
have been discarded until now. This for example includes all combinations
involving the Ctrl+Alt modifiers, as well as many key combinations with just
Ctrl. We can provide events for such key combinations by determining the
character associated with the keys from consulting the keyboard layout. Almost
all keys on a keyboard have characters associated with them -- it's just a
question of whether we can determine what character corresponds to a key event.
There are some caveats involved in doing that...

In addition, the key events with u_char in the ASCII control code range were
until now mapped into the ASCII range '@' to '_', which is inaccurate for many
keys and for users with non-US keyboard layouts. The characters for key
combinations that produce control codes are now also determined by consulting
the keyboard layout.
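
For illustration, a fixed-offset mapping of the kind described above could look like the following Rust sketch. This is a hypothetical reconstruction, not the code that was replaced: the function name and the plain `+ 0x40` offset are assumptions made for this example.

```rust
/// Hypothetical reconstruction of a fixed-offset mapping: shift a control
/// code (0x00..=0x1F) up by 0x40 so that it lands in '@'..='_'.
/// The result depends only on the control code, not on the key that was
/// pressed or on the active keyboard layout.
fn control_code_to_caret_char(u_char: u16) -> Option<char> {
    match u_char {
        0x00..=0x1F => char::from_u32(u32::from(u_char) + 0x40),
        _ => None,
    }
}
```

Under this sketch, 0x01 becomes 'A' and 0x1B becomes '[' regardless of which physical key or layout produced the control code, which is where the inaccuracy comes from.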

The caveats revolve around determining the keyboard layout, which has two
issues:
 1. There is a race condition between the user typing in their terminal with one
    keyboard layout active, and the console application determining the keyboard
    layout while processing the key event later, as these two events happen
    asynchronously. If a user changes the active keyboard layout in between the
    two events, then the console application might misinterpret the character.
 2. For console applications running in a Conhost terminal, it turns out to be
    very difficult to determine the active keyboard layout. There appear to be
    no available APIs that reliably provide the layout.
@matkaas
Contributor Author

matkaas commented Apr 10, 2022

> Whoops, commit 90d11bb breaks the alt code parsing introduced by 0fcce0d. I'll get that fixed.

Made a fixup for 90d11bb; it's fixed in the new commit f52347c.

@TimonPost
Member

TimonPost commented Apr 17, 2022

What exactly is a surrogate and what does it try to fix?

@matkaas
Contributor Author

matkaas commented Apr 17, 2022

> What exactly is a surrogate and what does it try to fix?

In the context of Unicode, surrogates are code points in the range 0xD800 - 0xDFFF which are used in UTF-16 to encode other code points that don't otherwise fit in a single 2-byte code unit (i.e. every code point larger than 65535).
The Wikipedia page for UTF-16 explains this in more detail: https://en.wikipedia.org/wiki/UTF-16

They're relevant because we retrieve UTF-16 encoded input from the Win32 Console API when using ReadConsoleInputW in crossterm_winapi::console::Console::read_input. When the terminal wants to send us a character from outside the Basic Multilingual Plane (BMP), it encodes the character as a surrogate pair and passes it to us in two consecutive INPUT_RECORDs, each containing one UTF-16 code unit (the u_char member). We have to decode the two code units together to retrieve the character (unicode scalar value).
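
As a rough illustration of that decoding step, here is a minimal Rust sketch using the standard library's `char::decode_utf16`. The `decode_code_units` helper and the sample values are made up for this example; it is not the code from this PR.

```rust
/// Hypothetical helper: decode a sequence of UTF-16 code units (as they
/// might arrive in consecutive key events) into Unicode scalar values.
/// Unpaired surrogates are simply dropped here; a real input parser may
/// prefer to buffer a lone high surrogate until the next event arrives.
fn decode_code_units(units: &[u16]) -> Vec<char> {
    char::decode_utf16(units.iter().copied())
        .filter_map(|result| result.ok())
        .collect()
}
```

For the input `[0x0061, 0xD83D, 0xDE00]` this yields `'a'` followed by U+1F600, since 0xD83D and 0xDE00 are the high and low surrogates that together encode that code point.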

Member

@TimonPost TimonPost left a comment

Awesome! Gave it a try and a test today. Looks like a good solution! Thanks for the contribution.
