Improve rstudioapi::setSelectionRanges() for multi-byte characters #2605

Open
juliasilge opened this issue Apr 2, 2024 · 0 comments

juliasilge commented Apr 2, 2024

Related to #2582

Originally posted by @jennybc in #2582 (comment):

I think we probably have some more work to do here, around text that includes Unicode characters in the ✨astral plane✨. This includes emoji, but also other stuff.

Here's my manual test:

  1. Make a selection in the source editor.
  2. In the Console, by typing, pasting, or using the up-arrow, execute x <- rstudioapi::getSourceEditorContext() to capture the current context, including selection data.
  3. Now use rstudioapi::setSelectionRanges(x$selection[[1]][["range"]]) to complete the round trip, i.e. re-select the same selection.

We don't pass this test right now if, for example, the selection contains an emoji. RStudio does pass this test (although some of the numbers you see for positions are different). I think we want this sort of self-consistency, i.e. that we handle positions symmetrically for reading and setting.

My hunch is that the PR needs to gain this sort of maneuver, but going the other way:

// The selections in this text editor. The primary selection is always at index 0.
//
// The gymnastics here are so that we return character positions with respect to
// Unicode code points. Otherwise, the native Position type provides offsets with respect to
// UTF-16 encoded text. That would be confusing for downstream consumers, who probably
// ultimately receive this text as UTF-8 and want to operate on this text in terms of
// user-perceivable "characters". This only matters when the selection's neighborhood
// includes Unicode characters in the astral plane.
//
// Another resource that supports that what I'm doing here is desirable in Jupyter-land:
// https://jupyter-client.readthedocs.io/en/latest/messaging.html#notes
const selections = editor.selections.map(selection => {
    const lineTextBeforeActive = editor.document
        .lineAt(selection.active.line)
        .text.substring(0, selection.active.character);
    const unicodePointsBeforeActive = Array.from(lineTextBeforeActive).length;
    const lineTextBeforeStart = editor.document
        .lineAt(selection.start.line)
        .text.substring(0, selection.start.character);
    const unicodePointsBeforeStart = Array.from(lineTextBeforeStart).length;
    const text = editor.document.getText(selection);
    const unicodePointsInSelection = Array.from(text).length;
    return {
        active: { line: selection.active.line, character: unicodePointsBeforeActive },
        start: { line: selection.start.line, character: unicodePointsBeforeStart },
        end: { line: selection.end.line, character: unicodePointsBeforeStart + unicodePointsInSelection },
        text: text
    };
});
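
Going the other way might look roughly like the sketch below. It assumes the incoming column counts Unicode code points (0-based) and that the editor wants an offset in UTF-16 code units. The helper name codePointColumnToUtf16 is hypothetical; it is not part of the PR or of the editor API, and it reuses the same Array.from() trick as the code above.

```typescript
// Hypothetical helper, for illustration only: convert a 0-based column that
// counts Unicode code points into a 0-based offset in UTF-16 code units.
function codePointColumnToUtf16(lineText: string, codePointColumn: number): number {
    // Array.from() splits a string into code points, so an astral-plane
    // character such as an emoji is one element even though it occupies two
    // UTF-16 code units.
    const codePoints = Array.from(lineText);
    // Re-join the code points before the column and measure them in UTF-16.
    return codePoints.slice(0, codePointColumn).join("").length;
}

// Line from the manual test, including the surrounding double quotes.
const line = '"🌷 b"';
// The closing quote sits at code point column 4 but at UTF-16 offset 5.
console.log(codePointColumnToUtf16(line, 4));
```

With something like this applied to both the start and end columns before building the native selection, reading and setting positions would use the same unit.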


Here's the basis of my manual test:

"🌷 b"

x <- rstudioapi::getSourceEditorContext()
f(x$selection[[1]])

rg <- x$selection[[1]][["range"]]
# creating the same document_range explicitly
# rg <- rstudioapi::document_range(
#   rstudioapi::document_position(1, 2),
#   rstudioapi::document_position(1, 5)
# )
rstudioapi::setSelectionRanges(rg)

# manual test cases
# select the text inside the quotes
# "a b"    round trip works
# "ä b"    round trip works
# "🌷 b"   round trip does NOT work, 'b' not part of selection
# "🤷‍♀️ b"   round trip does NOT work, 'b' not part of selection, but
#          for reasons documented elsewhere, we believe this is currently
#          out of scope

f <- function(sel) {
  cli::cli_inform("text: {.q {sel$text}}")
  cli::cli_inform("nchar(text): {.q {nchar(sel$text)}}")
  cli::cli_inform('
    range: [{sel$range$start["row"]}, {sel$range$start["column"]}] --
    [{sel$range$end["row"]}, {sel$range$end["column"]}]')
}

It's a little fiddly to get set up, but here's what I do:

  1. Define f() (at the end). (This is just for a pretty view of the current selection.)
  2. Copy x <- rstudioapi::getSourceEditorContext() or just execute it once to get it in the history. (I'm using rstudioapi::getSourceEditorContext() instead of rstudioapi::getActiveDocumentContext() to make it easier to run the same code in RStudio, where those functions are actually different.)
  3. Then follow the steps above, replacing the text in line 1 with the test cases above. I manually select the text inside the double quotes.
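
One plausible reading of the test results (an assumption, not something verified against the implementation) is that the columns come back in code points but are applied as UTF-16 offsets, so the selection ends one unit short for each astral-plane character before its end. The sketch below just compares the two counts for the first three test cases; the measure() helper is made up for illustration.

```typescript
// Hypothetical helper, for illustration only: report the length of a string
// in Unicode code points and in UTF-16 code units.
function measure(text: string): { codePoints: number; utf16Units: number } {
    return { codePoints: Array.from(text).length, utf16Units: text.length };
}

// The text selected inside the quotes in the first three manual test cases.
for (const text of ["a b", "\u00e4 b", "🌷 b"]) {
    const m = measure(text);
    // The counts agree for "a b" and "ä b" but differ by one for "🌷 b",
    // which would match 'b' being dropped from the re-selection.
    console.log(text, m.codePoints, m.utf16Units);
}
```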