Improve `rstudioapi::setSelectionRanges()` for multi-byte characters #2605

juliasilge · 2024-04-02T16:14:55Z

Related to #2582

Originally posted by @jennybc in #2582 (comment):

I think we probably have some more work to do here, around text that includes Unicode characters in the ✨astral plane✨. This includes emoji, but also other stuff.

Here's my manual test:

1. Make a selection in the source editor.
2. In the Console, via typing, pasting, or using the up-arrow, execute `x <- rstudioapi::getSourceEditorContext()` to capture the current context, including selection data.
3. Now use `rstudioapi::setSelectionRanges(x$selection[[1]][["range"]])` to complete the round trip, i.e. re-select the same selection.

We don't pass this test right now if, for example, the selection contains an emoji. RStudio does pass this test (although some of the numbers you see for positions are different). I think we want this sort of self-consistency, i.e. that we handle positions symmetrically for reading and setting.

My hunch is that PR needs to gain this sort of maneuver, but going the other way:

positron/src/vs/workbench/api/common/positron/extHostMethods.ts

Lines 137 to 168 in 7a3d14b

    
    // The selections in this text editor. The primary selection is always at index 0.
    //
    // The gymnastics here are so that we return character positions with respect to
    // Unicode code points. Otherwise, the native Position type provides offsets with respect to
    // UTF-16 encoded text. That would be confusing for downstream consumers, who probably
    // ultimately receive this text as UTF-8 and want to operate on this text in terms of
    // as user-perceivable "characters". This only matters when the selection's neighborhood
    // includes Unicode characters in the astral plane.
    //
    // Another resource that supports that what I'm doing here is desirable in Jupyter-land:
    // https://jupyter-client.readthedocs.io/en/latest/messaging.html#notes
    const selections = editor.selections.map(selection => {
        const lineTextBeforeActive = editor.document
            .lineAt(selection.active.line)
            .text.substring(0, selection.active.character);
        const unicodePointsBeforeActive = Array.from(lineTextBeforeActive).length;
        const lineTextBeforeStart = editor.document
            .lineAt(selection.start.line)
            .text.substring(0, selection.start.character);
        const unicodePointsBeforeStart = Array.from(lineTextBeforeStart).length;
        const text = editor.document.getText(selection);
        const unicodePointsInSelection = Array.from(text).length;
        return {
            active: { line: selection.active.line, character: unicodePointsBeforeActive },
            start: { line: selection.start.line, character: unicodePointsBeforeStart },
            end: { line: selection.end.line, character: unicodePointsBeforeStart + unicodePointsInSelection },
            text: text
        };
    });
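For discussion, here is a minimal sketch of what the reverse maneuver might look like: turning a character offset counted in Unicode code points back into the UTF-16 offset that the native Position type expects. The helper name `codePointOffsetToUtf16` is hypothetical and not part of Positron; how it would be wired into the code that sets selections is left open.

```typescript
// Hypothetical helper (not Positron code): convert an offset counted in
// Unicode code points into an offset counted in UTF-16 code units.
// Offsets past the end of the line are clamped to the line length.
function codePointOffsetToUtf16(lineText: string, codePointOffset: number): number {
	let utf16Offset = 0;
	for (let seen = 0; seen < codePointOffset && utf16Offset < lineText.length; seen++) {
		const unit = lineText.charCodeAt(utf16Offset);
		const next = lineText.charCodeAt(utf16Offset + 1); // NaN past the end
		// A high surrogate followed by a low surrogate encodes one astral code point.
		const isSurrogatePair =
			unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
		utf16Offset += isSurrogatePair ? 2 : 1;
	}
	return utf16Offset;
}
```

For a line containing `"🌷 b"`, a 0-based end offset of 4 code points maps to UTF-16 offset 5. Using 4 directly as a UTF-16 offset would stop just before the `b`, which would be consistent with the failure seen in the manual test.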

Here's the basis of my manual test:

"🌷 b"

x <- rstudioapi::getSourceEditorContext()
f(x$selection[[1]])

rg <- x$selection[[1]][["range"]]
# creating the same document_range explicitly
# rg <- rstudioapi::document_range(
#   rstudioapi::document_position(1, 2),
#   rstudioapi::document_position(1, 5)
# )
rstudioapi::setSelectionRanges(rg)

# manual test cases
# select the text inside the quotes
# "a b"    round trip works
# "ä b"    round trip works
# "🌷 b"   round trip does NOT work, 'b' not part of selection
# "🤷‍♀️ b"   round trip does NOT work, 'b' not part of selection, but
#          for reasons documented elsewhere, we believe this is currently
#          out of scope

f <- function(sel) {
  cli::cli_inform("text: {.q {sel$text}}")
  cli::cli_inform("nchar(text): {.q {nchar(sel$text)}}")
  cli::cli_inform('
    range: [{sel$range$start["row"]}, {sel$range$start["column"]}] --
    [{sel$range$end["row"]}, {sel$range$end["column"]}]')
}
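To make the test cases above concrete, the following sketch compares the UTF-16 length of each test string with its number of Unicode code points. The helper `countCodePoints` and the `cases` and `summary` names are illustrative only, and the `ä` case assumes the precomposed form U+00E4.

```typescript
// Illustrative helper: count Unicode code points by treating each valid
// surrogate pair as a single unit.
function countCodePoints(text: string): number {
	let count = 0;
	for (let i = 0; i < text.length; i++) {
		const unit = text.charCodeAt(i);
		const next = text.charCodeAt(i + 1); // NaN past the end
		if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
			i++; // skip the low surrogate of the pair
		}
		count++;
	}
	return count;
}

const cases: string[] = [
	"a b",                                // ASCII only
	"\u00e4 b",                           // "ä b", assuming precomposed U+00E4
	"\ud83c\udf37 b",                     // "🌷 b", U+1F337 is in the astral plane
	"\ud83e\udd37\u200d\u2640\ufe0f b"    // "🤷‍♀️ b", a ZWJ sequence of four code points
];

const summary = cases.map(text => ({
	text,
	utf16Units: text.length,
	codePoints: countCodePoints(text)
}));
```

The two counts agree for the first two strings and differ for the tulip, where the round trip fails. In the last string the emoji alone spans four code points, so even code-point offsets do not line up with what a user perceives as one character.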

It's a little fiddly to get set up, but here's what I do:

1. Define `f()` (at the end). (This is just for a pretty view of the current selection.)
2. Copy `x <- rstudioapi::getSourceEditorContext()` or just execute it once to get it in the history. (I'm using `rstudioapi::getSourceEditorContext()` instead of `rstudioapi::getActiveDocumentContext()` to make it easier to run the same code in RStudio, where those functions are actually different.)
3. Then follow the steps above, replacing the text in line 1 with the test cases above. I manually select the text inside the double quotes.

juliasilge added the lang: r label Apr 2, 2024

jennybc mentioned this issue Apr 3, 2024

Feature Request: Shims for {rstudioapi} #1312

Closed

petetronic added this to the Future milestone Apr 8, 2024
