Be careful with text from the `Text` interface. #161

TTWNO · 2024-12-24T22:17:33Z

Text returned from the org.a11y.atspi.Text interface is not just a String as per Rust rules.

It carries additional contracts, such as how the object replacement character (U+FFFC) should be interpreted.

These contracts should be reflected in the API, even if that means using a newtype. Technically, most of the usefulness of this is for screen readers (AT-SPI consumers), and is not directly related to any work done for providers, but it will change the public API for using the interfaces from the atspi-proxies crate (both sending and receiving).


TTWNO · 2024-12-24T22:20:05Z

A clearer example goes as follows:

Open Firefox.
Go to the address bar and begin typing odilia.app; Odilia doesn't read the autocomplete as it appears.
Press Tab: the first link is read, the second is not.
Search for the term "menu item" in the Odilia log, and you'll see some characters that will likely render as spaces in your terminal.
Pipe it through xxd to find the real byte sequence.

$ echo '￼￼￼￼￼￼￼￼ menu item' | xxd
00000000: efbf bcef bfbc efbf bcef bfbc efbf bcef  ................
00000010: bfbc efbf bcef bfbc 206d 656e 7520 6974  ........ menu it
00000020: 656d 0a                                  em.
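
The repeated `efbf bc` in the dump is the UTF-8 encoding of U+FFFC, the object replacement character. A minimal sketch of the same check in Rust, using only the standard library; `orc_bytes` and `count_orcs` are hypothetical helper names, not part of atspi or Odilia:

```rust
/// The object replacement character, U+FFFC.
const ORC: char = '\u{FFFC}';

/// Returns the UTF-8 encoding of the ORC; xxd prints these bytes as `efbf bc`.
fn orc_bytes() -> Vec<u8> {
    let mut buf = [0u8; 4];
    ORC.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Counts how many ORCs a string contains.
fn count_orcs(text: &str) -> usize {
    text.chars().filter(|&c| c == ORC).count()
}
```

Applied to the logged string above, `count_orcs` would report eight ORCs ahead of " menu item".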

TTWNO · 2024-12-24T22:34:48Z

You can use the start_index and end_index methods on the org.a11y.atspi.Hyperlink interface of individual children to find where they belong in the set of object replacement characters of the parent.

Yes, this applies even when they are not strictly a hyperlink; I didn't name the interface. If you put a table inside of a paragraph on a web page, this is the same mechanism that it will use to substitute within the text.
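
A rough sketch of that lookup, without a live AT-SPI connection. `Child` is a hypothetical stand-in for whatever a real proxy would report, and the code assumes that indices count characters rather than bytes and that `end_index` is exclusive; both assumptions should be checked against real applications:

```rust
/// The object replacement character, U+FFFC.
const ORC: char = '\u{FFFC}';

/// Hypothetical stand-in for a child that exposes the Hyperlink interface.
#[derive(Debug, PartialEq)]
struct Child {
    name: &'static str,
    start_index: usize,
    end_index: usize, // assumed to be exclusive
}

/// Returns the character offsets (not byte offsets) of every ORC in `text`.
fn orc_char_offsets(text: &str) -> Vec<usize> {
    text.chars()
        .enumerate()
        .filter(|&(_, c)| c == ORC)
        .map(|(i, _)| i)
        .collect()
}

/// Finds the child whose index range covers the given character offset.
fn child_at(children: &[Child], offset: usize) -> Option<&Child> {
    children
        .iter()
        .find(|c| c.start_index <= offset && offset < c.end_index)
}
```

Each offset returned by `orc_char_offsets` can then be passed to `child_at` to find the object that the character stands for.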

Test page (a paragraph starting with "Para", a table with "Hi" and "Bye" as its only cells, then "graph"; finally, the paragraph tag closes):

<p>Para <table style="display:inline;"><td>Hi</td><td>Bye</td></table> graph</p>

Results in this text being returned from AT-SPI:

$ echo 'Para ￼ graph' | xxd
00000000: 5061 7261 20ef bfbc 2067 7261 7068 0a    Para ... graph.

This is not something that is clear through the API, and should at least be indicated by documentation, but ideally by a String-like newtype.
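
One possible shape for such a newtype, as a sketch only. `AtspiText` and its methods are hypothetical names; nothing like this exists in atspi today:

```rust
/// The object replacement character, U+FFFC.
const ORC: char = '\u{FFFC}';

/// Hypothetical newtype for text returned by the Text interface.
/// Wrapping the String forces callers to acknowledge that it may
/// contain placeholders for embedded child objects.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AtspiText(String);

impl AtspiText {
    fn new(raw: String) -> Self {
        AtspiText(raw)
    }

    /// True if the text embeds at least one child object.
    fn has_embedded_objects(&self) -> bool {
        self.0.contains(ORC)
    }

    /// The raw text, ORCs included, for callers that handle them explicitly.
    fn as_raw(&self) -> &str {
        &self.0
    }
}
```

The point of the wrapper is documentation through types: a caller has to go through `as_raw` on purpose instead of treating the value as ordinary text.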

TTWNO · 2024-12-30T22:32:11Z

cc @luukvanderduim any thoughts on this?

luukvanderduim · 2025-01-02T12:42:07Z

Mainly questions, because I am still a bit confused.

Is it a valid method return?

org.a11y.atspi.Text::GetText returns 's'

D-Bus is specific on what is considered a 'string-like':
"The string-like types are basic types with a variable length. The value of any string-like type is conceptually 0 or more Unicode codepoints encoded in UTF-8, none of which may be U+0000. The UTF-8 text must be validated strictly: in particular, it must not contain overlong sequences or codepoints above U+10FFFF"

I don't think zvariant would deserialize non-UTF-8 as UTF-8, unless, e.g., an unvalidated &[u8] is transmuted.

So I assume the unread codepoints are UTF-8 but not vocalizable?

Then there are some higher-level questions. Is it html found in the address bar or is that just your example? Does the address bar object expose Hyperlink as well as Text?
If so, should Odilia call GetUri?

Do Odilia users require conversion from non-vocalizable UTF-8 to something that is vocalizable?

TTWNO · 2025-01-02T18:41:42Z

So I assume the unread codepoints are UTF-8 but not vocalizable?

Yes. The codepoints are multiple instances of the object replacement character.

Is it html found in the address bar or is that just your example?

Not sure what you mean by this question, but both the example with the table and the address bar example (given by @albertotirla) contain "inline children"; the object replacement character denotes where the children are in the text. This means they (or at least their children) implement the Hyperlink interface, regardless of whether it's HTML or not.

If so, should Odilia call GetUri?

This ignores the table example, where although the Hyperlink interface is available, GetUri will not return any useful information. It's a poor abstraction, but it's the only one we have for finding out the location of children within the widgets.

Do Odilia users require conversion from non-vocalizable UTF-8 to something that is vocalizable?

Yes. You cannot vocalize these codepoints directly. At minimum, we'd have to use a descriptive table of these symbols so they can be pronounced by Speech Dispatcher. There may be a function to do this in Speech Dispatcher, not sure yet.
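
As a stopgap, the substitution could happen before the text reaches the speech layer. A minimal sketch; `vocalizable` and the placeholder wording are made up for illustration and do not reflect any Speech Dispatcher facility:

```rust
/// The object replacement character, U+FFFC.
const ORC: char = '\u{FFFC}';

/// Replaces every ORC with a spoken placeholder so the result can be
/// handed to a speech synthesizer as ordinary text.
fn vocalizable(text: &str, placeholder: &str) -> String {
    text.replace(ORC, placeholder)
}
```

For example, `vocalizable("Para \u{FFFC} graph", "embedded object")` yields "Para embedded object graph". A real implementation would more likely announce the role or name of the child instead of a fixed phrase.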

albertotirla · 2025-01-03T03:06:12Z

and now we find out why screenreaders use virtual buffers, especially for web pages, and even more so when the app is an ordinary browser page that requires browse mode. Perhaps we should do the same, updating that buffer as we receive more information?

as to the atspi crate, I don't think we should newtype String just for this, because the Text interface still returns normal text most of the time: in edit boxes, in the terminal, etc. A better idea would be to create a type specifically for this purpose, which you construct by giving it ownership of the accessible proxy you're working with, and which then gives you an iterator (or stream) whose Item type is something like this (note that the code does not compile):

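enum InlineObjectKind<'a> {
    // A run of plain text between object replacement characters.
    Text(&'a str),
    // An inline child that is an actual link.
    Hyperlink(AccessibleProxy<'a>),
    // Any other inline child, tagged with its role.
    Other(AtspiRole, AccessibleProxy<'a>),
}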

and then we, on the Odilia side, could build a virtual buffer abstraction which iterates through this, going to the next item every time one presses down arrow, or invokes the next item command once those abstractions are open to input handlers. What do you think?

TTWNO · 2025-01-03T03:21:01Z

Can you expand on what you mean by virtual buffers?

albertotirla · 2025-01-03T03:47:57Z

basically, a virtual buffer is a screenreader-specific construct in memory which can present slightly altered views of what the accessibility tree shows. For example, as far as I know Orca doesn't have one, which is why it can only show widgets inline; that lets a huge number of things fit on one line, but unfortunately it's confusing in almost all cases. Instead, we can use the screenreader's knowledge of how the user wants things shown to build an in-memory representation of it, for example by splitting the string at object replacement characters and making an iterator as I said. Then we can make an abstraction that, inside webviews, only moves by one element when you press the down arrow, the d-pad on a controller or whatever. So, if you have a paragraph containing text, then an object replacement character, then a button or link, Orca would read this as one big line as it's shown on the page, essentially doing simple text replacement. We, however, could split that into three: first the text before the object replacement character, then the control which you can click on, then the text after the character, each reachable by the user pressing the down arrow.
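
The navigation part of this idea could look roughly like the following sketch. `BufferItem` and `VirtualBuffer` are hypothetical names, and embedded controls are reduced to a label so the example stays independent of any AT-SPI types:

```rust
/// One entry in a hypothetical virtual buffer.
#[derive(Debug, Clone, PartialEq)]
enum BufferItem {
    /// A run of plain text.
    Text(String),
    /// An embedded control, identified here only by a label.
    Control(String),
}

/// A flat list of items with a cursor that moves one item at a time.
struct VirtualBuffer {
    items: Vec<BufferItem>,
    cursor: usize,
}

impl VirtualBuffer {
    fn new(items: Vec<BufferItem>) -> Self {
        Self { items, cursor: 0 }
    }

    /// The item under the cursor, or None if the buffer is empty.
    fn current(&self) -> Option<&BufferItem> {
        self.items.get(self.cursor)
    }

    /// Down arrow: advance by one item, staying on the last item at the end.
    fn move_next(&mut self) -> Option<&BufferItem> {
        if self.cursor + 1 < self.items.len() {
            self.cursor += 1;
        }
        self.current()
    }

    /// Up arrow: go back by one item, staying on the first item at the start.
    fn move_previous(&mut self) -> Option<&BufferItem> {
        self.cursor = self.cursor.saturating_sub(1);
        self.current()
    }
}
```

With a paragraph split into text, button, text, three presses of the down arrow walk through the three items in order instead of reading one long line.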

luukvanderduim · 2025-01-03T13:08:14Z

I understand it a bit better now. Thanks!
Also I am happy to see this issue moved from atspi to Odilia because I suspect it is better solved in either the screen-reader or the TTS layer.

@TTWNO

Text returned from the org.a11y.atspi.Text interface is not just a String as per Rust rules.

The object replacement character is valid UTF-8, so it is a valid Rust string. Otherwise it could not have been constructed.
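
This is easy to confirm with the standard library, which validates UTF-8 strictly. A small sketch; `decode_strict` is just a hypothetical wrapper name around `std::str::from_utf8`:

```rust
/// Decodes bytes with strict UTF-8 validation, rejecting overlong and
/// truncated sequences.
fn decode_strict(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}
```

The bytes `EF BF BC` decode to the single character U+FFFC, while an overlong sequence such as `C0 80` is rejected.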

It carries additional contracts, such as how the object replacement character (U+FFFC) should be interpreted.

Is the 'object replacement character' (ORC) the only character we know of that imposes a 'contract'?
It would be nice to rule out more surprises. Should we ask the Firefox devs whether there are more practices we should know of?

So if Odilia encounters a Text object, this may be a structured Text, like a tree.
What would be the rules?
Is the object that the ORC replaces known to be the next (Text-) sibling of the current object?
You would need to keep track of lists of indexes that correspond to current objects. (I thought of ranges first, but siblings can be anything, not necessarily text objects.)

First find out the exact rules, then find a structure to fit the rule? The above assumptions are worth nothing if wrong.
Apart from browsers, where else is this seen?

albertotirla · 2025-01-03T16:35:06Z

from what I know, this is only relevant in webview elements, be that in the browser or otherwise. I believe that adding such a type with the logic I mentioned above directly to atspi is beneficial because other assistive technologies may want to treat embedded links and such the way we do, as separate from the text they're in. So, this is how this would work in practice:

the text gets split at each object replacement character
for each of the entries, we construct an enum as mentioned in my first comment, using the text variant
after each entry, we look at the index of the original object replacement character in the text (usually the length of the previous entry in characters, plus 1), and use it to see which object would have been slotted in under the naive Orca replacement strategy
since we have the accessible proxy of that object, we make an enum variant depending on its role, be that hyperlink or something else, then put it next to the previous text entry we calculated

So then, one can either do this all at once, doing the replacement for everything and returning a vector, or consume the content lazily via an iterator that, whenever it encounters an object replacement character, splits at that point into two items: the text before it and the object itself.
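
The eager variant of these steps can be sketched as follows. `Segment` and `segments` are hypothetical names, and the `Object` variant only records the character offset of the ORC; looking up the actual child proxy and its role is left out because it needs a live AT-SPI connection:

```rust
/// The object replacement character, U+FFFC.
const ORC: char = '\u{FFFC}';

/// One piece of the split text.
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    /// A run of plain text between ORCs.
    Text(&'a str),
    /// Placeholder for an embedded child, located by the character
    /// offset of its ORC in the original text.
    Object { char_offset: usize },
}

/// Splits `text` at every ORC, keeping the position of each embedded object.
fn segments(text: &str) -> Vec<Segment<'_>> {
    let mut out = Vec::new();
    let mut run_start = 0; // byte offset where the current text run begins
    for (char_offset, (byte_offset, c)) in text.char_indices().enumerate() {
        if c == ORC {
            // Emit the text before the ORC, if there is any.
            if run_start < byte_offset {
                out.push(Segment::Text(&text[run_start..byte_offset]));
            }
            out.push(Segment::Object { char_offset });
            run_start = byte_offset + c.len_utf8();
        }
    }
    // Emit whatever text follows the last ORC.
    if run_start < text.len() {
        out.push(Segment::Text(&text[run_start..]));
    }
    out
}
```

A lazy version would keep the same state (`run_start` and the character counter) inside an iterator and yield one segment per call instead of filling a vector.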

TTWNO · 2025-01-03T18:08:53Z

I believe that adding such a type with the logic I mentioned above directly to atspi is beneficial because other assistive technologies may want to treat embedded links and such the way we do

It can be its own crate once we have a nice little module of code, but I don't even know which current atspi module it would apply to.

Is the 'object replacement character' (ORC) the only character we know of that imposes a 'contract'?

Yes. Confirmed by asking Matthias Clasen. Although it should probably be spec'ed somewhere.

The above assumptions are worth nothing if wrong.

True!

Apart from browsers, where else is this seen?

Not sure if it's relevant. The point is that it exists; it shouldn't matter where it is.

TTWNO transferred this issue from odilia-app/atspi Jan 3, 2025
