Be careful with text from the Text interface. #161

Open
TTWNO opened this issue Dec 24, 2024 · 11 comments

Comments

@TTWNO
Member

TTWNO commented Dec 24, 2024

Text returned from the org.a11y.atspi.Text interface is not just a String as per Rust rules.

It carries additional contracts, such as how the object replacement character should be interpreted.

These contracts should be reflected in the API, even if that means using a newtype. Strictly speaking, this matters mostly to screen readers (AT-SPI consumers) and not directly to work done for providers, but it will change the public API for using the interfaces from the atspi-proxies crate (both sending and receiving).

@TTWNO
Member Author

TTWNO commented Dec 24, 2024

A clearer example goes as follows:

  • Open Firefox.
  • Go to the address bar and begin typing odilia.app; Odilia doesn't read the autocomplete suggestions as they appear.
  • Press Tab: the first link is read, the second is not.
  • Search for the term "menu item" in the Odilia log, and you'll see some characters that will likely render as spaces in your terminal.
  • Pipe that line through xxd to find the real byte sequence.
```console
$ echo '￼￼￼￼￼￼￼￼ menu item' | xxd
00000000: efbf bcef bfbc efbf bcef bfbc efbf bcef  ................
00000010: bfbc efbf bcef bfbc 206d 656e 7520 6974  ........ menu it
00000020: 656d 0a                                  em.
```
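The repeated `ef bf bc` triplets in the dump are the UTF-8 encoding of U+FFFC OBJECT REPLACEMENT CHARACTER (ORC); the dump shows eight of them before " menu item". A minimal Rust sketch, using only the standard library, that confirms the encoding and counts ORCs in such a string (`count_orcs` and `orc_bytes` are illustrative helpers, not atspi API):

```rust
/// U+FFFC OBJECT REPLACEMENT CHARACTER, used by AT-SPI to mark embedded children.
const ORC: char = '\u{FFFC}';

/// Counts how many object replacement characters appear in `text`.
fn count_orcs(text: &str) -> usize {
    text.chars().filter(|&c| c == ORC).count()
}

/// Returns the UTF-8 bytes of a single ORC (EF BF BC).
fn orc_bytes() -> [u8; 3] {
    let mut buf = [0u8; 3];
    ORC.encode_utf8(&mut buf);
    buf
}
```

For the logged string, `count_orcs` returns 8 and the byte length is 8 * 3 + 10 = 34; the trailing newline added by `echo` brings it to the 35 bytes shown in the dump.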

@TTWNO
Member Author

TTWNO commented Dec 24, 2024

You can use the start_index and end_index methods on the org.a11y.atspi.Hyperlink interface of individual children to find which of the parent's object replacement characters each child corresponds to.

Yes, this applies even when the children are not strictly hyperlinks; I didn't name the interface. If you put a table inside a paragraph on a web page, this is the same mechanism used to substitute it within the text.
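A rough sketch of that lookup, assuming each child's start index has already been read from its Hyperlink interface (for example through the start_index method mentioned above) and that offsets count Unicode characters rather than bytes. The `ChildRef` type and `pair_children` function are hypothetical and exist only for illustration:

```rust
use std::collections::HashMap;

const ORC: char = '\u{FFFC}';

/// Hypothetical: a child id paired with the start index read from its Hyperlink interface.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChildRef {
    id: u32,
    start_index: usize,
}

/// Maps each ORC's character offset in `parent_text` to the child whose
/// start_index points at it. Children whose start_index does not land on an
/// ORC are returned separately so the caller can log or ignore them.
fn pair_children(parent_text: &str, children: &[ChildRef]) -> (HashMap<usize, u32>, Vec<ChildRef>) {
    // Character offsets (not byte offsets) of every ORC in the parent's text.
    let orc_offsets: Vec<usize> = parent_text
        .chars()
        .enumerate()
        .filter(|&(_, c)| c == ORC)
        .map(|(i, _)| i)
        .collect();

    let mut paired = HashMap::new();
    let mut unmatched = Vec::new();
    for child in children {
        if orc_offsets.contains(&child.start_index) {
            paired.insert(child.start_index, child.id);
        } else {
            unmatched.push(*child);
        }
    }
    (paired, unmatched)
}
```

Keeping unmatched children separate is a design choice: a start index that does not land on an ORC suggests stale text or a provider bug, which a screen reader probably wants to log rather than speak.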

Test page (a paragraph starting with "Para", then a table with "Hi" and "Bye" as its only cells, then "graph"; finally, the paragraph tag closes):

```html
<p>Para <table style="display:inline;"><td>Hi</td><td>Bye</td></table> graph</p>
```

Results in this text being returned from AT-SPI:

```console
$ echo 'Para ￼ graph' | xxd
00000000: 5061 7261 20ef bfbc 2067 7261 7068 0a    Para ... graph.
```

This is not clear from the API. It should at least be documented, but ideally it would be expressed through a String-like newtype.
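One possible shape for such a newtype, as a sketch only: `AtspiText`, its methods, and the `[object]` placeholder are hypothetical names, not part of atspi-proxies. It deliberately does not implement `Deref<Target = str>`, so code cannot hand the raw text to speech without going through a method that acknowledges the ORCs:

```rust
use std::fmt;

/// U+FFFC marks where an embedded child object sits in the text.
const ORC: char = '\u{FFFC}';

/// Hypothetical newtype for text returned by org.a11y.atspi.Text.
/// It wraps a plain String but makes the ORC contract visible in the type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AtspiText(String);

impl AtspiText {
    pub fn new(raw: impl Into<String>) -> Self {
        AtspiText(raw.into())
    }

    /// True if the text contains embedded objects that need resolving.
    pub fn has_embedded_objects(&self) -> bool {
        self.0.contains(ORC)
    }

    /// Character offsets of every embedded object (ORC) in the text.
    pub fn embedded_object_offsets(&self) -> Vec<usize> {
        self.0
            .chars()
            .enumerate()
            .filter_map(|(i, c)| (c == ORC).then_some(i))
            .collect()
    }

    /// Escape hatch for callers that explicitly want the raw string, ORCs included.
    pub fn as_raw_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for AtspiText {
    /// Shows each ORC as a visible placeholder instead of an invisible character.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for c in self.0.chars() {
            if c == ORC {
                f.write_str("[object]")?;
            } else {
                write!(f, "{c}")?;
            }
        }
        Ok(())
    }
}
```

For the test page, `embedded_object_offsets()` yields `[5]`, the character offset of the table's ORC after "Para ".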

@TTWNO
Member Author

TTWNO commented Dec 30, 2024

cc @luukvanderduim any thoughts on this?

@luukvanderduim
Contributor

Mainly questions, because I am still a bit confused.

  • Is it a valid method return?

org.a11y.atspi.Text::GetText returns 's'

D-Bus is specific about what counts as 'string-like':
"The string-like types are basic types with a variable length. The value of any string-like type is conceptually 0 or more Unicode codepoints encoded in UTF-8, none of which may be U+0000. The UTF-8 text must be validated strictly: in particular, it must not contain overlong sequences or codepoints above U+10FFFF"

I don't think zvariant would deserialize non-UTF-8 data as UTF-8, unless, e.g., an unvalidated &[u8] is transmuted.

So I assume the unread codepoints are UTF-8 but not vocalizable?
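For reference, the D-Bus rules quoted above can be checked with the standard library alone. This is a sketch of the rule, not zvariant's actual code path; `is_valid_dbus_string` is a hypothetical helper. It shows that text containing U+FFFC passes, since the ORC is ordinary, valid UTF-8:

```rust
/// Checks the D-Bus 's' rules quoted above: strictly valid UTF-8 and no U+0000.
/// std::str::from_utf8 already rejects overlong sequences, surrogates, and
/// codepoints above U+10FFFF, so only the NUL check is added here.
fn is_valid_dbus_string(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        Ok(s) => !s.contains('\0'),
        Err(_) => false,
    }
}
```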

Then there is a higher-level question. Is it HTML found in the address bar, or is that just your example? Does the address bar object expose Hyperlink as well as Text?
If so, should Odilia call GetUri?

Do Odilia users require conversion from non-vocalizable UTF-8 to something that is vocalizable?

@TTWNO
Member Author

TTWNO commented Jan 2, 2025

So I assume the unread codepoints are UTF-8 but not vocalizable?

Yes. The codepoints are multiple instances of the object replacement character (U+FFFC).

Is it html found in the address bar or is that just your example?

I'm not sure what you mean by this question, but both the table example and the address bar example (given by @albertotirla) contain "inline children"; the object replacement character denotes where those children are in the text. This means they (or at least their children) implement the Hyperlink interface, regardless of whether the content is HTML.

If so, should Odilia call GetUri?

This ignores the table example, where the Hyperlink interface is available but GetUri will not return any useful information. It's a poor abstraction, but it's the only one we have for finding the location of children within a widget.

Do Odilia users require conversion from non-vocalizable UTF-8 to something that is vocalizable?

Yes. You cannot vocalize these codepoints directly. At minimum, we'd need a descriptive table for these symbols so Speech Dispatcher can pronounce them. There may be a function for this in Speech Dispatcher; I'm not sure yet.
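A minimal sketch of that descriptive-table step, assuming the description for each embedded object (for example its role name) is looked up elsewhere and supplied through a callback. `vocalizable` is a hypothetical helper and calls no Speech Dispatcher API:

```rust
const ORC: char = '\u{FFFC}';

/// Builds a speakable string by replacing each ORC with a description of the
/// embedded child. `describe` receives the ORC's index among all ORCs
/// (0, 1, ...) and returns what should be spoken in its place.
fn vocalizable<F>(text: &str, mut describe: F) -> String
where
    F: FnMut(usize) -> String,
{
    let mut out = String::with_capacity(text.len());
    let mut nth = 0;
    for c in text.chars() {
        if c == ORC {
            out.push_str(&describe(nth));
            nth += 1;
        } else {
            out.push(c);
        }
    }
    out
}
```

Passing the ORC's ordinal rather than a fixed string lets the caller pair it with the child found through the Hyperlink interface.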

@albertotirla
Member

And now we find out why screen readers use virtual buffers, especially for web pages, and even more so for ordinary browser pages that require browse mode. Perhaps we should do the same, updating that buffer as we receive more information?

As to the atspi crate, I don't think we should newtype String just for this, because the Text interface still returns normal text most of the time: in edit boxes, in the terminal, etc. A better idea would be to create a type specifically for this purpose, which you construct by giving it ownership of the accessible proxy you're working with, and which then gives you an iterator (or stream) whose Item type is something like the following (a rough sketch; the import paths are an assumption about the atspi crate's layout):

```rust
// Assumes the atspi crate with its "proxies" feature enabled; the paths are an assumption.
use atspi::proxy::accessible::AccessibleProxy;
use atspi::Role as AtspiRole;

enum InlineObjectKind<'a> {
    Text(&'a str),
    Hyperlink(AccessibleProxy<'a>),
    Other(AtspiRole, AccessibleProxy<'a>),
}
```

And then we, on the Odilia side, could build a virtual buffer abstraction that iterates through this, moving to the next item every time one presses the down arrow or invokes the next-item command, once those abstractions are open to input handlers. What do you think?
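A minimal sketch of such a virtual buffer cursor, generic over the item type so it does not depend on atspi. `VirtualBuffer`, `next_item`, `previous_item`, and `update` are hypothetical names; a real implementation would be driven by Odilia's input handlers:

```rust
/// Hypothetical virtual buffer: a flat list of items (text runs and embedded
/// objects) plus a cursor that moves one item per "next"/"previous" command.
struct VirtualBuffer<T> {
    items: Vec<T>,
    cursor: Option<usize>,
}

impl<T> VirtualBuffer<T> {
    fn new(items: Vec<T>) -> Self {
        VirtualBuffer { items, cursor: None }
    }

    /// Down arrow / "next item": advance one item, staying on the last one at the end.
    fn next_item(&mut self) -> Option<&T> {
        let next = match self.cursor {
            None => 0,
            Some(i) if i + 1 < self.items.len() => i + 1,
            Some(i) => i, // already at the last item; stay put
        };
        if next >= self.items.len() {
            return None; // empty buffer
        }
        self.cursor = Some(next);
        self.items.get(next)
    }

    /// Up arrow / "previous item": move back one item, staying on the first one.
    fn previous_item(&mut self) -> Option<&T> {
        let prev = self.cursor?.saturating_sub(1);
        self.cursor = Some(prev);
        self.items.get(prev)
    }

    /// Replace the contents when new information arrives, clamping the cursor into range.
    fn update(&mut self, items: Vec<T>) {
        self.items = items;
        if let Some(i) = self.cursor {
            self.cursor = if self.items.is_empty() {
                None
            } else {
                Some(i.min(self.items.len() - 1))
            };
        }
    }
}
```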

@TTWNO
Member Author

TTWNO commented Jan 3, 2025

Can you expand on what you mean by virtual buffers?

@albertotirla
Member

Basically, a virtual buffer is a screen-reader-specific in-memory construct that can present slightly altered views of what the accessibility tree shows. For example, as far as I know Orca doesn't have one, which is why it can only show widgets inline. That lets a huge number of things fit on one line, but unfortunately it's confusing in almost all cases. Instead, we can use the screen reader's knowledge of how the user wants things shown to build an in-memory representation, for example by splitting the string on object replacement characters and making an iterator as I described. Then we can make an abstraction that, inside webviews, moves by only one element when you press the down arrow, the d-pad on a controller, or whatever.

So, if you have a paragraph containing text, then an object replacement character standing in for a button or link, then more text, Orca would read it as one big line, just as it's shown on the page, essentially doing simple text replacement. We, however, could split it into three: first the text before the object replacement character, then the control you can click on, then the text after the character, each reachable by pressing the down arrow.
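The two presentations can be contrasted in a small sketch. Both helpers are hypothetical, and embedded objects are represented by plain label strings rather than accessibles; `inline_line` mimics the single-line replacement described above, while `split_lines` produces one entry per arrow-key stop:

```rust
const ORC: char = '\u{FFFC}';

/// Single-line rendering: each ORC is replaced in place by the next label.
fn inline_line(text: &str, labels: &[&str]) -> String {
    let mut labels = labels.iter();
    text.chars()
        .map(|c| {
            if c == ORC {
                labels.next().map(|l| l.to_string()).unwrap_or_default()
            } else {
                c.to_string()
            }
        })
        .collect()
}

/// Split rendering: the text before each ORC, the object, and the text after
/// it become separate entries. Whitespace at the split points is trimmed and
/// empty runs are dropped so each entry is something worth speaking.
fn split_lines(text: &str, labels: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    let mut labels = labels.iter();
    for (i, part) in text.split(ORC).enumerate() {
        if i > 0 {
            // Every split boundary corresponds to one ORC, i.e. one embedded object.
            lines.push(labels.next().copied().unwrap_or("[object]").to_string());
        }
        let part = part.trim();
        if !part.is_empty() {
            lines.push(part.to_string());
        }
    }
    lines
}
```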

TTWNO transferred this issue from odilia-app/atspi on Jan 3, 2025
@luukvanderduim
Contributor

I understand it a bit better now. Thanks!
Also I am happy to see this issue moved from atspi to Odilia because I suspect it is better solved in either the screen-reader or the TTS layer.

@TTWNO

Text returned from the org.a11y.atspi.Text interface is not just a String as per Rust rules.

The object replacement character is valid UTF-8, so it is a valid Rust string. Otherwise it could not have been constructed.

It contains additional contracts like that of how the object replacement character should be interpreted and how to interpret it.

Is the 'object replacement character' (ORC) the only character we know of that imposes a 'contract'?
It would be nice to rule out further surprises. Should we ask the Firefox devs whether there are other practices we should know of?

So if Odilia encounters a Text object, it may be structured text, like a tree.
What would the rules be?
Is the object the ORC replaces known to be the next (Text) sibling of the current object?
You would need to keep track of lists of indexes that correspond to the current objects. (I thought of ranges first, but siblings can be anything, not necessarily text objects.)

First find out the exact rules, then find a structure to fit the rule? The above assumptions are worth nothing if wrong.
Apart from browsers, where else is this seen?

@albertotirla
Member

From what I know, this is only relevant in webview elements, whether in the browser or elsewhere. I believe that adding such a type, with the logic I mentioned above, directly to atspi is beneficial because other assistive technologies may want to treat embedded links and such the way we do: as separate from the text they're in. This is how it would work in practice:

  • The text is split on the object replacement character.
  • For each resulting piece, we construct the enum from my first comment, using the Text variant.
  • After each piece, we look up the character offset of the object replacement character that followed it (the total length, in characters, of everything before it in the original text) and use it to find which object would have been slotted in under Orca's naive replacement strategy.
  • Since we have the accessible proxy of that object, we create an enum variant depending on its role (Hyperlink or something else) and place it right after the preceding text piece.
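The eager (vector-returning) variant of these steps, as a sketch. `ChildRole` and `InlineItem` are local stand-ins for atspi's role type and the `InlineObjectKind` enum proposed earlier, `Obj` stands in for an accessible proxy, and `child_at` is a hypothetical lookup from an ORC's character offset to the child whose Hyperlink start index matches it:

```rust
/// U+FFFC OBJECT REPLACEMENT CHARACTER.
const ORC: char = '\u{FFFC}';

/// Hypothetical stand-in for atspi's role type; only two variants for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChildRole {
    Link,
    Table,
}

/// Local stand-in for the proposed InlineObjectKind; `Obj` would be an accessible proxy.
#[derive(Debug, PartialEq)]
enum InlineItem<'a, Obj> {
    Text(&'a str),
    Hyperlink(Obj),
    Other(ChildRole, Obj),
}

/// Splits `text` on ORCs and slots in the child found at each ORC's character offset.
fn split_inline<'a, Obj>(
    text: &'a str,
    mut child_at: impl FnMut(usize) -> Option<(ChildRole, Obj)>,
) -> Vec<InlineItem<'a, Obj>> {
    let mut items = Vec::new();
    // Character offset (not byte offset) just past the current segment.
    let mut char_offset = 0;
    let mut segments = text.split(ORC).peekable();
    while let Some(segment) = segments.next() {
        if !segment.is_empty() {
            items.push(InlineItem::Text(segment));
        }
        char_offset += segment.chars().count();
        // Another segment follows, so an ORC sat here, at `char_offset`.
        if segments.peek().is_some() {
            match child_at(char_offset) {
                Some((ChildRole::Link, obj)) => items.push(InlineItem::Hyperlink(obj)),
                Some((role, obj)) => items.push(InlineItem::Other(role, obj)),
                None => {} // No child claims this ORC; a real implementation might log it.
            }
            char_offset += 1; // Step over the ORC itself.
        }
    }
    items
}
```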

So one can either do all of this at once, performing the replacement for everything and returning a vector, or consume the content lazily via an iterator that produces items whenever it encounters an object replacement character, splitting there into two items: the text before it and the object itself.
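And the lazy variant: an iterator that yields the text before each ORC and then the ORC itself, without building a vector up front. It reports only the ORC's character offset; resolving that offset to an accessible (for example by matching Hyperlink start indices) is left to the caller. `Piece` and `Pieces` are hypothetical names:

```rust
const ORC: char = '\u{FFFC}';

#[derive(Debug, PartialEq)]
enum Piece<'a> {
    Text(&'a str),
    /// An embedded object at this character offset in the original text.
    Object { char_offset: usize },
}

/// Lazily walks the text, yielding the run before each ORC and then the ORC.
struct Pieces<'a> {
    rest: &'a str,
    char_offset: usize,
    pending_object: Option<usize>,
}

impl<'a> Pieces<'a> {
    fn new(text: &'a str) -> Self {
        Pieces { rest: text, char_offset: 0, pending_object: None }
    }
}

impl<'a> Iterator for Pieces<'a> {
    type Item = Piece<'a>;

    fn next(&mut self) -> Option<Piece<'a>> {
        // Emit the object found by the previous call before scanning further.
        if let Some(offset) = self.pending_object.take() {
            return Some(Piece::Object { char_offset: offset });
        }
        if self.rest.is_empty() {
            return None;
        }
        match self.rest.find(ORC) {
            Some(byte_idx) => {
                let before = &self.rest[..byte_idx];
                self.rest = &self.rest[byte_idx + ORC.len_utf8()..];
                let orc_offset = self.char_offset + before.chars().count();
                self.char_offset = orc_offset + 1;
                if before.is_empty() {
                    Some(Piece::Object { char_offset: orc_offset })
                } else {
                    // Yield the text now and remember the object for the next call.
                    self.pending_object = Some(orc_offset);
                    Some(Piece::Text(before))
                }
            }
            None => {
                let tail = self.rest;
                self.rest = "";
                Some(Piece::Text(tail))
            }
        }
    }
}
```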

@TTWNO
Member Author

TTWNO commented Jan 3, 2025

I believe that adding such a type with the logic I mentioned above directly to atspi is beneficial because other assistive technologies may want to treat embedded links and such the way we do

It can become its own crate once we have a nice little module of code, but I don't even know which current atspi module it would apply to.

Is the 'object replacement character' (ORC) the only character we know of that imposes a 'contract'?

Yes, confirmed by asking Matthias Clasen, although it should probably be specified somewhere.

The above assumptions are worth nothing if wrong.

True!

Apart from browsers, where else is this seen?

Not sure if it's relevant. The point is that it exists; it shouldn't matter where.
