Protocol for extending the variables pane #560

dfalbel · 2024-10-02T13:59:24Z

This PR is a proposal of a mechanism that allows package authors to implement custom behavior for objects in the variables pane.

A description of how the protocol works is available here: https://github.com/posit-dev/ark/blob/daf9348affc0f1b190928a2fb0cad5bfadaf263c/doc/variables-pane-extending.md

See also #561 for the custom inspect behavior.

Motivation

The main motivation is to allow package authors to improve the display of objects for which the internal R data structure does not show useful information for users. For example, reticulate R obejcts are environments containing a flag and an external pointer (x <- reticulate::py_eval("1", convert = FALSE)):

Or torch tensors (x = torch::torch_randn(10, 10)) :

Or xml2 documents (x <- xml2::read_xml("<foo> <bar> text <baz/> </bar> </foo>")):

How it's implemented

Package authors can implement a set of S3 like methods, each customizing a specific part of the UI.
Methods are discovered by ark when packages are loaded and lazily stored in an environment. Package authors don't need to export these methods in their packages. See here for the description of methods.

When responding to events from the variables pane, we try to apply the custom methods and if the method works correctly we use it's value, otherwise we try to continue using the default implementation.

Methods are expllicitly implemented for Ark, so it's expected that package authors will take extra care to make they work correctly with Positron.

Comparison to RStudio

RStudio uses the utils::str S3 in a few situations to obtain the values description, see usages of ValueFromStr in eg: https://github.com/rstudio/rstudio/blob/2175496934a54e21979461a1c67eb13ad5048cf2/src/cpp/session/modules/SessionEnvironment.R#L122

There are many guardrails though, as str methods are commonly implemented and package authors might not be aware of such bugs when used from RStudio.

lionel-

That seems like a good idea!

Before reviewing the implementation, I'd just like to discuss a potential simplification of the extension mechanism. What if we had a single generic ps_variable_proxy() that would be in charge of converting an object to a base type representation (data frame, vector, list, environment) that Ark would be able to display in the usual way. This is consistent with how genericity works in the tidyverse via vctrs proxies.

The display type could be set to the first element of the class vector, but if that's too inconvenient we could have an additional method for it as proposed here.

Laziness for deeply/infinitely nested objects:

Could be supported via environments. The only downside is that environments have the restriction of uniquely named elements, but that seems reasonable for object internals since that's consistent with struct-like types?
Or we could allow list-proxies containing classed objects that themselves have a proxy method.

What do you think?

dfalbel · 2024-10-03T11:48:11Z

I like the idea of a single ps_variable_proxy.

I think I'd still want to customize the display value though, eg for a torch_tensor, I think the display value should be something like Float [1:10, 1:10], I think the closest proxy would be a length 1 character vector with that text, which would still be displayed with quotes around it:"Float [1:10, 1:10]".

I'm fine with display type = class(x)[1], we could remove display_type.

So the proxy would replace has_children(), get_children() and get_child_at().

My main concern with a single proxy implementation is performance. The way the variables pane is currenty implemented, we don't keep any state after the representations are sent to the front-end and we only get back a vector of access keys from the front-end when we want to expand a node.

Considering we implement the second option where variable_proxy() returns a list-proxy containing classed objects implementing variable_proxy. So suppose that A is an object that implements the variable_proxy() and it's children are also objects that also implement variable_proxy() (For instance, A could be a reticulate object representing a Python dictionary and B a dictionary contained in A).

When we click on B, so we get it's children, we'll receive a request from the front-end containing [A, B].
To compute the children of B, we'll need to:

ps_variable_proxy(A)
find B there and then
ps.variable_proxy(B)

Root
└── A
    ├── B
    │   ├── B1
    │   └── B2
    └── C

If A is large, building it's variable_proxy() can be time consuming and we'll need to rebuild it for every. neste child one expands.

The main reason for having two separate get_children() and get_child() is that there might be an efficient way of finding B in A without building the entire list of possible children.

IMO this will be a rare situation though, except for reticulate, most use cases I can think would only want to show a nice display value. So maybe it's fine to let package authors guarantee that variable_proxy is fast enough, even if they need to implement some kind of caching, etc. Or maybe, we want to keep some state in the variables pane so subsequent requests don't need to recompute the children.

Let me know what you think!

lionel- · 2024-10-03T12:10:49Z

hmm if performance is an issue I think we should be able to store the proxies in our current_bindings? These would only be updated if the objects change across top-level calls.

t-kalinowski · 2024-10-03T12:16:05Z

I think we can add support for a ps_variable_proxy, which could be in addition to the methods proposed in the PR, as an alternative API. However, I would not want to require that the proxy approach is the only supported API.

The 'base-atomic-proxy' approach creates difficulties for objects that cannot be easily materialized as a base atomic. This is particularly problematic for objects that would be much larger in memory when materialized as a base atomic type, and for objects that have no suitable base-atomic equivalent.

For example, large sparse arrays or arrays with smaller dtypes pose challenges. It's not uncommon to have a single-digit GB-sized NumPy array with a 'float32' dtype. Materializing that as a 'base-atomic-proxy' would mean allocating a double-digit GB double array every time the variables pane is updated. Similarly, if a user has a 10 GB array of int8s, we'd be materializing a 40 GB R integer array with each variables pane update.

(We encounter these same limitations of the vctrs-proxy approach in other places, like rstudio/reticulate#1481)

Many objects also lack a suitable base atomic equivalent. This opens up tricky questions. For instance:

Python AST nodes: import ast; n = ast.Name('x'). Would this need to be presented as a string or an R symbol?
Python futures: fut = concurrent.futures.ThreadPoolExecutor().submit(fn). Would reticulate have to construct an R {coro} or {future} proxy?
Lazy-loaded arrays like https://www.bioconductor.org/packages/release/bioc/html/DelayedArray.html: How would the correct "true" dimensions of such an array be communicated without materializing a full proxy?

lionel- · 2024-10-03T12:45:29Z

In my mind the variable pane should never contain GBs of data and any materialised data should be truncated - to support large data we'd need a similar approach to the data viewer and then the proxy becomes a slice proxy.

I see your points regarding flexibility of data types in the context of interop. I think this aligns with Daniel's request to keep a display value extension point?

t-kalinowski · 2024-10-03T12:52:44Z

I think this aligns with Daniel's request to keep a display value extension point?

I'm probably misunderstanding, but I don't think it does. A code example might help.

With the proxy approach, what method(s) would reticulate implement to display a NumPy array?

dfalbel · 2024-10-03T13:06:32Z

I think for a numpy array, reticulate would only implement display_value and a proxy method that returns anything that's scalar atomic, so the arrow \/ to list children doesn't appear for it in the variables pane (equivalent to current has_children() = FALSE).

Storing proxies in current_bindings is probably possible, but the implementation will be quite tricky, because an object implementing variable_proxy() could be any level nested into an object that is pointed by one of the current_bindings.
I can take a look at what that could look like.

lionel- · 2024-10-03T14:18:34Z

@dfalbel Feel free to take a look but I think you've both convinced me that the proxy approach would not be a simplification.

dfalbel · 2024-10-03T17:04:38Z

Ok, I didn't get to it, but happy to experiment if you think it's worth it. I set up an example package here: https://github.com/dfalbel/testVariablesPaneExtension/tree/main/R with an example that works for R6 classes and some examples of what I'd like to make it work for torch.

lionel-

This is a really nice PR!
Approved but I think @DavisVaughan should also take a look.

I believe this is the very first API exposed from Ark to R packages (if we exclude rstudioapi support) so we should discuss what that should look like. I like the setup you've created to register methods but I wonder if:

.ps.register_ark_method("ark_variable_display_value", ...)

should have the opposite namespacing:

.ark.register_method("positron_variable_display_value", ...)

The variable pane is currently a positron-only feature so should be namespaced in positron_
I'm imagining the register_method() function will be common to both ark and positron features, so should be namespaced as such.

We probably should meet and discuss how to expose this API.

crates/ark/src/interface.rs

lionel- · 2024-10-09T07:50:34Z

crates/ark/src/variables/methods.rs

+use crate::modules::ARK_ENVS;
+
+#[derive(Debug, PartialEq, EnumString, EnumIter, IntoStaticStr, Display, Eq, Hash, Clone)]
+pub enum ArkGenerics {


Should be in its own module one level above?

Or renamed to ArkVariableGenerics.

Do you have a preference? It feels like this could be used elsewhere, so we could move to its own module above.

I've moved one level above in ff887ff . Happy to rever if you prefer making it specific for the variables pane.

I'm happy with this change, it seems like that infrastructure could be useful in the future.

lionel- · 2024-10-09T07:57:36Z

crates/ark/src/variables/methods.rs

+    fn parse_method(name: &String) -> Option<(Self, String)> {
+        for method in ArkGenerics::iter() {
+            let method_str: &str = method.clone().into();
+            if name.starts_with::<&str>(method_str) {
+                if let Some((_, class)) = name.split_once(".") {
+                    return Some((method, class.to_string()));
+                }
+            }
+        }
+        None
+    }
+}


lionel- · 2024-10-09T07:58:43Z

crates/ark/src/variables/methods.rs

+            Rf_lang3(
+                r_symbol!(":::"),
+                r_symbol!(package),
+                r_symbol!(format!("{generic}.{class}")),
+            )


You can replace this with RCall which works similarly to RFunction (the latter is implemented in terms of the former).

But why not pass a function to register_method()? To avoid inlining objects in R calls? If so needs a comment, the implicit evaluation is surprising.

IIRC, the motivation for storing a call pkgname:::methodname instead of the closure object was to ensure that:

Everything worked seamlessly with devtools::load_all() and similar functions (no risk of calling an outdated method from a stale cache);

An escape hatch remained available for monkey-patching a method in the package namespace (e.g., to enable workarounds for misbehaving methods).

I updated to use RCall in aadef2b

lionel- · 2024-10-09T08:02:43Z

crates/ark/src/variables/methods.rs

+        }
+    }
+
+    pub fn register_method(generic: Self, class: &str, method: RObject) -> anyhow::Result<()> {


Any reason not to take a consuming self (as opposed to &self)?

My reasoning was that a static method would make more clear that we're actually modfying some global state and not just an object state. But loooks weird indeed - fixed in 16d4ab8

crates/ark/src/variables/methods.rs

lionel- · 2024-10-09T08:28:21Z

crates/ark/src/variables/methods.rs

+        T: TryFrom<RObject>,
+        <T as TryFrom<harp::RObject>>::Error: std::fmt::Debug,
+    {
+        if !r_is_object(x) {


Since this is rather generic infrastructure, we might want to allow registering methods for base types in the future (only us would be allowed to)?

crates/ark/src/variables/methods.rs

crates/ark/src/variables/variable.rs

lionel- · 2024-10-09T08:48:04Z

crates/ark/src/variables/variable.rs

+    let kind: Option<String> = ArkGenerics::VariableKind.try_dispatch(value, vec![])?;
+    match kind {
+        None => Ok(None),
+        // Enum reflection is not beautiful, we want to parse a VariableKind from it's


I don't understand this note.

This was more like a rant :P. It's hard to go from the string representation to an enum value in rust. Here we had to create a json and then read it with serde.

dfalbel · 2024-10-09T13:36:10Z

Thanks @lionel-

I agree that we should have the opposite namespacing and just changed here: a21dc37

I'm happy to chat further on how to expose this API. Currently we don't necessarily need to expose .ark_register_method() as we already automatically find methods in packages, but it's nice for interactive debugging and for testing purposes.

DavisVaughan

Partial review - will do more in a bit

crates/ark/src/methods.rs

crates/ark/src/modules/positron/methods.R

crates/ark/src/methods.rs

crates/ark/src/variables/variable.rs

Co-authored-by: Lionel Henry <[email protected]>

…sitron_variables`.

- Changed naming scheme to include `ark_` prefix for all methods. - Reuse the methods table names to avoid duplication. - Use the `to_string()` method to generate the error messages.

Co-authored-by: Davis Vaughan <[email protected]>

dfalbel · 2024-10-18T15:23:07Z

As discussed, I have updated the PR with:

Removing auto-registration in favor of only explicit method registration.
Added an allowed list of packages, so we can test the API for some time before making it really public.

I have torch PR implementing how packages should explicitly register methods: mlverse/torch@faa11e4

t-kalinowski · 2024-11-11T17:19:41Z

@lionel- @DavisVaughan Can I ask why

automatic discovery of methods in packages was removed,
and only packages on the explicit allow list are allowed to register methods

I’d like to implement methods not only in reticulate but also in keras and tensorflow. The recent changes before merge feel like unnecessary added complexity, inconvenience, and communication overhead, for reasons I don’t fully understand.

lionel- · 2024-11-14T07:49:12Z

(1) is a choice for explicitness. This API seems sufficiently specific that magic and extra convenience did not feel warranted.
(2) is about not committing to any public interface in the short term until the mlverse had time to experiment with the API.

t-kalinowski · 2024-11-14T17:10:58Z

That makes sense, thanks.

wesm · 2024-12-03T19:26:04Z

I was reading through this since it was mentioned in posit-dev/positron#5573.

A side comment: I wanted to mention that in https://github.com/posit-dev/ark/blob/main/doc/variables-pane-extending.md, Positron is not mentioned at all. Ark contains a lot of code that powers features in Positron but not easy to use elsewhere. Since Ark is intended to be usable outside of Positron, it may make sense to more clearly separate out the Positron-specific code and documentation so that people don't get the wrong idea that Ark is only relevant to Positron.

This comment was marked as resolved.

Sign in to view

dfalbel force-pushed the feature/extend-variables branch from 47a46d8 to 8676851 Compare October 2, 2024 14:42

dfalbel mentioned this pull request Oct 2, 2024

Extending the variables pane - Inspect #561

Merged

dfalbel requested review from lionel- and DavisVaughan October 2, 2024 17:47

lionel- reviewed Oct 3, 2024

View reviewed changes

lionel- approved these changes Oct 9, 2024

View reviewed changes

DavisVaughan reviewed Oct 10, 2024

View reviewed changes

dfalbel and others added 11 commits October 17, 2024 16:23

Add methods registration and dispatching

d5c6ed7

Dispatch on methods before relying on defaults

63e52e3

add more supported methods

072c4dd

Use a strum enum to make less error prone method dispatching

dbfadc2

Favor let Some(x) instead of match

7f973e8

use let Ok() instead

02fcd4e

Move into an R based impl

302f9a5

minor revision

7473e49

typo

01eda82

Make a method instead

e0b073d

Add some unit tests

023a954

dfalbel and others added 21 commits October 17, 2024 16:25

fix tests

894c9db

Find namespaced method calls like foo:::bar

b24eaac

Remove has_ark_method

9bc5fc3

use r_task

588b733

Use if let Err construct

c3ee3c6

Use is_string helper

cffc4ea

USe cls for consistency

7997c48

Improve readability

bd26e53

Co-authored-by: Lionel Henry <[email protected]>

Simplify function

d68648d

Co-authored-by: Lionel Henry <[email protected]>

Simplify note

a69c736

Use &self instead

87cd6da

Use RCall instead

2cad01e

Use RArgument

c8a5634

Add note on why we store a call

ea9c749

Use different namespacing. .ps -> .ark and ark_variables -> `po…

17ebe9a

…sitron_variables`.

Move methods.rs to it's own module one level above.

9eebd2b

Method naming

1ef4d1a

- Changed naming scheme to include `ark_` prefix for all methods. - Reuse the methods table names to avoid duplication. - Use the `to_string()` method to generate the error messages.

Typos

c6e52bc

Co-authored-by: Davis Vaughan <[email protected]>

Require explicit registration

2ee62d7

Rename to .ark.register_method()

ee2f34e

Use an allow list of packages

e6e2e18

dfalbel force-pushed the feature/extend-variables branch from 96e9917 to e6e2e18 Compare October 17, 2024 19:26

dfalbel merged commit 4bb6ca9 into main Oct 23, 2024
6 checks passed

dfalbel deleted the feature/extend-variables branch October 23, 2024 12:20

github-actions bot locked and limited conversation to collaborators Oct 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Protocol for extending the variables pane #560

Protocol for extending the variables pane #560

dfalbel commented Oct 2, 2024 •

edited

Loading

This comment was marked as resolved.

This comment was marked as resolved.

lionel- left a comment

dfalbel commented Oct 3, 2024

lionel- commented Oct 3, 2024

t-kalinowski commented Oct 3, 2024

lionel- commented Oct 3, 2024

t-kalinowski commented Oct 3, 2024

dfalbel commented Oct 3, 2024

lionel- commented Oct 3, 2024

dfalbel commented Oct 3, 2024 •

edited

Loading

lionel- left a comment

lionel- Oct 9, 2024

dfalbel Oct 9, 2024

dfalbel Oct 9, 2024

lionel- Oct 10, 2024

lionel- Oct 9, 2024

lionel- Oct 9, 2024

lionel- Oct 9, 2024

t-kalinowski Oct 9, 2024

dfalbel Oct 9, 2024

lionel- Oct 9, 2024

dfalbel Oct 9, 2024

lionel- Oct 9, 2024

lionel- Oct 9, 2024

dfalbel Oct 9, 2024

dfalbel commented Oct 9, 2024

DavisVaughan left a comment

dfalbel commented Oct 18, 2024

t-kalinowski commented Nov 11, 2024 •

edited

Loading

lionel- commented Nov 14, 2024

t-kalinowski commented Nov 14, 2024

wesm commented Dec 3, 2024

Protocol for extending the variables pane #560

Protocol for extending the variables pane #560

Conversation

dfalbel commented Oct 2, 2024 • edited Loading

Motivation

How it's implemented

Comparison to RStudio

This comment was marked as resolved.

This comment was marked as resolved.

lionel- left a comment

Choose a reason for hiding this comment

dfalbel commented Oct 3, 2024

lionel- commented Oct 3, 2024

t-kalinowski commented Oct 3, 2024

lionel- commented Oct 3, 2024

t-kalinowski commented Oct 3, 2024

dfalbel commented Oct 3, 2024

lionel- commented Oct 3, 2024

dfalbel commented Oct 3, 2024 • edited Loading

lionel- left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dfalbel commented Oct 9, 2024

DavisVaughan left a comment

Choose a reason for hiding this comment

dfalbel commented Oct 18, 2024

t-kalinowski commented Nov 11, 2024 • edited Loading

lionel- commented Nov 14, 2024

t-kalinowski commented Nov 14, 2024

wesm commented Dec 3, 2024

dfalbel commented Oct 2, 2024 •

edited

Loading

dfalbel commented Oct 3, 2024 •

edited

Loading

t-kalinowski commented Nov 11, 2024 •

edited

Loading