Protocol for extending the variables pane #560
47a46d8
to
8676851
Compare
That seems like a good idea!
Before reviewing the implementation, I'd just like to discuss a potential simplification of the extension mechanism. What if we had a single generic `ps_variable_proxy()` that would be in charge of converting an object to a base type representation (data frame, vector, list, environment) that Ark would be able to display in the usual way. This is consistent with how genericity works in the tidyverse via vctrs proxies.
The display type could be set to the first element of the class vector, but if that's too inconvenient we could have an additional method for it as proposed here.
Laziness for deeply/infinitely nested objects:
- Could be supported via environments. The only downside is that environments have the restriction of uniquely named elements, but that seems reasonable for object internals since that's consistent with struct-like types?
- Or we could allow list-proxies containing classed objects that themselves have a proxy method.
What do you think?
I like the idea of a single ps_variable_proxy. I think I'd still want to customize the display value though, e.g. for a torch_tensor, I think the display value should be something like I'm fine with So the proxy would replace My main concern with a single proxy implementation is performance. The way the variables pane is currently implemented, we don't keep any state after the representations are sent to the front-end and we only get back a vector of access keys from the front-end when we want to expand a node. Considering we implement the second option where variable_proxy() returns a list-proxy containing classed objects implementing When we click on B, to get its children, we'll receive a request from the front-end containing [A, B].
If The main reason for having two separate IMO this will be a rare situation though: except for reticulate, most use cases I can think of would only want to show a nice display value. So maybe it's fine to let package authors guarantee that variable_proxy is fast enough, even if they need to implement some kind of caching, etc. Or maybe, we want to keep some state in the variables pane so subsequent requests don't need to recompute the children. Let me know what you think!
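The performance concern above can be illustrated with a small model. This is only a sketch under stated assumptions: `Node`, `proxy_children`, and the path-keyed cache are hypothetical stand-ins, not Ark's types or API. It counts how often a proxy method would run when the pane resolves the access-key path `[A, B]` without state, compared with keeping the children computed for each path prefix.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object shown in the variables pane.
#[derive(Clone, Debug, PartialEq)]
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Hypothetical stand-in for a package-provided proxy method.
/// `calls` counts invocations so the two strategies can be compared.
fn proxy_children(node: &Node, calls: &mut usize) -> Vec<Node> {
    *calls += 1;
    node.children.clone()
}

/// Stateless strategy: re-run the proxy at every level of the access path.
fn resolve_stateless(root: &Node, path: &[&str], calls: &mut usize) -> Option<Vec<Node>> {
    let mut current = root.clone();
    for key in path {
        let children = proxy_children(&current, calls);
        current = children.into_iter().find(|c| c.name == *key)?;
    }
    Some(proxy_children(&current, calls))
}

/// Children already computed, keyed by the access path that led to them.
type Cache = HashMap<Vec<String>, Vec<Node>>;

fn children_cached(node: &Node, prefix: &[String], cache: &mut Cache, calls: &mut usize) -> Vec<Node> {
    if let Some(hit) = cache.get(prefix) {
        return hit.clone();
    }
    let computed = proxy_children(node, calls);
    cache.insert(prefix.to_vec(), computed.clone());
    computed
}

/// Stateful strategy: reuse the children computed for each path prefix.
fn resolve_cached(root: &Node, path: &[&str], cache: &mut Cache, calls: &mut usize) -> Option<Vec<Node>> {
    let mut current = root.clone();
    let mut prefix: Vec<String> = Vec::new();
    for key in path {
        let children = children_cached(&current, &prefix, cache, calls);
        current = children.into_iter().find(|c| c.name == *key)?;
        prefix.push(key.to_string());
    }
    Some(children_cached(&current, &prefix, cache, calls))
}

fn node(name: &str, children: Vec<Node>) -> Node {
    Node { name: name.to_string(), children }
}

/// root -> A -> B -> C
fn sample_tree() -> Node {
    node("root", vec![node("A", vec![node("B", vec![node("C", vec![])])])])
}

fn main() {
    let root = sample_tree();

    let mut stateless_calls = 0;
    resolve_stateless(&root, &["A"], &mut stateless_calls);
    resolve_stateless(&root, &["A", "B"], &mut stateless_calls);

    let mut cached_calls = 0;
    let mut cache = Cache::new();
    resolve_cached(&root, &["A"], &mut cache, &mut cached_calls);
    resolve_cached(&root, &["A", "B"], &mut cache, &mut cached_calls);

    println!("stateless proxy calls: {stateless_calls}, cached proxy calls: {cached_calls}");
}
```

A real cache would also need invalidation whenever the underlying objects change, which this sketch ignores.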
hmm if performance is an issue I think we should be able to store the proxies in our |
I think we can add support for a The 'base-atomic-proxy' approach creates difficulties for objects that cannot be easily materialized as a base atomic. This is particularly problematic for objects that would be much larger in memory when materialized as a base atomic type, and for objects that have no suitable base-atomic equivalent.
For example, large sparse arrays or arrays with smaller dtypes pose challenges. It's not uncommon to have a single-digit GB-sized NumPy array with a 'float32' dtype. Materializing that as a 'base-atomic-proxy' would mean allocating a double-digit GB double array every time the variables pane is updated. Similarly, if a user has a 10 GB array of int8s, we'd be materializing a 40 GB R integer array with each variables pane update. (We encounter these same limitations of the vctrs-proxy approach in other places, like rstudio/reticulate#1481)
Many objects also lack a suitable base atomic equivalent. This opens up tricky questions. For instance:
In my mind the variables pane should never contain GBs of data and any materialised data should be truncated - to support large data we'd need a similar approach to the data viewer and then the proxy becomes a slice proxy. I see your points regarding flexibility of data types in the context of interop. I think this aligns with Daniel's request to keep a display value extension point?
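The memory figures quoted earlier in this thread follow from element widths alone: R integers are 4 bytes and R doubles are 8 bytes, so widening an `int8` or `float32` buffer multiplies its size by 4 or 2 respectively. A minimal sketch of that arithmetic (the helper name and the 6 GB `float32` size are illustrative assumptions, not values from the PR):

```rust
/// Bytes needed once every element of a buffer is widened to `target_width` bytes.
fn materialized_bytes(source_bytes: u64, source_width: u64, target_width: u64) -> u64 {
    let elements = source_bytes / source_width;
    elements * target_width
}

const GB: u64 = 1_000_000_000;
const R_INTEGER_WIDTH: u64 = 4; // R integers are 32-bit
const R_DOUBLE_WIDTH: u64 = 8; // R doubles are 64-bit

fn main() {
    // A 10 GB int8 array materialized as an R integer vector.
    let int8_as_integer = materialized_bytes(10 * GB, 1, R_INTEGER_WIDTH);
    println!("10 GB int8 -> {} GB as R integer", int8_as_integer / GB);

    // A 6 GB float32 array (illustrative size) materialized as an R double vector.
    let f32_as_double = materialized_bytes(6 * GB, 4, R_DOUBLE_WIDTH);
    println!("6 GB float32 -> {} GB as R double", f32_as_double / GB);
}
```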
I'm probably misunderstanding, but I don't think it does. A code example might help. With the proxy approach, what method(s) would reticulate implement to display a NumPy array?
I think for a numpy array, reticulate would only implement Storing proxies in
@dfalbel Feel free to take a look but I think you've both convinced me that the proxy approach would not be a simplification.
Ok, I didn't get to it, but happy to experiment if you think it's worth it. I set up an example package here: https://github.com/dfalbel/testVariablesPaneExtension/tree/main/R with an example that works for R6 classes and some examples of what I'd like to make it work for torch.
This is a really nice PR!
Approved but I think @DavisVaughan should also take a look.
I believe this is the very first API exposed from Ark to R packages (if we exclude rstudioapi support) so we should discuss what that should look like. I like the setup you've created to register methods but I wonder if:
.ps.register_ark_method("ark_variable_display_value", ...)
should have the opposite namespacing:
.ark.register_method("positron_variable_display_value", ...)
- The variables pane is currently a Positron-only feature so should be namespaced in `positron_`.
- I'm imagining the `register_method()` function will be common to both ark and positron features, so should be namespaced as such.
We probably should meet and discuss how to expose this API.
crates/ark/src/variables/methods.rs
Outdated
use crate::modules::ARK_ENVS;

#[derive(Debug, PartialEq, EnumString, EnumIter, IntoStaticStr, Display, Eq, Hash, Clone)]
pub enum ArkGenerics {
Should be in its own module one level above? Or renamed to `ArkVariableGenerics`.
Do you have a preference? It feels like this could be used elsewhere, so we could move to its own module above.
I've moved it one level above in ff887ff. Happy to revert if you prefer making it specific to the variables pane.
I'm happy with this change, it seems like that infrastructure could be useful in the future.
crates/ark/src/variables/methods.rs
Outdated
    fn parse_method(name: &String) -> Option<(Self, String)> {
        for method in ArkGenerics::iter() {
            let method_str: &str = method.clone().into();
            if name.starts_with::<&str>(method_str) {
                if let Some((_, class)) = name.split_once(".") {
                    return Some((method, class.to_string()));
                }
            }
        }
        None
    }
}
Neat!
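For readers without the crate at hand, the `generic.class` parsing in the hunk above can be reproduced in a self-contained form. This sketch is a variant, not the PR's code: it replaces the strum derives with a hand-written table, splits on the first `.` before matching the generic name exactly (rather than by prefix), and uses illustrative generic names that may differ from the ones Ark finally adopted.

```rust
/// Hypothetical subset of the generics; the names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Generic {
    DisplayValue,
    Kind,
}

impl Generic {
    const ALL: [Generic; 2] = [Generic::DisplayValue, Generic::Kind];

    fn name(self) -> &'static str {
        match self {
            Generic::DisplayValue => "ark_variable_display_value",
            Generic::Kind => "ark_variable_kind",
        }
    }

    /// Split `generic.class` at the first dot, so classes that themselves
    /// contain dots (such as `data.frame`) are kept whole.
    fn parse_method(name: &str) -> Option<(Generic, String)> {
        let (generic, class) = name.split_once('.')?;
        if class.is_empty() {
            return None;
        }
        let generic = Self::ALL.iter().copied().find(|g| g.name() == generic)?;
        Some((generic, class.to_string()))
    }
}

fn main() {
    for name in [
        "ark_variable_display_value.torch_tensor",
        "ark_variable_kind.data.frame",
        "print.foo",
    ] {
        println!("{name} -> {:?}", Generic::parse_method(name));
    }
}
```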
crates/ark/src/variables/methods.rs
Outdated
Rf_lang3(
    r_symbol!(":::"),
    r_symbol!(package),
    r_symbol!(format!("{generic}.{class}")),
)
You can replace this with `RCall`, which works similarly to `RFunction` (the latter is implemented in terms of the former).
But why not pass a function to `register_method()`? To avoid inlining objects in R calls? If so, it needs a comment; the implicit evaluation is surprising.
IIRC, the motivation for storing a call `pkgname:::methodname` instead of the closure object was to ensure that:
- Everything worked seamlessly with `devtools::load_all()` and similar functions (no risk of calling an outdated method from a stale cache);
- An escape hatch remained available for monkey-patching a method in the package namespace (e.g., to enable workarounds for misbehaving methods).
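The difference between the two registration styles can be modelled outside of R. In this sketch every name is a hypothetical stand-in (a `HashMap` plays the role of package namespaces, and `mypkg` is an invented package): an entry that captures the function keeps calling the old version after the namespace binding is replaced, while an entry that stores only the package and binding name picks up the replacement at call time.

```rust
use std::collections::HashMap;

type Method = fn(&str) -> String;

/// Hypothetical stand-in for package namespaces: (package, binding) -> function.
type Namespaces = HashMap<(String, String), Method>;

/// Two ways a method could be remembered by the registry.
enum Registered {
    /// The function value itself, captured at registration time.
    Closure(Method),
    /// Only the location, resolved again on every call (like `pkg:::name`).
    Call { package: String, name: String },
}

fn invoke(entry: &Registered, namespaces: &Namespaces, x: &str) -> Option<String> {
    match entry {
        Registered::Closure(method) => Some(method(x)),
        Registered::Call { package, name } => {
            let method = namespaces.get(&(package.clone(), name.clone()))?;
            Some(method(x))
        }
    }
}

fn display_v1(x: &str) -> String {
    format!("v1: {x}")
}

fn display_v2(x: &str) -> String {
    format!("v2: {x}")
}

/// Register both ways, then replace the binding as a reload would.
fn demo() -> (Option<String>, Option<String>) {
    let key = ("mypkg".to_string(), "ark_variable_display_value.foo".to_string());
    let mut namespaces: Namespaces = HashMap::new();
    namespaces.insert(key.clone(), display_v1 as Method);

    let by_closure = Registered::Closure(display_v1);
    let by_call = Registered::Call { package: key.0.clone(), name: key.1.clone() };

    // Simulate reloading the package: the binding now points at a new function.
    namespaces.insert(key, display_v2 as Method);

    (invoke(&by_closure, &namespaces, "x"), invoke(&by_call, &namespaces, "x"))
}

fn main() {
    let (closure_result, call_result) = demo();
    println!("captured closure: {closure_result:?}");
    println!("late-bound call:  {call_result:?}");
}
```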
I updated to use `RCall` in aadef2b.
crates/ark/src/variables/methods.rs
Outdated
    }
}

pub fn register_method(generic: Self, class: &str, method: RObject) -> anyhow::Result<()> {
Any reason not to take a consuming `self` (as opposed to `&self`)?
My reasoning was that a static method would make it clearer that we're actually modifying some global state and not just an object's state. But it looks weird indeed. Fixed in 16d4ab8.
crates/ark/src/variables/methods.rs
Outdated
    T: TryFrom<RObject>,
    <T as TryFrom<harp::RObject>>::Error: std::fmt::Debug,
{
    if !r_is_object(x) {
Since this is rather generic infrastructure, we might want to allow registering methods for base types in the future (only we would be allowed to)?
crates/ark/src/variables/variable.rs
Outdated
let kind: Option<String> = ArkGenerics::VariableKind.try_dispatch(value, vec![])?;
match kind {
    None => Ok(None),
    // Enum reflection is not beautiful, we want to parse a VariableKind from it's
I don't understand this note.
This was more like a rant :P. It's hard to go from the string representation to an enum value in Rust. Here we had to create a JSON value and then read it with serde.
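For comparison with the JSON round-trip described above, a hand-written `std::str::FromStr` implementation is one alternative way to get from a string to an enum value; the `EnumString` derive already used on `ArkGenerics` generates something similar. The `VariableKind` variants and their string spellings below are a hypothetical subset chosen for illustration, not the real definition.

```rust
use std::str::FromStr;

/// Hypothetical subset of variable kinds; not the real enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VariableKind {
    Table,
    Collection,
    Other,
}

impl FromStr for VariableKind {
    type Err = String;

    /// Map the string form returned by a method onto an enum variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "table" => Ok(VariableKind::Table),
            "collection" => Ok(VariableKind::Collection),
            "other" => Ok(VariableKind::Other),
            unknown => Err(format!("unknown variable kind: {unknown}")),
        }
    }
}

fn main() {
    // `str::parse` dispatches to the `FromStr` implementation above.
    println!("{:?}", "table".parse::<VariableKind>());
    println!("{:?}", "tensor".parse::<VariableKind>());
}
```

The trade-off is that the string spellings live in a second place and must be kept in sync with any serde attributes on the real enum by hand.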
Thanks @lionel-! I agree that we should have the opposite namespacing and just changed it here: a21dc37. I'm happy to chat further on how to expose this API. Currently we don't necessarily need to expose
Partial review - will do more in a bit
Co-authored-by: Lionel Henry <[email protected]>
Co-authored-by: Lionel Henry <[email protected]>
…sitron_variables`.
- Changed naming scheme to include `ark_` prefix for all methods.
- Reuse the methods table names to avoid duplication.
- Use the `to_string()` method to generate the error messages.
Co-authored-by: Davis Vaughan <[email protected]>
96e9917
to
e6e2e18
Compare
As discussed, I have updated the PR with:
I have a torch PR implementing how packages should explicitly register methods: mlverse/torch@faa11e4
@lionel- @DavisVaughan Can I ask why
I’d like to implement methods not only in reticulate but also in keras and tensorflow. The recent changes before merge feel like unnecessary added complexity, inconvenience, and communication overhead, for reasons I don’t fully understand.
(1) is a choice for explicitness. This API seems sufficiently specific that magic and extra convenience did not feel warranted.
That makes sense, thanks.
I was reading through this since it was mentioned in posit-dev/positron#5573. A side comment: I wanted to mention that in https://github.com/posit-dev/ark/blob/main/doc/variables-pane-extending.md, Positron is not mentioned at all. Ark contains a lot of code that powers features in Positron but is not easy to use elsewhere. Since Ark is intended to be usable outside of Positron, it may make sense to more clearly separate out the Positron-specific code and documentation so that people don't get the wrong idea that Ark is only relevant to Positron.
This PR is a proposal of a mechanism that allows package authors to implement custom behavior for objects in the variables pane.
A description of how the protocol works is available here: https://github.com/posit-dev/ark/blob/daf9348affc0f1b190928a2fb0cad5bfadaf263c/doc/variables-pane-extending.md
See also #561 for the custom inspect behavior.
Motivation
The main motivation is to allow package authors to improve the display of objects for which the internal R data structure does not show useful information for users. For example, reticulate R objects are environments containing a flag and an external pointer (`x <- reticulate::py_eval("1", convert = FALSE)`). The same applies to torch tensors (`x = torch::torch_randn(10, 10)`) and xml2 documents (`x <- xml2::read_xml("<foo> <bar> text <baz/> </bar> </foo>")`).
How it's implemented
Package authors can implement a set of S3-like methods, each customizing a specific part of the UI.
Methods are discovered by ark when packages are loaded and lazily stored in an environment. Package authors don't need to export these methods in their packages. See here for the description of methods.
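One way to picture the stored methods is a table keyed by generic and class, consulted in the order of the object's class vector as S3 dispatch does. The sketch below is an assumption-laden model rather than Ark's implementation: `MethodTable`, the generic name, and the example classes are all hypothetical, and Ark actually keeps its methods in an R environment.

```rust
use std::collections::HashMap;

type Method = fn(&str) -> String;

/// Hypothetical method table keyed by (generic, class).
#[derive(Default)]
struct MethodTable {
    methods: HashMap<(String, String), Method>,
}

impl MethodTable {
    fn register(&mut self, generic: &str, class: &str, method: Method) {
        self.methods
            .insert((generic.to_string(), class.to_string()), method);
    }

    /// Walk the class vector in order and return the first registered method.
    fn find(&self, generic: &str, classes: &[&str]) -> Option<Method> {
        classes.iter().find_map(|class| {
            self.methods
                .get(&(generic.to_string(), class.to_string()))
                .copied()
        })
    }
}

fn object_display(x: &str) -> String {
    format!("python object: {x}")
}

fn ndarray_display(x: &str) -> String {
    format!("ndarray: {x}")
}

fn main() {
    let generic = "ark_variable_display_value"; // illustrative generic name
    let classes = ["numpy.ndarray", "python.builtin.object"];

    let mut table = MethodTable::default();
    table.register(generic, "python.builtin.object", object_display);
    // Only the parent class has a method, so it is the one found.
    println!("{:?}", table.find(generic, &classes).map(|m| m("a")));

    table.register(generic, "numpy.ndarray", ndarray_display);
    // The more specific class comes first in the class vector and now wins.
    println!("{:?}", table.find(generic, &classes).map(|m| m("a")));
}
```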
When responding to events from the variables pane, we try to apply the custom methods; if a method works correctly we use its value, otherwise we fall back to the default implementation.
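The try-then-fall-back behaviour described here can be sketched as follows. The function names and the shape of the default output are hypothetical; the point is only that both a missing method and a failing method end up on the default path.

```rust
/// A custom method may fail; errors are represented here as strings.
type CustomMethod = fn(&str) -> Result<String, String>;

/// Hypothetical default rendering used when no custom method applies.
fn default_display(x: &str) -> String {
    format!("<environment: {x}>")
}

/// Use the custom method when it exists and succeeds, else the default.
fn display_value(x: &str, custom: Option<CustomMethod>) -> String {
    match custom {
        Some(method) => match method(x) {
            Ok(value) => value,
            Err(err) => {
                // Report the failure, then continue with the default path.
                eprintln!("custom method failed, using default: {err}");
                default_display(x)
            }
        },
        None => default_display(x),
    }
}

fn working_method(x: &str) -> Result<String, String> {
    Ok(format!("tensor {x}"))
}

fn failing_method(_x: &str) -> Result<String, String> {
    Err("method raised an error".to_string())
}

fn main() {
    println!("{}", display_value("t", Some(working_method)));
    println!("{}", display_value("t", Some(failing_method)));
    println!("{}", display_value("t", None));
}
```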
Methods are explicitly implemented for Ark, so it's expected that package authors will take extra care to make them work correctly with Positron.
Comparison to RStudio
RStudio uses the `utils::str` S3 generic in a few situations to obtain the value's description; see usages of ValueFromStr in e.g. https://github.com/rstudio/rstudio/blob/2175496934a54e21979461a1c67eb13ad5048cf2/src/cpp/session/modules/SessionEnvironment.R#L122
There are many guardrails though, as `str` methods are commonly implemented and package authors might not be aware of bugs that only appear when their methods are called from RStudio.