Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install Python auto-complete helper files (*.pyi) #16987

Closed
jwnimmer-tri opened this issue Apr 18, 2022 · 14 comments
Closed

Install Python auto-complete helper files (*.pyi) #16987

jwnimmer-tri opened this issue Apr 18, 2022 · 14 comments
Assignees
Labels
component: distribution Nightly binaries, monthly releases, docker, installation priority: medium

Comments

@jwnimmer-tri
Copy link
Collaborator

See pybind/pybind11#2350 (comment).

As I understand it, some IDEs will not help the user auto-complete their function-call arguments unless we also install *.pyi files for our native modules.

As part of our build process, we should import mypy (maybe python3-mypy if it's new enough, otherwise the wheel) and then invoke it to create pyi files, and then install them alongside our modules.

@jwnimmer-tri jwnimmer-tri added component: distribution Nightly binaries, monthly releases, docker, installation priority: low unused team: kitware labels Apr 18, 2022
@svenevs
Copy link
Contributor

svenevs commented Apr 18, 2022

Correct. Where VSCode / pylance are concerned, they want the .pyi files installed: microsoft/pylance-release#242 (comment)

@jwnimmer-tri
Copy link
Collaborator Author

The pybind/pybind11#2350 (comment) linked in the OP gives two possible tools to solve this: pybind11-stubgen and mypy.stubgen. The goal here is to evaluate and choose one of them and integrate it into Drake's build system.

It's worth noting that we don't need a 100% perfect solution. Tab-completing 90% of the API is much better than 0%. It does run the risk of misleading users into thinking those last 10% of functions don't exist, but that seems like fair trade-off.

We should feel free to open upstream PRs to whichever tool we use to make its output better.

Perhaps the first step would be to manually run each of those two tools against a built pydrake, and write a little summary about which one works best out of the gate. Then we can debate which path to go.

@jwnimmer-tri
Copy link
Collaborator Author

Update: per f2f, mypy doesn't seem to do well at all; stubgen seems promising.

stubgen seems to crash and burn when the C++ type signatures are imported in the wrong order (a function B that uses a class A as an argument or return type but we haven't imported A yet), so we'll need to fix that (with py::import("foo") in the *_py.cc files, which we should have anyway) before we get a better readout.

@mwoehlke-kitware
Copy link
Contributor

mypy doesn't seem to do well at all

It ran beautifully, and has some obvious advantages. However, it doesn't seem to have emitted any of the things that pybind11-stubgen complains about. It didn't spit out any errors/warnings, so I don't know if it's just missing things, or if it's ignoring things that are "broken". In any case, its output seems, at first blush, incomplete.

(Note: for clarity, we should say pybind11-stubgen, as my understanding is that the mypy tool is properly called mypy-stubgen. Indeed, the binary from mypy is actually just stubgen.)

stubgen seems to crash and burn when the C++ type signatures are imported in the wrong order

Example:

[2022-07-07 10:57:24,645] {__init__.py:131} ERROR - Generated stubs signature is degraded to `(*args, **kwargs) -> typing.Any` for
[2022-07-07 10:57:24,645] {__init__.py:135} ERROR - def if_then_else(f_cond: drake::symbolic::Formula, e_then: pydrake.symbolic.Expression, e_else: pydrake.symbolic.Expression) -> pydrake.symbolic.Expression: ...
[2022-07-07 10:57:24,645] {__init__.py:136} ERROR -                               ^-- Invalid syntax
[2022-07-07 10:57:24,654] {__init__.py:957} INFO - Useful link: Avoiding C++ types in docstrings:
[2022-07-07 10:57:24,654] {__init__.py:958} INFO -       https://pybind11.readthedocs.io/en/latest/advanced/misc.html#avoiding-cpp-types-in-docstrings

@mwoehlke-kitware
Copy link
Contributor

Here's some more detailed analysis, starting with what stubgen (mypy) generated for pydrake.common.containers:

from _typeshed import Incomplete

class _EqualityProxyBase:
    def __init__(self, value) -> None: ...
    def __hash__(self): ...
    def __eq__(self, other): ...
    value: Incomplete

class _DictKeyWrap(dict):
    def __init__(self, dict_in, key_wrap, key_unwrap) -> None: ...
    def __setitem__(self, key, value): ...
    def __getitem__(self, key): ...
    def __delitem__(self, key): ...
    def __contains__(self, key): ...
    def items(self): ...
    def keys(self): ...
    def raw(self): ...

class EqualToDict(_DictKeyWrap):
    def __init__(self, *args, **kwargs): ...

class NamedViewBase:
    def __init__(self, value) -> None: ...
    @classmethod
    def get_fields(cls): ...
    def __getitem__(self, i): ...
    def __setitem__(self, i, value_i) -> None: ...
    def __setattr__(self, name, value) -> None: ...
    def __len__(self): ...
    def __iter__(self): ...
    def __array__(self): ...

def namedview(name, fields): ...

And here's what pybind11-stubgen generated:

"""Provides extensions for containers of Drake-related objects."""
from __future__ import annotations
import pydrake.common.containers
import typing
import numpy
_Shape = typing.Tuple[int, ...]

__all__ = [
    "EqualToDict",
    "NamedViewBase",
    "namedview",
    "np"
]


class _DictKeyWrap(dict):
    pass
class NamedViewBase():
    """
    Base for classes generated by ``namedview``.

        Inspired by: https://bitbucket.org/ericvsmith/namedlist
        
    """
    _fields = None
    pass
class EqualToDict(_DictKeyWrap, dict):
    """
    Implements a dictionary where keys are compared using type and
        `lhs.EqualTo(rhs)`.
        
    """
    pass
class _EqualityProxyBase():
    @property
    def value(self) -> None:
        """
        :type: None
        """
    pass

Overall, it seems like pybind11-stubgen does better at some type resolution (and, usefully, reports problems), while mypy is finding more things, especially class members. (Previous findings were based on mypy's claim to ingest .sos directly, but that appears to work somewhat poorly.) However, while mypy reports an ability to use reST documentation to improve things, it needs to be pointed at those artifacts separately, whereas pybind11-stubgen is able to include documentation without assistance.

pybind11-stubgen may also be significantly faster (<2s, including Bazel overhead, compared to mypy's ~15s run directly).

@jwnimmer-tri
Copy link
Collaborator Author

Just to update the timeline here -- #17521 added the new dependencies necessary for pyi generation to the workspace. It's smoke test was timing out in CI (from a mypy timeout, not a bazel timeout), so we've disabled the test for now in #17549.

The next step is probably to figure out why it was timing out and correct that, as well as moving pyi generation from test-time to build-time via a genrule() or rule().

@mwoehlke-kitware
Copy link
Contributor

Some notes on our testing plan:

  1. The .pyi files should be part of the //bindings/pydrake py_library. Accordingly, there should be a "comprehensive"¹ test which consumes //bindings/pydrake and internal mypy in order to actually validate that the .pyis are seen and are usable. Probably this test should attempt to validate one or more sample annotated files, at least one of which is known to be invalid.
  2. An install test should validate that some [sub]set of .pyi files is installed to the expected locations.
  3. Wheel tests should perform the same validations as either (1) or (2). In the former case, pip install mypy should be used.

(¹ "Comprehensive", here, doesn't necessarily mean actually comprehensive, just that this test would be the most comprehensive of the set.)

@jwnimmer-tri
Copy link
Collaborator Author

Quick update from f2f: some classnames (with square brackets) are unfriendly to pypi. As a first milestone to reach master, we'll try omitting those from the pyi files just to get something shipped. We might have only 10% of our API in the pyi files, but at least we'll get the build & release mechanics sorted out. Then in a second milestone, we can work on getting more of the API covered.

Kitware is going to write up a longer post detailing some of the problems.

@svenevs
Copy link
Contributor

svenevs commented Aug 8, 2022

@mwoehlke-kitware likely has additional information to share but I will try and give the full picture as I understand it. I've included information on how to test Visual Studio Code yourself for those wanting to try it out.

TL;DR: it may be OK to just start installing broken stubs. If done, it would be a good idea to document that it can be helpful for your IDE but (a) there are some big GOTCHAs to be aware of and (b) pin an issue explaining that we're prioritizing editor assists and mypy against code consuming drake will be broken (until fixed).

The Main Hurdle

Lets choose a specific example to represent the overall issue, math::RigidTransform. The result of its bindings will produce a trampoline class for us that also has a dictionary we can use to obtain new classes, the following all end up getting bound:

>>> import pydrake.math as dm
>>> print("\n".join(name for name in dir(dm) if "rigidtransform" in name.lower()))                                                                      
RigidTransform
RigidTransform_
RigidTransform_[AutoDiffXd]
RigidTransform_[Expression]
RigidTransform_[float]
_TemporaryName_N5drake4math14RigidTransformIN5Eigen14AutoDiffScalarINS2_6MatrixIdLin1ELi1ELi0ELin1ELi1EEEEEEE
_TemporaryName_N5drake4math14RigidTransformINS_8symbolic10ExpressionEEE
_TemporaryName_N5drake4math14RigidTransformIdEE
_TemporaryName_N5drake5ValueINS_4math14RigidTransformIN5Eigen14AutoDiffScalarINS3_6MatrixIdLin1ELi1ELi0ELin1ELi1EEEEEEEEE
_TemporaryName_N5drake5ValueINS_4math14RigidTransformINS_8symbolic10ExpressionEEEEE
_TemporaryName_N5drake5ValueINS_4math14RigidTransformIdEEEE
_TemporaryName_N5drake5ValueISt6vectorINS_4math14RigidTransformIN5Eigen14AutoDiffScalarINS4_6MatrixIdLin1ELi1ELi0ELin1ELi1EEEEEEESaIS9_EEEE
_TemporaryName_N5drake5ValueISt6vectorINS_4math14RigidTransformINS_8symbolic10ExpressionEEESaIS6_EEEE
_TemporaryName_N5drake5ValueISt6vectorINS_4math14RigidTransformIdEESaIS4_EEEE

These classes in particular create an issue for us:

  • RigidTransform_[AutoDiffXd]
  • RigidTransform_[Expression]
  • RigidTransform_[float]

Additionally,

// Declare default instantiation if it does not already exist.
if (!py::hasattr(scope, default_name.c_str())) {
scope.attr(default_name.c_str()) = py_class;
}

means that RigidTransform and RgidTransform_[float] are the same in this case, but in general we can't know with certainty that the [float] api is the default instance. As a result of the names binding strategy stubgen will produce something like the following:

# snippets from pydrake/math.pyi
# ... imports ...
RigidTransform_: pydrake.common.cpp_template.TemplateClass

# ...

class RigidTransform:
    # ...
    cast: Any
    # NOTE: the "default" is not internally self consistent, as far as
    # the hints are concerned RigidTransform != RigidTransform_[float]
    def InvertAndCompose(self, other: RigidTransform_[float]) -> RigidTransform_[float]: ...
    # ...
    # Only listing cast once, but this definition is invalid syntax for python
    # and affects every class that binds the casting operations.
    def cast[AutoDiffXd](self, *args, **kwargs) -> Any: ...
    def cast[Expression](self, *args, **kwargs) -> Any: ...
    def cast[float](self, *args, **kwargs) -> Any: ...

class RigidTransform_[AutoDiffXd]:
    # ...
    def InvertAndCompose(self, other: RigidTransform_[AutoDiffXd]) -> RigidTransform_[AutoDiffXd]: ...

class RigidTransform_[Expression]:
    # ...
    def InvertAndCompose(self, other: RigidTransform_[Expression]) -> RigidTransform_[Expression]: ...

class RigidTransform_[float]:
    # ...
    def InvertAndCompose(self, other: RigidTransform_[float]) -> RigidTransform_[float]: ...

These are not actually valid class names in the sense that you would not be able to type this code in a REPL and have it run (the [brackets] are not supposed to be allowed in class or function names, it only works because the module effectively has setattr done on it by the binding process somewhere). As a result, anytime that mypy encounters a class name or function name with [brackets] in it, it will crash:

/tmp/drake_pyi_install/lib/python3.8/site-packages/pydrake/geometry.pyi:100: error: invalid syntax
Found 1 error in 1 file (errors prevented further checking)

Experimentation

A "hackaway" approach was tested just to see how far we could get. Something about mypy skipping unrecognized decorators, possibly we need to use @overload, but this at least enables it to get valid syntax.

find "${dest_site_packages}/pydrake" -name '*.pyi' -print0 | \
xargs -0 -n1 sed -i -r \
    -e 's/^(\s*)(class|def) (\w+)[[]([][a-zA-Z0-9_., ]+)[]]/\1@instantiation(\4)\n\1\2 \3/' \
    -e '/\w+<\w+[)]/d' \
    -e '/const$/{N;d}' \
    -e '/[(][)][(]/d' \
    -e '/\b(True|False)\b/d' \
    -e '/AddContactMaterial/d'

Sample math.py modification: math_pyi_changes.patch.txt

-class RigidTransform_[float]:
+@instantiation(float)
+class RigidTransform_:
     _original_name: ClassVar[str] = ...
     _original_qualname: ClassVar[str] = ...
     _pybind11_del_orig: ClassVar[None] = ...
@@ -410,9 +421,12 @@ class RigidTransform_[float]:
     def IsNearlyIdentity() -> Any: ...
     def SetFromIsometry3(self, pose: pydrake.common.eigen_geometry.Isometry3_[float]) -> None: ...
     def SetIdentity(self) -> RigidTransform_[float]: ...
-    def cast[AutoDiffXd](self, *args, **kwargs) -> Any: ...
-    def cast[Expression](self, *args, **kwargs) -> Any: ...
-    def cast[float](self, *args, **kwargs) -> Any: ...
+    @instantiation(AutoDiffXd)
+    def cast(self, *args, **kwargs) -> Any: ...
+    @instantiation(Expression)
+    def cast(self, *args, **kwargs) -> Any: ...
+    @instantiation(float)
+    def cast(self, *args, **kwargs) -> Any: ...
     def inverse(self) -> RigidTransform_[float]: ...
     def rotation(self, *args, **kwargs) -> Any: ...
     def set(self, R, p: numpy.ndarray[numpy.float64[3,1]]) -> None: ...
@@ -464,7 +478,8 @@ class RollPitchYaw:
     def __getstate__(self) -> numpy.ndarray[numpy.float64[3,1]]: ...
     def __setstate__(self, arg0: numpy.ndarray[numpy.float64[3,1]]) -> None: ...

This enable valid syntax in our .pyi files, but from brief testing against a very simple program there are a number of followups that would need to be investigated to fix these files. Attached here are the results of running mypy against a simple program:
mypy_after.txt

Some of the issues seem fixable, some are a little odd that may need massaging of how stubgen is called (perhaps using the --export-less flag). It's odd that PixelType and pydrake.geometry.render are not visible, it's unclear for the mentions of m whether or not we will be able to get stubgen to produce a numpy.ndarray typehint correctly (seems like a bug upstream, it can infer some as e.g., [4, 4]).

Attaching for (mostly our) convenience a script that does all of this, assumed to be inside a drake repo such that bazel run ... targets work: pyi_install_check.sh.txt

It will install to /tmp/drake, including the stubgen for .pyi, as well as run some results on a very simple python script

#!/usr/bin/env python3

from pydrake.geometry import GeometryInstance, Sphere, Rgba
from pydrake.math import RigidTransform, RigidTransform_

assert __name__ == "__main__"

# snippets ...
# https://github.com/RobotLocomotion/drake/blob/master/bindings/pydrake/test/geometry_common_test.py
# geom: GeometryInstance = GeometryInstance(X_PG=RigidTransform(), shape=Sphere(1.0), name="sphere")
geom: GeometryInstance = GeometryInstance(X_PG=RigidTransform_[float](), shape=Sphere(1.0), name="sphere")

some_color: Rgba = Rgba(0, 0, 1, 1)

# Note that we get RigidTransform_ not RigidTransform (no _)
xform = geom.pose()
numpy_arr = xform.GetAsMatrix4()
numpy_arr.argmax()

So there me be opportunity to continue whittling down the remaining issues and possibly work with upstream on how to avoid our issues altogether. What should be assumed: once we start installing .pyi files, users will expect that they can be able to run mypy across their own code to validate it. While desirable, it may be some time before that is fixed and drake's bindings are actually usable for typechecking.

Just Concerning Editors

  1. To the best of my knowledge, if we install py.typed and associated *.pyi that are broken or incorrect, this should have zero runtime impact on code consuming drake. The only changes would be editors behaving differently and/or they would want to try and run mypy on their code and it would then also break. AKA the .pyi being wrong should not break an install.
  2. I have only really tested Visual Studio Code, which does not use mypy (it uses microsoft's pyright). AFAIU PyCharm uses their own implementation too.

So at least in Visual Studio Code, provided I configure my environment to know where pydrake is installed, I can get some editor assistance when the .pyi files are installed (broken or not).

No Modifications

drake_pyi_no_modifications.mp4

With Modifications

drake_pyi_with_modifications.mp4

All that is needed for consumption via Visual Studio Code is to edit your workspace settings and make sure that wherever pydrake and the stub files was installed is added as an extra path:

{
  "python.analysis.extraPaths": [
    "/tmp/drake_pyi_install/lib/python3.8/site-packages"
  ]
}

@mwoehlke-kitware
Copy link
Contributor

Some further musings on the above.

We use instantiations like Foo[float]. At a Python level, Foo is an object that implements __getitem__ to return an actual class type. Currently, that also uses the invalid spelling Foo[float], or pedantically, vars()['Foo[float]'], although this name isn't really important.

Now, it seems mypy does have some concept of generics, but only true generics. What we have is actual specilization; that is, the API of Foo[float] might be entirely unlike the API of Foo[int]. The short version is that when mypy sees Foo[float], all it knows about this is that Foo is a pydrake.common.cpp_template.TemplateClass, and that it knows nothing about what that type's __getitem__ returns. Thus, if one writes something like x = Foo[float](), the type of x is completely unknown.

As I see it, if type checking is to be at all useful in the face of template types, we need to do one of two things:

  1. Somehow teach stubgen to identify instantiations and turn them into proper generics. This may require additional mangling on the pybind11 side to help out (particularly, so that stubgen doesn't need to invent the template parameter name out of whole cloth). It will require stubgen to somehow deduce a common API. This seems problematic.
  2. Teach mypy to recognize the concept of instantiators and specializations. The substitution to produce @instantiation decorators is a WAG at how such a mechanism might be used.

Such mechanism (2) would need to:

  • Apply some agreed-upon mangling scheme to instantiations (e.g. ClassName[Type]).
  • Register that ClassName is an object which implements a __getitem__ for which a specific TypeName produces a specific mangled name.

Additionally, mypy would need to recognize when such an object is invoked, and evaluate (if possible) the __getitem__ call in order to produce the correct type. Generics imply that mypy has at least partial functionality in this respect.

Conceptually, this is similar to @overload, which warrants that under the hood the actual, single function takes arbitrary arguments and matches them against one of several possible signatures. The difference is that @overloads perform argument-dependent dispatch, while an @instantiation uses a "trampoline" which takes an explicit specification of which concrete entity to use. The important take-away is that mypy already has a notion of being able to define a particular identifier more than once and turning that into under-the-hood "magic".

Note that other tools (pyright, pycharm) presumably have the same problem and would need to support the same mechanism.

@jwnimmer-tri
Copy link
Collaborator Author

We'll need to revisit what to do about "template" instantiations, down the road. For the moment, do not spend anymore time on it. I think I will make it a TRI project instead of a Kitware project.

Please work only on the first milestone -- create a PR that installs the pyi files (and anything else related), so that all of the release channels (apt, tgz, whl) contain the pyi files. We will test that and plan to merge it unless that we find that it will cause users problems in practice.

@jwnimmer-tri
Copy link
Collaborator Author

The pyi files are built and installed now as of #17709. I'm moving this back to the TRI side to iterate on the design space of our template instantiations and how to make it compatible with the type stubs.

@jwnimmer-tri
Copy link
Collaborator Author

Update: I'm going to try to wait for bazel 5.3.1 to appear -- it has some fixes for shared library rpaths that might make this a bit easier.

@jwnimmer-tri
Copy link
Collaborator Author

jwnimmer-tri commented Oct 11, 2022

The nightly builds will incorporate *.pyi again as of tonight.

Note in particular that there are wheel nightly builds available, for early preview:
https://drake.mit.edu/pip.html#nightly-releases

I went back and watched the two videos upthread again.

The first video (no modifications) seems perfectly fine to me. I guess the objection is that when suggesting the types of arguments like GeometryInstance(X_PG=... it suggests the desired type as RigidTransform_[float] instead of the simpler alias RigidTransform.

The suggestion isn't wrong (the more complicated type would work fine) but is not the typical, shorter spelling that the user probably wants. In short, it's correct but imperfect.

I'll hypothesize that in practice, users will be able to overcome this imperfection. They know about the shorter types, and will probably edit the suggestion to use the shorter spelling. Thus, there is nothing left remaining to do in this ticket.

In any case, the call to action was to install pyi files to enable IDE auto-complete. That part seems done.

If we think that certain IDE suggestions are not good enough given the current pyi contents, then we can open more detailed tickets about those acute problems. I'd like to hear what additional improvements actual users request, instead of us trying to speculate. I'll close this issue; we'll open new issues for user-requested follow-ups (or users can open their own requests too!).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: distribution Nightly binaries, monthly releases, docker, installation priority: medium
Projects
None yet
Development

No branches or pull requests

4 participants