hydra-zen + tyro ❤️ #23
Comments
Thanks for the note Ryan! I spent some time today experimenting with My main thought is that it'd be nice to have an equivalent of
|
Hi Brent, I am so glad that you find this to be useful!
If I am understanding this correctly,

from dataclasses import dataclass
from typing import TypeVar, Type
from typing_extensions import assert_type
from hydra_zen import instantiate
from hydra_zen.typing import Builds

T = TypeVar("T")

class MyModel: ...

@dataclass
class Config:
    model: Builds[Type[MyModel]]

out = instantiate(Config.model)
assert_type(out, MyModel)  # passes for both mypy and pyright

Does this address what you are looking for?
Fortunately, hydra-zen isn't doing anything too magical under the hood beyond generating dataclass types, so I don't suspect there are too many chances for incompatibilities. I will try to find some time later to come up with a list of potential edge cases to test.
I think the main thing to think about is opportunities to boost the ergonomics of tyro + hydra-zen. I would like to spend time playing with tyro + hydra-zen in more realistic settings to see if there are any places where hydra-zen's auto-config capabilities could be bootstrapped by tyro to streamline the user experience.
One thing that might be handy is the zen wrapper that will be included in the upcoming release. This will auto-extract & instantiate fields from a config and pass them to a function based on the function's signature. This might save users from having to manually call instantiate on objects.
These are just some initial thoughts. I am happy to discuss/brainstorm further!
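To make the signature-based extraction idea concrete, here is a minimal standard-library sketch. It is not hydra-zen's implementation: `call_with_config`, `Config`, and `train` are hypothetical names, and the real wrapper also instantiates nested configs, which this sketch skips.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Config:
    # Hypothetical config; `seed` is not consumed by `train` below.
    lr: float = 0.1
    epochs: int = 3
    seed: int = 0


def call_with_config(func: Callable[..., Any], cfg: Any) -> Any:
    """Call `func` with only the config fields named in its signature."""
    params = inspect.signature(func).parameters
    kwargs = {name: getattr(cfg, name) for name in params if hasattr(cfg, name)}
    return func(**kwargs)


def train(lr: float, epochs: int) -> str:
    return f"lr={lr}, epochs={epochs}"


result = call_with_config(train, Config(lr=0.01))
```

Fields that the function does not ask for (here `seed`) are simply ignored, which is what lets one config serve several functions.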
|
Thanks for the example! I had seen Here's a sketchy prototype of the behavior that would be nice:

from dataclasses import dataclass, field
from typing import Type, get_args, get_origin, TYPE_CHECKING
from hydra_zen import builds, instantiate
from hydra_zen.typing import Builds
from typing_extensions import assert_type
import tyro

class MyModel:
    def __init__(self, layers: int, units: int = 32) -> None:
        """Initialize model.

        Args:
            layers: Number of layers.
            units: Number of units.
        """
        pass

if not TYPE_CHECKING:
    # Make Builds[Type[T]] annotations evaluate to builds(T) at runtime.
    def monkey(t):
        if get_origin(t) is type:
            inner_type = get_args(t)[0]
            out = builds(inner_type, populate_full_signature=True)
            out.__doc__ = inner_type.__init__.__doc__  # This will overwrite the current hydra-zen docstring.
            return out
        else:
            return Builds

    Builds.__class_getitem__ = monkey

@dataclass
class Config:
    model: Builds[Type[MyModel]] = field(
        # builds() currently returns a mutable / non-frozen dataclass.
        default_factory=lambda: builds(
            MyModel,
            populate_full_signature=True,
        )(layers=3)
    )

config = tyro.cli(Config)
out = instantiate(config.model)
assert_type(out, MyModel)

My guess is the options are:
|
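The runtime trick in the prototype relies on `__class_getitem__`, the hook Python calls when a class is subscripted. Below is a minimal standard-library sketch of that mechanism. `ConfigFor` is a hypothetical stand-in for `Builds`, and `dataclasses.make_dataclass` stands in for `builds(..., populate_full_signature=True)`; neither is how hydra-zen or tyro actually implement this.

```python
import inspect
from dataclasses import field, make_dataclass
from typing import Any, Type, get_args


class ConfigFor:
    """Hypothetical stand-in for Builds; not part of hydra-zen or tyro."""

    def __class_getitem__(cls, item):
        # ConfigFor[Type[T]] -> a dataclass mirroring T's __init__ signature.
        (target,) = get_args(item)
        specs = []
        for name, param in inspect.signature(target).parameters.items():
            empty = inspect.Parameter.empty
            annotation = Any if param.annotation is empty else param.annotation
            if param.default is empty:
                specs.append((name, annotation))
            else:
                specs.append((name, annotation, field(default=param.default)))
        config_cls = make_dataclass(f"{target.__name__}Config", specs)
        config_cls.__doc__ = target.__init__.__doc__  # reuse the target's docstring
        return config_cls


class MyModel:
    def __init__(self, layers: int, units: int = 32) -> None:
        """Initialize model."""
        self.layers, self.units = layers, units


ModelConfig = ConfigFor[Type[MyModel]]
cfg = ModelConfig(layers=3)
```

This sketch assumes parameters without defaults precede those with defaults, which holds for ordinary positional signatures but not for every keyword-only layout.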
Also, I imagine you found tyro via nerfstudio? We actually did discuss a pattern for directly mapping args -> config schemas similar to I ended up having some hesitations about it and the team went with explicit The main thing was losing some specific advantages of the explicit approach, where typically the
And then lastly just a desire to reduce magic, since we have a lot of contributors from research backgrounds who were new to types + Python and there were hesitations about learning barriers introduced by things like protocols or dynamically generated classes. |
(sorry for the delay! Will get back to you asap) |
Ah, I see now how tyro would leverage this sort of enhanced

# contents of src/tyro/hydra_zen.py
from typing import TYPE_CHECKING, TypeVar, get_args
from typing_extensions import Protocol
from hydra_zen import builds
from hydra_zen.typing import Builds as _Builds
from hydra_zen.typing import PartialBuilds as _PBuilds

T = TypeVar("T", covariant=True)

class Builds(_Builds[T], Protocol[T]):
    if not TYPE_CHECKING:
        # __class_getitem__ is the hook Python calls when evaluating Builds[...].
        def __class_getitem__(cls, key):
            inner_type = get_args(key)[0]
            out = builds(inner_type, populate_full_signature=True)
            out.__doc__ = inner_type.__init__.__doc__  # This will overwrite the current hydra-zen docstring.
            return out

class PartialBuilds(_PBuilds[T], Protocol[T]):
    if not TYPE_CHECKING:
        def __class_getitem__(cls, key):
            inner_type = get_args(key)[0]
            out = builds(inner_type, populate_full_signature=True, zen_partial=True)
            out.__doc__ = inner_type.__init__.__doc__
            return out

(things aren't quite right with these protocols and their relationships with their parents, but there is definitely a solution that will work, I just need to tinker around a little more).
This way tyro users can opt in to using this feature and you can control the behavior of
Just FYI: you can have
Sure!
It seems like both of these things could be achieved -- with full "explicitness" -- by having each
It seems like you are effectively circumventing this by decoupling your inits from your classes via your dataclass-based configs. I think it is a reasonable approach. You would indeed incur a whole lot of boilerplate code if you tried to keep everything explicit.
hydra-zen is all about encouraging users to be explicit in their library interfaces, to avoid repetition, and to follow the dictum that "frameworks should be kept at arm's length" (e.g., configuration frameworks like Hydra). Projects can typically heed this without issue because they don't have too much inheritance going on, or they aren't providing a bunch of
I hope that this was helpful!
|
btw I played with

import torch as tr
from dataclasses import dataclass

@dataclass
class Module(tr.nn.Module):
    def __post_init__(self):
        super().__init__()

@dataclass(unsafe_hash=True)
class A(Module):
    x: int

    def __post_init__(self):
        super().__post_init__()
        self.layer1 = tr.nn.Linear(self.x, self.x)

@dataclass(unsafe_hash=True)
class B(A):
    y: int

    def __post_init__(self):
        super().__post_init__()
        self.layer2 = tr.nn.Linear(self.y, self.y)

>>> list(A(1).parameters())
[Parameter containing:
 tensor([[-0.0720]], requires_grad=True),
 Parameter containing:
 tensor([0.3248], requires_grad=True)]
>>> list(B(1, 2).parameters())
[Parameter containing:
 tensor([[0.1868]], requires_grad=True),
 Parameter containing:
 tensor([0.9107], requires_grad=True),
 Parameter containing:
 tensor([[-0.6322, 0.4550],
         [-0.6481, 0.2250]], requires_grad=True),
 Parameter containing:
 tensor([-0.2255, 0.6488], requires_grad=True)]

but there are caveats... this doesn't work if any of your init fields are

import torch.nn as nn
from dataclasses import dataclass

class Evaluators(nn.Module):
    def __init__(self):
        super(Evaluators, self).__init__()
        self.linear = nn.Linear(1, 1)

@dataclass(unsafe_hash=True)
class Net(nn.Module):
    evaluator: Evaluators

    def __post_init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

evaluators = Evaluators()
# Fails: the generated __init__ assigns `evaluator` before nn.Module.__init__() has run.
net = Net(evaluators)
|
Thanks for taking the time! Need some time to think about the rest, but PyTorch3D has some code that attempts something similar: https://pytorch3d.org/tutorials/implicitron_config_system |
Actually... I think there is a solution to the

import torch as tr
import torch.nn as nn
from dataclasses import dataclass

@dataclass
class DataclassModule(nn.Module):
    def __new__(cls, *args, **k):
        inst = super().__new__(cls)
        nn.Module.__init__(inst)
        return inst

@dataclass(unsafe_hash=True)
class Net(DataclassModule):
    other_layer: nn.Module
    input_feats: int = 10
    output_feats: int = 20

    def __post_init__(self):
        self.layer = nn.Linear(self.input_feats, self.output_feats)

    def forward(self, x):
        return self.layer(self.other_layer(x))

net = Net(other_layer=nn.Linear(10, 10))
assert net(tr.tensor([1.]*10)).shape == (20,)
assert len(list(net.parameters())) == 4

@dataclass(unsafe_hash=True)
class A(DataclassModule):
    x: int

    def __post_init__(self):
        self.layer1 = nn.Linear(self.x, self.x)

@dataclass(unsafe_hash=True)
class B(A):
    y: int

    def __post_init__(self):
        super().__post_init__()
        self.layer2 = nn.Linear(self.y, self.y)

assert len(list(A(1).parameters())) == 2
assert len(list(B(1, 2).parameters())) == 4
|
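The ordering problem and the `__new__` workaround can be reproduced without PyTorch. In this sketch, `Module` is a tiny hypothetical stand-in that only mimics one behaviour of `torch.nn.Module`: refusing to register a child module before its own `__init__` has run. Everything else about the real class is omitted.

```python
from dataclasses import dataclass


class Module:
    """Hypothetical stand-in for torch.nn.Module's child registration."""

    def __init__(self) -> None:
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        if isinstance(value, Module):
            if "_modules" not in self.__dict__:
                raise AttributeError("cannot assign module before Module.__init__() call")
            self._modules[name] = value
        object.__setattr__(self, name, value)


@dataclass(eq=False)
class Broken(Module):
    child: Module

    def __post_init__(self):
        super().__init__()  # too late: the generated __init__ already set `child`


class DataclassModule(Module):
    def __new__(cls, *args, **kwargs):
        inst = super().__new__(cls)
        Module.__init__(inst)  # runs before the dataclass-generated __init__
        return inst


@dataclass(eq=False)
class Fixed(DataclassModule):
    child: Module


try:
    Broken(Module())
    broken_failed = False
except AttributeError:
    broken_failed = True

fixed = Fixed(Module())
```

Because `__new__` always runs before `__init__`, the registry exists by the time the generated `__init__` assigns the fields.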
Thanks again for the detailed responses! Helpful + thought-provoking.
On supporting I'm leaning toward adding some more general infrastructure in Probably this could be broken into a function that checks whether an annotation matches a rule and another function that defines how to instantiate these annotations:

from typing import Any, Callable, Type, TypeVar, get_args, get_origin

import hydra_zen
import hydra_zen.typing
import tyro

# Type[Any] is not quite the correct annotation (Builds[T] is not a real type) but is how `tyro` currently annotates things internally.
def matcher(typ: Type[Any]) -> bool:
    """Returns true when `typ` is a Builds[T] type."""
    return get_origin(typ) is hydra_zen.typing.Builds

T = TypeVar("T")

def instantiator(typ: Type[T]) -> Callable[..., T]:
    """Takes a Builds[T] protocol type as input, and returns a handler for instantiating the type."""
    assert get_origin(typ) is hydra_zen.typing.Builds
    (inner,) = get_args(typ)
    return hydra_zen.builds(inner, populate_full_signature=True)

# Proposed API from this discussion, not an existing tyro function.
tyro._registry.register_custom_instantiator(
    matcher,
    instantiator,
)

Does that make sense to you? The API specifics could probably use more thought; potentially it could be used to refactor all of the special handling for things like dataclasses, pydantic, attrs, TypedDict, etc. that's currently hardcoded in
On the PyTorch module + dataclass notes: agree with everything you wrote totally; I've been use |
Nice! I like the idea of
I like this a lot because tyro won't have to ship any hydra-zen specific logic whatsoever! This is ideal because it enables me to make changes to protocols and improvements to the registered instantiators without having to pester you about updating tyro. Naturally, a tyro user should only have to update their hydra-zen version for the latest and greatest support for
If you are interested in what this would look like on tyro's end, Hypothesis exposes an entrypoint for its
On hydra-zen's end, I would add a file like

from typing import Any, Callable, Type, TypeVar, get_args, get_origin

import hydra_zen
import hydra_zen.typing

def matcher(typ: Type[Any]) -> bool:
    """Returns true when `typ` is a Builds[T] type."""
    return get_origin(typ) is hydra_zen.typing.Builds

T = TypeVar("T")

def instantiator(typ: Type[T]) -> Callable[..., T]:
    """Takes a Builds[T] protocol type as input, and returns a handler for instantiating the type."""
    assert get_origin(typ) is hydra_zen.typing.Builds
    (inner,) = get_args(typ)
    return hydra_zen.builds(inner, populate_full_signature=True)

def _tyro_setup_hook():
    # Imported lazily so that hydra-zen does not depend on tyro at import time.
    import tyro

    tyro._registry.register_custom_instantiator(
        matcher,
        instantiator,
    )

and then hydra-zen's pyproject.toml would add something like
Obviously, any third party could register instantiators in this way, which is awesome! One last thing that I can foresee is that it would be nice for tyro to make it easy for third parties to test that their entrypoint hooks are working successfully, in an automated way (i.e., without manually checking the output of a CLI). Perhaps there is already a solution that I am not privy to.
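On the consuming side, entry-point discovery can be done with `importlib.metadata` from the standard library. The following is a sketch under assumptions: the group name `tyro_hypothetical_hooks` is invented for illustration, and nothing here is confirmed to match what tyro does or will do.

```python
from importlib.metadata import entry_points
from typing import Callable, List

# Hypothetical group name; tyro does not necessarily define one.
HOOK_GROUP = "tyro_hypothetical_hooks"


def load_setup_hooks(group: str = HOOK_GROUP) -> List[Callable[[], None]]:
    """Import every hook that installed packages advertise under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        selected = eps.select(group=group)  # Python 3.10+
    else:
        selected = eps.get(group, [])  # Python 3.8 / 3.9
    return [ep.load() for ep in selected]


hooks = load_setup_hooks()
for hook in hooks:
    hook()  # each hook registers its own matcher/instantiator pair
```

With no package advertising that group, `hooks` is simply empty, so the host library pays nothing when no plugins are installed.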
|
Great! I've started working on exposing this functionality, but might take a bit because of holidays and such. There are also some design decisions I need to put more time into thinking through related to how flexible the API should be, handling for
Could this be set up the same way as tyro's existing unit tests, via
Here's what an existing unit test for pydantic compatibility looks like:

import pathlib
import pytest
from pydantic import BaseModel, Field
import tyro

def test_pydantic() -> None:
    class ManyTypesA(BaseModel):
        i: int
        s: str = "hello"
        f: float = Field(default_factory=lambda: 3.0)
        p: pathlib.Path

    # We can directly pass a pydantic model to `tyro.cli()`:
    assert tyro.cli(
        ManyTypesA,
        args=[
            "--i",
            "5",
            "--p",
            "~",
        ],
    ) == ManyTypesA(i=5, s="hello", f=3.0, p=pathlib.Path("~"))
|
Yep. That looks great. |
Hello! I just came across tyro and it looks great!
I wanted to put hydra-zen on your radar. It is a library designed to make Hydra-based projects more Pythonic and lighter on boilerplate code. It mainly does this by providing users with functions like builds and just, which dynamically generate dataclasses that describe how to build, via instantiate, various objects. There are plenty of bells and whistles that I could go into (e.g. nice support for partial'd targets), but I'll keep it brief-ish.
That being said, hydra-zen's main features are quite independent of Hydra, and are more focused on generating dataclasses that can configure/build various Python interfaces. It seems like this might be the sort of thing that could be helpful for tyro users who want to generate nested, typed interfaces based on objects in their library or from third party libraries.
This is just a rough idea at this point, but I figured that there might be some potential synergy here! I'd love to get your impressions if you think there might be any value here.
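For readers new to the builds/instantiate pattern, here is a rough standard-library sketch of the idea: a generated dataclass records an import path under `_target_` (the field name Hydra uses) plus keyword arguments, and a separate function turns that record into an object. `sketch_builds` and `sketch_instantiate` are hypothetical names, and hydra-zen's real functions do considerably more.

```python
import importlib
from dataclasses import field, fields, make_dataclass
from fractions import Fraction
from typing import Any


def sketch_builds(target: Any, **defaults: Any) -> type:
    """Return a dataclass recording where `target` lives and its keyword arguments."""
    path = f"{target.__module__}.{target.__qualname__}"
    specs = [("_target_", str, field(default=path))]
    # Assumes immutable default values; dataclasses reject list/dict/set defaults.
    specs += [(name, Any, field(default=value)) for name, value in defaults.items()]
    return make_dataclass(f"Builds_{target.__name__}", specs)


def sketch_instantiate(cfg: Any) -> Any:
    """Import the object named by `_target_` and call it with the other fields."""
    kwargs = {f.name: getattr(cfg, f.name) for f in fields(cfg) if f.name != "_target_"}
    module_name, _, attr = cfg._target_.rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    return target(**kwargs)


FractionConf = sketch_builds(Fraction, numerator=3, denominator=4)
value = sketch_instantiate(FractionConf(denominator=8))
```

Because the config is an ordinary dataclass, any tool that understands dataclasses can populate or override its fields before the object is built.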