[optimizer] redesign the way we handle optimizer parameters #924

Open
pilkicTT opened this issue Dec 17, 2024 · 0 comments
Currently, the parameters of the optimizer (e.g. learning_rate, momentum, etc.) are copied and tied to each trainable parameter in the model.

E.g. if the optimizer has a learning rate parameter named lr and the compiled graph has a parameter named l1.weight, the optimizer parameter in the graph will be named input_opt_l1.weight_0.lr, while the optimizer itself stores this parameter in a Dict[Tuple[str, str], Tensor]. For this example, the entry will be:

{
    ("l1.weight", "lr"): value
}

This means that when we want to prepare the optimizer graph for execution, we need to manually match the name of each optimizer parameter in the graph with its entry in the optimizer's parameter dictionary, e.g. match "input_opt_l1.weight_0.lr" with ("l1.weight", "lr").
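
For illustration, a minimal sketch of the kind of string matching this forces, assuming the input_opt_<param>_<index>.<opt_param> pattern from the example above (the helper name is hypothetical, not part of the current code):

from typing import Tuple

def graph_name_to_dict_key(graph_name: str) -> Tuple[str, str]:
    # Recover the optimizer dict key from a graph input name,
    # e.g. "input_opt_l1.weight_0.lr" -> ("l1.weight", "lr").
    stem = graph_name.removeprefix("input_opt_")  # "l1.weight_0.lr"
    stem, opt_param = stem.rsplit(".", 1)         # "l1.weight_0", "lr"
    param_name, _index = stem.rsplit("_", 1)      # "l1.weight", "0"
    return (param_name, opt_param)

assert graph_name_to_dict_key("input_opt_l1.weight_0.lr") == ("l1.weight", "lr")

Any change to the naming convention on either side breaks this mapping.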

Additionally, every trainable parameter gets its own copy of the learning rate optimizer parameter, which is also not ideal.
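
To make the duplication concrete, with two trainable parameters (l1.weight from the example and a hypothetical l2.weight) and an optimizer with lr and momentum, the dictionary would look roughly like:

{
    ("l1.weight", "lr"): lr_value_1,
    ("l1.weight", "momentum"): momentum_value_1,
    ("l2.weight", "lr"): lr_value_2,        # same learning rate, stored again
    ("l2.weight", "momentum"): momentum_value_2,
}

In cases like this, where the learning rate is the same for all trainable parameters, a single shared entry would be enough.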
