[optimizer] redesign the way we handle optimizer parameters #924

Open
pilkicTT opened this issue Dec 17, 2024 · 0 comments
Currently, the parameters of the optimizer (e.g. learning_rate, momentum, etc.) are copied and tied to each trainable parameter in the model.

E.g. if the optimizer has a learning rate parameter named lr and the compiled graph has a parameter named l1.weight, the optimizer parameter in the graph will be named input_opt_l1.weight_0.lr, while the optimizer itself stores this parameter in a Dict[Tuple[str, str], Tensor]. For this example, the entry will be:

{
    ("l1.weight", "lr"): value
}

This means that when we want to prepare the optimizer graph for execution, we need to manually match the name of each optimizer parameter in the graph with its entry in the optimizer's parameter dictionary, e.g. match "input_opt_l1.weight_0.lr" with ("l1.weight", "lr").
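
For illustration, a minimal sketch of the kind of string matching this forces, assuming the input_opt_<param>_<index>.<opt_param> pattern from the example above (the helper name is hypothetical, not part of the current code):

from typing import Tuple

def graph_name_to_dict_key(graph_name: str) -> Tuple[str, str]:
    # Recover the optimizer dict key from a graph input name,
    # e.g. "input_opt_l1.weight_0.lr" -> ("l1.weight", "lr").
    stem = graph_name.removeprefix("input_opt_")  # "l1.weight_0.lr"
    stem, opt_param = stem.rsplit(".", 1)         # "l1.weight_0", "lr"
    param_name, _index = stem.rsplit("_", 1)      # "l1.weight", "0"
    return (param_name, opt_param)

assert graph_name_to_dict_key("input_opt_l1.weight_0.lr") == ("l1.weight", "lr")

Any change to the naming convention on either side breaks this mapping.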

Additionally, every trainable parameter gets its own copy of the learning rate optimizer parameter, which is also not ideal.
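
To make the duplication concrete, with two trainable parameters (l1.weight from the example and a hypothetical l2.weight) and an optimizer with lr and momentum, the dictionary would look roughly like:

{
    ("l1.weight", "lr"): lr_value_1,
    ("l1.weight", "momentum"): momentum_value_1,
    ("l2.weight", "lr"): lr_value_2,        # same learning rate, stored again
    ("l2.weight", "momentum"): momentum_value_2,
}

In cases like this, where the learning rate is the same for all trainable parameters, a single shared entry would be enough.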
