Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change override to override_sole_init in alf.config #1705

Merged
merged 3 commits into from
Sep 25, 2024

Conversation

QuantuMope
Copy link
Contributor

@QuantuMope QuantuMope commented Sep 20, 2024

This PR changes the override flag of alf.config to override_sole_init. The implementation before allowed alf.override_config to override a value of a config regardless of the mutable flag. Unfortunately, this breaks the logic of setting parameters via --conf_params or alf.pre_config. It also causes issues in evaluator processes that may set certain config values with mutable=False and then read from the conf file as it overwrites the desired eval setting.

Therefore, this PR now changes alf.override_config to alf.override_sole_config, which respects immutability. On our local hobot code, I've now verified that all the above problems have been remedied and singular config calls can now be enforced when using ALF_SOLE_CONFIG=1, greatly reducing chance for bugs via nested / unintentional overwrites.

Additional test cases for edge cases has also been provided in config_utils_test.py.

Once this gets merged, I can submit PRs for ALF_SOLE_CONFIG=1 compatible hobot confs.

@QuantuMope QuantuMope force-pushed the PR/andrew/fix-sole-init branch from cbb6a84 to 7fe31a7 Compare September 20, 2024 00:42
Copy link
Contributor

@le-horizon le-horizon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the much cleaner changes and tests, Andrew. A few questions regarding alf config behavior.

Le

@@ -182,6 +182,63 @@ def sole_init_test_env(x):
self.assertEqual(alf.get_config_value("sole_init_test_twice.x"), 1)
self.assertEqual(alf.get_config_value("sole_init_test_env.x"), 1)

# Test override_config doesn't doesn't overwrite for immutable values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove one "doesn't"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

pass

alf.config("override_on_immutable", x=0, mutable=False)
alf.override_config("override_on_immutable", x=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to raise an error on overriding immutable config? Silently ignoring override can be hard to debug?

Copy link
Contributor Author

@QuantuMope QuantuMope Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to my reply above for why this cannot be done currently without adding more state parameters. There is a logger warning that gets printed in this scenario, though it's easy for users to not see it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's color code the warning message a bright yellow, so it's harder to miss?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it prints as red via absl.logging.warning. Do you think yellow is better?

alf/config_util_test.py Outdated Show resolved Hide resolved
pass

alf.config("override_no_affect_sole_init", x=1, sole_init=True)
alf.override_config("override_no_affect_sole_init", x=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also throw error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't throw an error since override_config allows us to set a config's value despite an earlier sole_init=True call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Maybe rename the function to alf.override_sole_config()? Just as how you renamed override to override_sole_init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

alf/config_util.py Show resolved Hide resolved
only when absolutely necessary (e.g., a teacher-student training loop,
where the student must override certain configs inherited from the
teacher). If the config is immutable, a warning will be declared with
no changes made.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of quietly ignoring the config, should we just throw error for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason we cannot do this is because alf.pre_config and --conf_param set values with mutable=False (that is how those values avoid getting overwritten). Therefore, there may be times we want to override our conf files using one of the two options and raising an error in this situation is problematic. A workaround for this was to possibly introduce a pre_configged state variable for configs, but I thought that may be unnecessary. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You are right. Let's go with current way for the PR.

Separately, are there some examples of these cases? It would be good to understand a bit more, and see if we can design better solutions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to examples of the teacher-student setup?

One is our own agent training. Let's say you train expert with ALF_SOLE_CONFIG=1. When we train the agent with ALF_SOLE_CONFIG=1 as well, it will inevitably raise errors as agent often overwrites some of the expert's configs. Here, we can use the alf.override_config function. Not only does this minimize config calls, but it also explicitly tells the coder which inherited calls are being overwritten.

Copy link
Contributor Author

@QuantuMope QuantuMope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the really speedy review Le. Addressed your comments. Let me know if there are any questions or clarifications I can make.

alf/config_util.py Show resolved Hide resolved
only when absolutely necessary (e.g., a teacher-student training loop,
where the student must override certain configs inherited from the
teacher). If the config is immutable, a warning will be declared with
no changes made.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason we cannot do this is because alf.pre_config and --conf_param set values with mutable=False (that is how those values avoid getting overwritten). Therefore, there may be times we want to override our conf files using one of the two options and raising an error in this situation is problematic. A workaround for this was to possibly introduce a pre_configged state variable for configs, but I thought that may be unnecessary. Let me know what you think.

@@ -182,6 +182,63 @@ def sole_init_test_env(x):
self.assertEqual(alf.get_config_value("sole_init_test_twice.x"), 1)
self.assertEqual(alf.get_config_value("sole_init_test_env.x"), 1)

# Test override_config doesn't doesn't overwrite for immutable values.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

pass

alf.config("override_on_immutable", x=0, mutable=False)
alf.override_config("override_on_immutable", x=1)
Copy link
Contributor Author

@QuantuMope QuantuMope Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to my reply above for why this cannot be done currently without adding more state parameters. There is a logger warning that gets printed in this scenario, though it's easy for users to not see it.

alf/config_util_test.py Outdated Show resolved Hide resolved
pass

alf.config("override_no_affect_sole_init", x=1, sole_init=True)
alf.override_config("override_no_affect_sole_init", x=2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't throw an error since override_config allows us to set a config's value despite an earlier sole_init=True call.

Copy link
Contributor

@le-horizon le-horizon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for answering the questions. Only a few nits.

Thanks,
Le

only when absolutely necessary (e.g., a teacher-student training loop,
where the student must override certain configs inherited from the
teacher). If the config is immutable, a warning will be declared with
no changes made.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You are right. Let's go with current way for the PR.

Separately, are there some examples of these cases? It would be good to understand a bit more, and see if we can design better solutions.

pass

alf.config("override_on_immutable", x=0, mutable=False)
alf.override_config("override_on_immutable", x=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's color code the warning message a bright yellow, so it's harder to miss?

alf/config_util_test.py Outdated Show resolved Hide resolved
alf.config("override_no_affect_sole_init", x=1, sole_init=True)
alf.override_config("override_no_affect_sole_init", x=2)
with self.assertRaises(RuntimeError) as context:
alf.config("override_no_affect_sole_init", x=3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test that x's value is 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

pass

alf.config("override_no_affect_sole_init", x=1, sole_init=True)
alf.override_config("override_no_affect_sole_init", x=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Maybe rename the function to alf.override_sole_config()? Just as how you renamed override to override_sole_init.

@@ -285,7 +285,11 @@ def get_env():
# A 'None' random seed won't set a deterministic torch behavior.
for _ in range(PerProcessContext().ddp_rank):
random_seed = random.randint(0, 2**32)
config1("TrainerConfig.random_seed", random_seed, raise_if_used=False)
config1(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are several other config1 calls in config_helpers.py that may need fix too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. This has been fixed for the other calls.

Copy link
Contributor Author

@QuantuMope QuantuMope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments. PTAL

@@ -285,7 +285,11 @@ def get_env():
# A 'None' random seed won't set a deterministic torch behavior.
for _ in range(PerProcessContext().ddp_rank):
random_seed = random.randint(0, 2**32)
config1("TrainerConfig.random_seed", random_seed, raise_if_used=False)
config1(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. This has been fixed for the other calls.

only when absolutely necessary (e.g., a teacher-student training loop,
where the student must override certain configs inherited from the
teacher). If the config is immutable, a warning will be declared with
no changes made.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to examples of the teacher-student setup?

One is our own agent training. Let's say you train expert with ALF_SOLE_CONFIG=1. When we train the agent with ALF_SOLE_CONFIG=1 as well, it will inevitably raise errors as agent often overwrites some of the expert's configs. Here, we can use the alf.override_config function. Not only does this minimize config calls, but it also explicitly tells the coder which inherited calls are being overwritten.

pass

alf.config("override_on_immutable", x=0, mutable=False)
alf.override_config("override_on_immutable", x=1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it prints as red via absl.logging.warning. Do you think yellow is better?

pass

alf.config("override_no_affect_sole_init", x=1, sole_init=True)
alf.override_config("override_no_affect_sole_init", x=2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

alf.config("override_no_affect_sole_init", x=1, sole_init=True)
alf.override_config("override_no_affect_sole_init", x=2)
with self.assertRaises(RuntimeError) as context:
alf.config("override_no_affect_sole_init", x=3)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

@QuantuMope QuantuMope force-pushed the PR/andrew/fix-sole-init branch from edb31e1 to 0da269c Compare September 24, 2024 18:09
@QuantuMope QuantuMope merged commit 3487f59 into pytorch Sep 25, 2024
2 checks passed
@QuantuMope QuantuMope deleted the PR/andrew/fix-sole-init branch September 25, 2024 23:31
QuantuMope added a commit that referenced this pull request Oct 2, 2024
* Rename alf.override_config to alf.override_sole_config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants