Updated train_on_episode_end #320
base: main
Conversation
Hi @LucaVendruscolo, do you have some empirical evidence that everything is working? Some plots, for example?
I will run the tests and take some screenshots now.
The log files: logs.zip
Final comment:
What happens in the case where you are using 4 environments and only 1 ends?
As implemented, the agent starts training because one environment ended (reset_envs > 0), but the other environments did not finish their episodes.
This will decrease the step rate of the non-terminated environments.
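A toy illustration of the scenario above (hypothetical names; 4 vectorized environments where only one terminates on the current step):

```python
# With 4 vectorized environments, a single termination already makes
# reset_envs > 0, so the episode-end check would trigger training even
# though the other three episodes are still in progress.
dones = [True, False, False, False]  # per-env terminated flags for this step

reset_envs = sum(dones)          # number of envs that just finished
train_now = reset_envs > 0       # the gate used by train_on_episode_end

print(reset_envs, train_now)     # 1 True
```

This is why the reviewer points out that the remaining environments would stall while training runs.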
aggregator.update("Params/exploration_amount", actor._get_expl_amount(policy_step))
is_distributed = fabric.world_size > 1
if (
    cfg.algo.train_on_episode_end and reset_envs > 0 and not is_distributed
Hi @LucaVendruscolo, here I see a problem: if you set cfg.algo.train_on_episode_end = True and you start a distributed training, then you will have the following situation:
cfg.algo.train_on_episode_end = True
reset_envs > 0 = True (let us suppose that the episode ended)
not is_distributed = False
not cfg.algo.train_on_episode_end = False
The condition becomes: (True and True and False) or False = False.
In this case, the agent will never enter the if statement, so the agent will never be trained.
What is missing is the modification of the config cfg.algo.train_on_episode_end when is_distributed is True.
For example, by adding near row 385 something like this:
if fabric.world_size > 1:
    cfg.algo.train_on_episode_end = False
Or you need to modify the condition in order to take into account the situation described above.
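A minimal, self-contained sketch of the suggested fix (hypothetical stand-ins: `cfg` is a plain namespace and `world_size` replaces `fabric.world_size` from the PR):

```python
# Sketch of the reviewer's suggestion: force-disable train_on_episode_end
# in distributed runs, so the fallback branch of the gating condition
# still triggers training.
from types import SimpleNamespace

cfg = SimpleNamespace(algo=SimpleNamespace(train_on_episode_end=True))
world_size = 4  # e.g. fabric.world_size in a distributed run

# The suggested addition near row 385:
if world_size > 1:
    cfg.algo.train_on_episode_end = False

# The gating condition from the diff, with its distributed fallback:
is_distributed = world_size > 1
reset_envs = 1  # suppose one environment just ended
should_train = (
    cfg.algo.train_on_episode_end and reset_envs > 0 and not is_distributed
) or not cfg.algo.train_on_episode_end

print(should_train)  # True: training still happens when distributed
```

With the config flipped to False, `not cfg.algo.train_on_episode_end` evaluates to True and the agent falls back to per-step training instead of never entering the branch.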
train_step += world_size
is_distributed = fabric.world_size > 1
if (
    cfg.algo.train_on_episode_end and reset_envs > 0 and not is_distributed
Same for all the files you modified
Hello, I had a few issues with git, so I deleted my old fork (#319) and started again. I made all the requested changes and did some general testing on all the algorithms to confirm it works with them too. Let me know if there's anything else you'd like me to add or change.
Thanks so much for the help!