Updated train_on_episode_end #320

Open
wants to merge 2 commits into base: main

Conversation

LucaVendruscolo (Author)

Hello, I had a few issues with git, so I deleted my old fork (#319) and started again. I made all the requested changes and did some general testing on all the algorithms to confirm it works with them too. Let me know if there's anything else you'd like me to add or change.

Thanks so much for the help!

@belerico (Member) commented Oct 5, 2024

Hi @LucaVendruscolo, do you have some empirical evidence that everything is working? Some plots, for example?

@LucaVendruscolo (Author)

I will run the tests and take some screenshots now

@LucaVendruscolo (Author)

The log files: logs.zip

[Plot: the grey line is train_on_episode_end: false and the blue line is train_on_episode_end: true.]
running:
python sheeprl.py exp=dreamer_v3 env=gym env.id=CartPole-v1 env.num_envs=4

@michele-milesi (Member) left a comment


Final comment:
What happens in the case where you are using 4 environments and only 1 ends?
As implemented, the agent starts training because one environment ended (reset_envs > 0), but the other environments did not finish their episode.
This will decrease the step rate of the non-terminated environments.
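
A minimal, self-contained sketch of the situation described above (the values and names here are illustrative, not taken from the PR):

import numpy as np

# Illustrative only: 4 vectorized environments, a single episode ends this step.
dones = np.array([True, False, False, False])
reset_envs = int(dones.sum())  # 1

train_on_episode_end = True
if train_on_episode_end and reset_envs > 0:
    # Training is triggered by the single finished episode, so the three
    # still-running environments wait for the update to finish,
    # lowering their effective step rate.
    print(f"training triggered by {reset_envs} of {len(dones)} env(s)")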

aggregator.update("Params/exploration_amount", actor._get_expl_amount(policy_step))
is_distributed = fabric.world_size > 1
if (
    cfg.algo.train_on_episode_end and reset_envs > 0 and not is_distributed
michele-milesi (Member):

Hi @LucaVendruscolo, here I see a problem: if you set cfg.algo.train_on_episode_end = True and you start a distributed training, then you will have the following situation:

  • cfg.algo.train_on_episode_end = True
  • reset_envs > 0 = True (let us suppose that the episode ended)
  • not is_distributed = False
  • not cfg.algo.train_on_episode_end = False

This becomes: (True and True and False) or False = False

In this case, the agent will never enter the if statement, so the agent will never be trained.
What is missing is the modification of the config cfg.algo.train_on_episode_end when is_distributed is True.
For example, by adding something like this near row 385:

if fabric.world_size > 1:
    cfg.algo.train_on_episode_end = False

Alternatively, you could modify the condition to take into account the situation described above.
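
For illustration, a condition reworded along those lines might look like the following sketch (illustrative stand-in values, not the PR's code; the loop's other guards are omitted):

# Distributed run where no episode has ended this step.
train_on_episode_end = True
reset_envs = 0
world_size = 2
is_distributed = world_size > 1

# Adding "or is_distributed" makes distributed runs fall back to the
# usual per-step schedule instead of never training.
if (
    (train_on_episode_end and reset_envs > 0 and not is_distributed)
    or not train_on_episode_end
    or is_distributed
):
    print("training runs on this step")

With this form, the distributed case behaves as if train_on_episode_end were False, which is the same effect as overriding the config as suggested above.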

train_step += world_size
is_distributed = fabric.world_size > 1
if (
    cfg.algo.train_on_episode_end and reset_envs > 0 and not is_distributed
michele-milesi (Member):

The same applies to all the files you modified.
