Updated train_on_episode_end #320

Open
wants to merge 2 commits into base: main

Conversation

LucaVendruscolo (Author)

Hello, I had a few issues with git, so I deleted my old fork (#319) and started again. I made all the requested changes and did some general testing on all the algorithms to confirm it works with them too. Let me know if there's anything else you'd like me to add or change.

Thanks so much for the help!

@belerico (Member) commented Oct 5, 2024

Hi @LucaVendruscolo, do you have some empirical evidence that everything is working? Some plots, for example?

@LucaVendruscolo (Author)

I will run the tests and take some screenshots now

@LucaVendruscolo (Author)

The log files: logs.zip

[Plot: the grey line is train_on_episode_end: false and the blue line is train_on_episode_end: true.]
running:
python sheeprl.py exp=dreamer_v3 env=gym env.id=CartPole-v1 env.num_envs=4

@michele-milesi (Member) left a comment


Final comment:
What happens in the case where you are using 4 environments and only 1 ends?
As implemented, the agent starts training because one environment ended (reset_envs > 0), but the other environments did not finish their episode.
This will decrease the step rate of the non-terminated environments.
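
A minimal, self-contained sketch of the situation described above (the values and names here are illustrative, not taken from the PR):

import numpy as np

# Illustrative only: 4 vectorized environments, a single episode ends this step.
dones = np.array([True, False, False, False])
reset_envs = int(dones.sum())  # 1

train_on_episode_end = True
if train_on_episode_end and reset_envs > 0:
    # Training is triggered by the single finished episode, so the three
    # still-running environments wait for the update to finish,
    # lowering their effective step rate.
    print(f"training triggered by {reset_envs} of {len(dones)} env(s)")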

aggregator.update("Params/exploration_amount", actor._get_expl_amount(policy_step))
is_distributed = fabric.world_size > 1
if (
    cfg.algo.train_on_episode_end and reset_envs > 0 and not is_distributed
michele-milesi (Member):

Hi @LucaVendruscolo, here I see a problem: if you set cfg.algo.train_on_episode_end = True and you start a distributed training, then you will have the following situation:

  • cfg.algo.train_on_episode_end = True
  • reset_envs > 0 = True (let us suppose that the episode ended)
  • not is_distributed = False
  • not cfg.algo.train_on_episode_end = False

This becomes: (True and True and False) or False = False

In this case, the agent will never enter the if statement, so the agent will never be trained.
What is missing is the modification of the config cfg.algo.train_on_episode_end when is_distributed is True.
For example, by adding something like this near row 385:

if fabric.world_size > 1:
    cfg.algo.train_on_episode_end = False

Alternatively, you could modify the condition to take into account the situation described above.
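
For illustration, a condition reworded along those lines might look like the following sketch (illustrative stand-in values, not the PR's code; the loop's other guards are omitted):

# Distributed run where no episode has ended this step.
train_on_episode_end = True
reset_envs = 0
world_size = 2
is_distributed = world_size > 1

# Adding "or is_distributed" makes distributed runs fall back to the
# usual per-step schedule instead of never training.
if (
    (train_on_episode_end and reset_envs > 0 and not is_distributed)
    or not train_on_episode_end
    or is_distributed
):
    print("training runs on this step")

With this form, the distributed case behaves as if train_on_episode_end were False, which is the same effect as overriding the config as suggested above.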

train_step += world_size
is_distributed = fabric.world_size > 1
if (
    cfg.algo.train_on_episode_end and reset_envs > 0 and not is_distributed
michele-milesi (Member):

The same applies to all the files you modified.
