ReplayBuffer storing actions size mismatch during env reset #278

defrag-bambino · 2024-05-03T08:59:09Z

Hi,

I am trying to write a simple gym wrapper for an existing env.
During testing, I am now facing the following issue:

  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 647, in main
    rb.add(reset_data, dones_idxes, validate_args=cfg.buffer.validate_args)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/data/buffers.py", line 656, in add
    self._buf[env_idx].add(env_data, validate_args=validate_args)
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/data/buffers.py", line 220, in add
    self.buffer[k][idxes] = data_to_store[k]
  File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/utils/memmap.py", line 264, in __setitem__
    self.array[idx] = value
ValueError: shape mismatch: value array of shape (1,1,5) could not be broadcast to indexing result of shape (1,1,4)

Which, I think, originates from this line: reset_data["actions"] = np.zeros((1, reset_envs, np.sum(actions_dim))) (line 643 in dreamer_v3.py). My env has action_space.shape of (1,4) - but in this line it is summing up to 1+4=5.
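A minimal sketch of the mismatch, assuming actions_dim is taken directly from action_space.shape (that assignment is an assumption, not copied from the sheeprl source). The value 4 for the buffer slot comes from the error message above; buffer_slot is a hypothetical name:

```python
import numpy as np

# Assumption for illustration: actions_dim mirrors action_space.shape == (1, 4).
actions_dim = (1, 4)
reset_envs = 1

# Same expression as the line quoted from dreamer_v3.py: the shape entries are summed.
reset_actions = np.zeros((1, reset_envs, np.sum(actions_dim)))
print(reset_actions.shape)  # (1, 1, 5)

# The buffer slot in the error message has a last dimension of 4.
buffer_slot = np.zeros((1, 1, 4))
error_message = None
try:
    buffer_slot[...] = reset_actions
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

With a 1D shape such as (4,), the same sum gives 4 and the assignment goes through.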

Is this the desired behavior?

Thanks


michele-milesi · 2024-05-03T14:18:40Z

Hi @defrag-bambino,
thank you for reporting this problem.

Which action space are you using? Are they continuous actions?
In this case, we assume that continuous actions have a shape with a single dimension, something like this: (n,). This allows us to handle continuous, discrete, and multidiscrete actions in the same way.
I would suggest you try changing the action space to dimension (4,).

@belerico might it make sense to have a wrapper that flattens the continuous actions?
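A rough sketch of what flattening a continuous action space could look like, using NumPy only. The helper names (flatten_bounds, unflatten_action) and the bounds of -1 and 1 are hypothetical and are not part of sheeprl or gymnasium; in a real wrapper the flattened bounds would be used to declare the new Box space:

```python
import numpy as np

ORIGINAL_SHAPE = (1, 4)                       # shape the underlying env expects
FLAT_SHAPE = (int(np.prod(ORIGINAL_SHAPE)),)  # (4,), the shape the agent would see


def flatten_bounds(low, high):
    """Reshape the Box bounds so they describe a 1D action space."""
    return np.asarray(low).reshape(FLAT_SHAPE), np.asarray(high).reshape(FLAT_SHAPE)


def unflatten_action(flat_action):
    """Reshape the agent's 1D action back to the env's native shape."""
    return np.asarray(flat_action).reshape(ORIGINAL_SHAPE)


# Hypothetical bounds, chosen only for the example.
low = -np.ones(ORIGINAL_SHAPE, dtype=np.float32)
high = np.ones(ORIGINAL_SHAPE, dtype=np.float32)

flat_low, flat_high = flatten_bounds(low, high)
agent_action = np.zeros(FLAT_SHAPE, dtype=np.float32)
env_action = unflatten_action(agent_action)
print(flat_low.shape, env_action.shape)  # (4,) (1, 4)
```

The agent only ever sees the (4,) shape, while the wrapped env still receives actions in its original (1, 4) layout.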

defrag-bambino · 2024-05-03T15:31:41Z

Yes, it is a continuous "Box" Space.
The problem is that this particular action_space is (N_AGENTS, 4), so there are different versions of the gym env with different action_space shapes.

defrag-bambino · 2024-05-03T15:50:11Z

I've tried to work around it using np.squeeze() and np.expand_dims() in relevant places of my env wrapper. This seems to work for now.
However, after a few seconds it crashes with this error:

Stacktrace

Traceback (most recent call last):
File "/home/drt/miniconda3/envs/sheeprl/bin/sheeprl", line 8, in <module>
sys.exit(run())
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
_run_hydra(
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 222, in run_and_report
raise ex
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 219, in run_and_report
return func()
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 352, in run
run_algorithm(cfg)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 190, in run_algorithm
fabric.launch(reproducible(command), cfg, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 839, in launch
return self._wrap_and_launch(function, self, *args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 924, in _wrap_and_launch
return launcher.launch(to_run, *args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/strategies/launchers/subprocess_script.py", line 104, in launch
return function(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 930, in _wrap_with_setup
return to_run(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/cli.py", line 186, in wrapper
return func(fabric, cfg, *args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 677, in main
train(
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 113, in train
embedded_obs = world_model.encoder(batch_obs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 119, in forward
output = self._forward_module(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/models/models.py", line 469, in forward
mlp_out = self.mlp_encoder(obs, *args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/algos/dreamer_v3/agent.py", line 151, in forward
return self.model(x)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/sheeprl/models/models.py", line 119, in forward
return self.model(obs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/drt/miniconda3/envs/sheeprl/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x72 and 1x512)

Seems like the same holds for the observation shape (1, 72).
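To illustrate how the shapes in that RuntimeError can arise, here is a NumPy-only sketch. F.linear multiplies the input by the transposed weight, so a second operand of shape 1x512 means the layer was built with a single input feature. The assumption (not verified against the sheeprl source) is that the input size was read from the first entry of the observation shape (1, 72):

```python
import numpy as np

obs_shape = (1, 72)
batch = np.zeros((1024, 72), dtype=np.float32)  # 1024 rows of 72 features, as in the error

# Assumption: the layer's input size comes from obs_shape[0], which is 1 here.
in_features = obs_shape[0]
weight_t = np.zeros((in_features, 512), dtype=np.float32)  # the "1x512" operand

mismatch = None
try:
    batch @ weight_t
except ValueError as err:
    mismatch = str(err)
    print("shape mismatch:", mismatch)

# With a 1D observation shape (72,), the first entry is the feature count.
flat_obs_shape = (72,)
weight_t_ok = np.zeros((flat_obs_shape[0], 512), dtype=np.float32)
out = batch @ weight_t_ok
print(out.shape)  # (1024, 512)
```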

belerico · 2024-05-03T19:02:36Z

> @belerico might it make sense to have a wrapper that flattens the continuous actions?

Yep, we can add it and leave it to the user to use it

belerico · 2024-05-03T19:05:47Z

> Seems like the same holds for the observation shape (1, 72).

If your observation space is a 1D vector, then I suppose you should also remove the leading 1 from its shape. Can you try it?
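One way this could be tried is sketched below without any gym dependency. FakeEnv and SqueezeObs are hypothetical stand-ins, written only to show where np.squeeze would go in reset() and step(); the tuple layouts follow the gymnasium convention:

```python
import numpy as np


class FakeEnv:
    """Hypothetical stand-in for the real env: observations have shape (1, 72)."""

    def reset(self):
        return np.zeros((1, 72), dtype=np.float32), {}

    def step(self, action):
        obs = np.ones((1, 72), dtype=np.float32)
        return obs, 0.0, False, False, {}


class SqueezeObs:
    """Drops the leading singleton axis from every observation."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        obs, info = self.env.reset()
        return np.squeeze(obs, axis=0), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return np.squeeze(obs, axis=0), reward, terminated, truncated, info


env = SqueezeObs(FakeEnv())
obs, _ = env.reset()
next_obs, *_ = env.step(np.zeros(4, dtype=np.float32))
print(obs.shape, next_obs.shape)  # (72,) (72,)
```

In a real wrapper the declared observation_space would need the same change, so that it reports (72,) rather than (1, 72).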

belerico · 2024-05-06T08:28:09Z

Hi @defrag-bambino, we're sorry, but Multi-Agent RL (MARL) is not supported right now, so your action and observation spaces must not depend on the number of agents, which are considered independent of one another. This means that:

Observations must be 1D vectors or 2D/3D images: everything that is not a 1D vector will be processed by a CNN by the agent. A 2D image, or a 3D image of shape [H,W,1] or [1,H,W], is treated as a grayscale image; any other image is treated as multi-channel.
An action of type gymnasium.spaces.Box must be of shape (n,), where n is the number of (possibly continuous) actions the environment supports.
Every agent runs in its own environment.
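These constraints can be turned into a quick shape check before training. The sketch below only encodes the rules as stated in this comment; classify_observation and is_supported_box_action are hypothetical helpers, not sheeprl functions:

```python
def classify_observation(shape):
    """Return how an observation of this shape would be treated, per the rules above."""
    if len(shape) == 1:
        return "vector"
    if len(shape) == 2:
        return "grayscale image"
    if len(shape) == 3:
        # [H, W, 1] or [1, H, W] counts as grayscale, anything else as multi-channel.
        if shape[0] == 1 or shape[-1] == 1:
            return "grayscale image"
        return "multi-channel image"
    raise ValueError(f"unsupported observation shape: {shape}")


def is_supported_box_action(shape):
    """A Box action space must have shape (n,)."""
    return len(shape) == 1


print(classify_observation((72,)))      # vector
print(classify_observation((1, 72)))    # grayscale image
print(is_supported_box_action((1, 4)))  # False
print(is_supported_box_action((4,)))    # True
```

Under these rules an observation of shape (1, 72) counts as a 2D image rather than a vector, which is why the leading 1 has to go.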

belerico · 2024-05-06T08:45:23Z

Maybe there could be a solution as explained in #241

belerico closed this as completed May 6, 2024

belerico added the wontfix label May 6, 2024

belerico mentioned this issue May 6, 2024 in "enabling self play" #241 (Open)

belerico reopened this May 6, 2024
