Last N actions as mlp_keys encoder input for dreamer_v3 #239

Closed · geranim0 opened this issue Mar 22, 2024 · 10 comments · Fixed by #291

geranim0 commented Mar 22, 2024

Hi,

I'm working on an Atari environment wrapper that keeps an action input buffer of length N, which I want to feed to the encoder via mlp_keys.
Algo config:

algo:
  mlp_keys:
    encoder: [actions]

However, I'm unable to get it working; I get the error TypeError: object of type 'NoneType' has no len() at

File "/home/sam/dev/ml/sheeprl/sheeprl/utils/env.py", line 171, in <listcomp>
    [k for k in env.observation_space.spaces.keys() if len(env.observation_space[k].shape) in {2, 3}]

This happens because gym.spaces.Tuple has no meaningful shape: its shape attribute is None, so the len() call fails.

What should change in this wrapper so that it correctly interfaces with what sheeprl expects? Is there a way to give the Tuple a shape, or should it be changed to a Box? If it needs to be a Box, how should it be configured?

from collections import deque

import gymnasium as gym


class InputBufferWtihActionsAsInput_Atari(gym.Wrapper):
    def __init__(self, env: gym.Env, input_buffer_amount: int = 0):
        super().__init__(env)
        if input_buffer_amount <= 0:
            raise ValueError("`input_buffer_amount` should be a positive integer")
        self._input_buffer_amount = input_buffer_amount
        self._input_buf = deque(maxlen=input_buffer_amount)
        # Expose the original frames plus the buffered actions as a Dict space.
        self.observation_space = gym.spaces.Dict({
            "rgb": self.env.observation_space,
            "actions": gym.spaces.Tuple([self.env.action_space] * input_buffer_amount),
        })

    def get_obs(self, observation):
        return {
            "rgb": observation,
            "actions": self._input_buf,
        }

    def reset(self, **kwargs):
        obs, infos = super().reset(**kwargs)

        # Pre-fill the buffer with random actions so it is always full.
        while len(self._input_buf) < self._input_buf.maxlen:
            self._input_buf.append(self.env.action_space.sample())

        return self.get_obs(obs), infos

    def step(self, action):
        # Execute the action queued N steps ago and push the new action onto the buffer.
        this_frame_action = self._input_buf[0]
        self._input_buf.append(action)

        obs, reward, done, truncated, infos = self.env.step(this_frame_action)

        return self.get_obs(obs), reward, done, truncated, infos

Edit:
I have a working setup using a hard-coded wrapper that is aware of implementation details, using something like the following. I'm still wondering how to achieve a generic solution, though.

        self.observation_space = gym.spaces.Dict({
                "rgb": self.env.observation_space,
                #"last_action": self.env.action_space
                #"actions": gym.spaces.Box(shape=(self.env.action_space.shape, input_buffer_amount), dtype=np.int64)
                #"actions": gym.spaces.Box([self.env.action_space] * input_buffer_amount)
                "actions_0": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
                "actions_1": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
                "actions_2": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
                "actions_3": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
            })

    def get_obs(self, observation: Any) -> Any:
        #observation['past_actions'] = spaces.Space(list(self._input_buf))
        return {
            "rgb": observation,
            #"last_action": self._input_buf[0]
            #"actions": np.array(self._input_buf, dtype=np.int64)
            "actions_0": self._input_buf[0],
            "actions_1": self._input_buf[1],
            "actions_2": self._input_buf[2],
            "actions_3": self._input_buf[3],
        }
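A single Box key covering the whole buffer would probably be the more generic version; something like the sketch below (assuming a Discrete action space, and not verified against how sheeprl splits cnn/mlp keys):

import numpy as np

# in __init__:
self.observation_space = gym.spaces.Dict({
    "rgb": self.env.observation_space,
    # one (N,)-shaped Box instead of N separate scalar keys
    "actions": gym.spaces.Box(
        low=0,
        high=self.env.action_space.n - 1,
        shape=(self._input_buffer_amount,),
        dtype=np.int64,
    ),
})

# and get_obs returning the buffer as a single array:
def get_obs(self, observation):
    return {
        "rgb": observation,
        "actions": np.array(self._input_buf, dtype=np.int64),
    }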
michele-milesi (Member)

Hi @geranim0,
yes, the observation space must have a shape attribute, so I suggest using the gymnasium.spaces.Box space to augment the observations of the environment.
I prepared a branch with the ActionsAsObservationWrapper that allows you to add the last n actions: https://github.com/Eclectic-Sheep/sheeprl/tree/feature/actions-as-obs.
You can specify the number of actions with the env.action_stack parameter. You can also add a dilation between actions (as in FrameStack) by setting the env.action_stack_dilation parameter in the configs.

The observation key is "action_stack" (add it to the mlp_keys); a different key would create conflicts during training.
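For example, a config along these lines should work (the values here are only placeholders, and the exact structure may differ slightly on the branch):

env:
  action_stack: 4            # number of last actions added to the observations
  action_stack_dilation: 1   # dilation between stacked actions, as in FrameStack
algo:
  mlp_keys:
    encoder: [action_stack]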

Let me know if it works

Note: Discrete actions are converted into one-hot actions (as the agent works with one-hot actions in the discrete case); e.g., with a Discrete(9) action space, action 3 appears in the observation as a length-9 one-hot vector. We can discuss which is the best option.

cc @belerico

geranim0 (Author)

Hi @michele-milesi,

Thanks for the branch! Taking a look and doing some tests with it.

geranim0 commented Mar 27, 2024

So, I did some testing; here are the results:

[image: training reward curves for the two runs]

The gray line is the agent trained with the last N (here, 12) actions added to the observations, and the blue line is the agent trained with the same input buffer (12) but without the buffer added to the observations. Only one run was made for each, but it looks like, with a large input buffer, adding the buffer to the observations is helpful.

It also suggests that the wrapper works 👍

The only modification I made to your branch was to add an input buffer to the wrapper.

michele-milesi (Member)

Great, I'm glad it works.
I do not understand why you added the input buffer and how you used it. Can you show me which modification you made?
Thanks

geranim0 (Author)

Sure, it's actually in my first message, in the step function: instead of executing the current frame's action, I execute the one at the front of the buffer, via this_frame_action = self._input_buf[0].

The purpose of this is to simulate human reaction time. That's why I wanted to test adding the input buffer to the observation, to see if it would improve performance (looks like it does).
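In code, the modification boils down to the same step method as in my first message (a sketch, assuming the wrapper keeps the same deque buffer):

def step(self, action):
    # Execute the action queued N steps ago; the new action goes to the back of
    # the buffer, giving a fixed N-step "reaction time" delay.
    this_frame_action = self._input_buf[0]
    self._input_buf.append(action)
    obs, reward, done, truncated, infos = self.env.step(this_frame_action)
    return self.get_obs(obs), reward, done, truncated, infos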

michele-milesi (Member)

Understood, thanks

belerico (Member)

Hi @geranim0, if this is done we can add this feature in a new PR and put it in the next release

belerico added the enhancement (New feature or request) label on Apr 29, 2024
geranim0 commented Apr 29, 2024

Hi @belerico, sure!

A side note, though: in tests with a Discrete action space things worked fine, but I ran into problems with MultiDiscrete envs, where the action shape is not handled correctly by the actions-as-obs wrapper and also by dreamer_v3.py::main() in this portion:

  real_actions = (
      torch.cat([real_act.argmax(dim=-1) for real_act in real_actions], dim=-1).cpu().numpy()
  )
  step_data["actions"] = actions.reshape((1, cfg.env.num_envs, -1))

For now I got around it by reshaping my action space to Discrete. I'm on a somewhat old branch; I'll re-test when I update.
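Roughly, the workaround is a wrapper that flattens the MultiDiscrete space into a single Discrete index, along the lines of this sketch (hypothetical wrapper name, not my exact code):

import numpy as np
import gymnasium as gym

class FlattenMultiDiscrete(gym.ActionWrapper):
    """Sketch: expose a MultiDiscrete action space as a single Discrete index."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        self._nvec = env.action_space.nvec
        self.action_space = gym.spaces.Discrete(int(np.prod(self._nvec)))

    def action(self, action):
        # Decode the flat Discrete index back into one sub-action per dimension.
        return np.array(np.unravel_index(int(action), self._nvec), dtype=np.int64)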

michele-milesi (Member)

Hi @geranim0,
can you share the error you encountered and which environment you are using?
Thanks

michele-milesi (Member)

I should have fixed the problem; could you check with the MultiDiscrete action space?
Thanks

michele-milesi linked a pull request on May 20, 2024 that will close this issue.