I'm planning to use your PPO implementation, which seems well written, clear, and easy to understand. But first, I'd like an answer to the following question:
In OpenAI Baselines, environments are passed through various classes, such as VecNormalize, observation/reward wrappers, or even Monitor. In those cases, observations and rewards are transformed to ease learning. However, there is a lot of encapsulation, which makes the chain difficult to follow. After a quick glance at your implementation, I'm under the impression that you transform the observations in unity/utils/running_state.py. Is that so? Are there other transformations? Or were you simply careful when designing the environment, making sure the rewards were appropriately scaled?
Thanks a lot for your answers.
Yes, it is. There are no other transformations. The reward functions of the MuJoCo and Unity walker environments are already well designed, so we didn't consider scaling the rewards. But it might affect performance, so we will look into it!
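For readers following along: a "running state" observation filter of the kind discussed here typically tracks a running mean and variance online and standardizes each incoming observation. Below is a minimal sketch using Welford's online algorithm; the class and method names are illustrative and are not necessarily the actual API in unity/utils/running_state.py.

```python
import numpy as np

class RunningState:
    """Standardize observations using online running statistics.

    Uses Welford's algorithm to update the mean and variance one
    observation at a time, then returns a clipped z-score. This is
    an illustrative sketch, not the repo's exact implementation.
    """

    def __init__(self, shape, clip=5.0):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations
        self.clip = clip

    def update(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # With fewer than 2 samples the variance is undefined; fall
        # back to 1 so the first observation passes through unscaled.
        if self.n < 2:
            return np.ones_like(self.mean)
        return np.sqrt(self.m2 / (self.n - 1))

    def __call__(self, x):
        self.update(x)
        z = (np.asarray(x, dtype=np.float64) - self.mean) / (self.std + 1e-8)
        return np.clip(z, -self.clip, self.clip)
```

In a training loop, the agent would call the filter on every raw observation from `env.step()` before feeding it to the policy, so the network always sees roughly zero-mean, unit-variance inputs. This is the same idea as Baselines' VecNormalize, just without the vectorized-environment wrapper layers.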