Quick question about environment normalization #15

Open

MoMe36 opened this issue Sep 16, 2018 · 1 comment
MoMe36 commented Sep 16, 2018

Hello,

I'm planning to use your PPO implementations, which seem well-written, clear, and easy to understand. Before I do, I'd like to ask the following question:

In OpenAI baselines, environments are passed to various classes such as VecNormalize, observation/reward wrappers, or Monitor. In these cases, observations and rewards are transformed to ease learning, but the heavy encapsulation makes the chain somewhat difficult to follow. After a quick glance at your implementations, I'm under the impression that you transform the observations in unity/utils/running_state.py. Is that so? Are there other transformations? Or were you simply careful while designing the environment, making sure the rewards were appropriately scaled?

Thanks a lot for your answers.


dnddnjs commented Sep 18, 2018

Yes, that's right. There are no other transformations. The reward functions of the MuJoCo and Unity walker environments are already well designed, so we didn't think about scaling the reward. But it might affect performance, so we will look into it!
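For context, here is a minimal sketch of the kind of running mean/std observation filter that running_state.py typically implements in repos like this one. The class and method names below are illustrative assumptions, not necessarily the repo's exact API:

```python
import numpy as np

class RunningStat:
    """Tracks a running mean and variance using Welford's online algorithm."""
    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.s = np.zeros(shape)  # sum of squared deviations from the mean

    def push(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        if self.n == 1:
            self.mean = x.copy()
        else:
            old_mean = self.mean.copy()
            self.mean += (x - old_mean) / self.n
            self.s += (x - old_mean) * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; zero until we have at least two samples.
        if self.n > 1:
            return np.sqrt(self.s / (self.n - 1))
        return np.zeros_like(self.mean)

class ZFilter:
    """Normalizes observations to roughly zero mean and unit variance, then clips."""
    def __init__(self, shape, clip=5.0):
        self.rs = RunningStat(shape)
        self.clip = clip

    def __call__(self, x, update=True):
        if update:
            self.rs.push(x)
        x = (x - self.rs.mean) / (self.rs.std + 1e-8)
        return np.clip(x, -self.clip, self.clip)

# Hypothetical usage inside a training loop:
# obs_filter = ZFilter(env.observation_space.shape)
# obs = obs_filter(env.reset())
```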
