[question] Hyperparameters for Roboschool HumanoidFlagrunHarder #26
Comments
You should do an evaluation on a test env after training, using
What is the magnitude of the reward for PPO2?
Well, I would suggest running hyperparameter tuning (it is now included in the rl zoo); usually random sampling + a median pruner works quite well (given enough budget; I usually use a budget of 1000 trials, use more if you can). I agree it would be nice to have hyperparameters for Roboschool here (feel free to open a PR if you find good ones ;)). It is not the priority for now (the focus is on improving stable-baselines), but I will certainly do that in the future.
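For illustration, here is a minimal sketch of such a search with Optuna (random sampling + a median pruner). The env id, search ranges, per-trial budget, and trial count are assumptions for the example, not the settings used by the rl zoo:

```python
# Hedged sketch: random search + median pruner with Optuna, as suggested above.
# The env id, search ranges and budgets below are illustrative assumptions.
import gym
import optuna
import roboschool  # noqa: F401  (registers the Roboschool envs)
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

ENV_ID = "RoboschoolHumanoidFlagrunHarder-v1"  # assumed id, check your roboschool version


def evaluate(model, env, n_episodes=10):
    """Mean undiscounted return over n_episodes, using the deterministic policy."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)


def objective(trial):
    # Illustrative search space -- adjust ranges to your budget
    params = {
        "learning_rate": trial.suggest_loguniform("learning_rate", 1e-5, 1e-3),
        "gamma": trial.suggest_uniform("gamma", 0.98, 0.999),
        "ent_coef": trial.suggest_loguniform("ent_coef", 1e-8, 1e-2),
        "n_steps": trial.suggest_categorical("n_steps", [256, 512, 1024, 2048]),
    }
    train_env = DummyVecEnv([lambda: gym.make(ENV_ID)])
    eval_env = gym.make(ENV_ID)
    model = PPO2("MlpPolicy", train_env, verbose=0, **params)
    mean_return = -float("inf")
    # Train in chunks so the pruner can stop unpromising trials early
    for step in range(5):
        model.learn(total_timesteps=200000, reset_num_timesteps=False)
        mean_return = evaluate(model, eval_env)
        trial.report(mean_return, step)
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()
    return mean_return


study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.RandomSampler(),
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=1),
)
study.optimize(objective, n_trials=100)  # use ~1000 trials if you have the budget
print(study.best_params)
```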
@araffin,
Usually, you don't run hyperparameter tuning on the full budget. You can try on one quarter of it, and because of the pruner, each trial won't use the max budget.
I have never tried that. Hopefully it will improve performance. Thank you.
Hi @araffin,
The predict method is only used for testing; for training, all policies are stochastic (don't forget to add action noise for DDPG). And yes, the default value of
I meant that you can look at the "HER-support" branch of this repo (which has better support for MPI), not that you should use HER.
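As a side note, a minimal sketch of what this looks like with stable-baselines: Gaussian action noise on DDPG during training, and a deterministic predict call at test time. The env id, noise scale, and training budget are assumptions for the example:

```python
# Hedged sketch: Gaussian action noise for DDPG exploration, deterministic
# predict() for evaluation. Env id and noise scale are illustrative assumptions.
import gym
import numpy as np
import roboschool  # noqa: F401  (registers the Roboschool envs)
from stable_baselines import DDPG
from stable_baselines.ddpg.noise import NormalActionNoise

env = gym.make("RoboschoolHumanoidFlagrunHarder-v1")  # assumed env id
n_actions = env.action_space.shape[-1]

# Exploration noise added to the actor output during training
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.1 * np.ones(n_actions))
model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=0)
model.learn(total_timesteps=100000)

# At test time, predict() returns the policy action without exploration noise
obs = env.reset()
action, _states = model.predict(obs, deterministic=True)
```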
Thank you. I used Gaussian noise for DDPG. I don't think I have any problem with the policy settings for training/testing.
Anyway, I get an import error for HER even though I am using the docker image built from docker/Dockerfile.cpu. How can I fix it? I don't intend to use HER, but the file utils.py imports it from stable_baselines. Thanks.
HER is only in the master branch for now. It will be released soon (that is why the docker image does not work yet), so you need to install SB from source.
I have not looked at the changes, but is there any significant difference between the master branch and the HER-support branch regarding MPI for DDPG (or SAC)? Or do I need to install it from source?
Hi @araffin,
Hi,
Currently, I have used 4 algorithms from stable-baselines on the Roboschool HumanoidFlagrunHarder task. My evaluation metric is the mean reward over 100 episodes. Basically: PPO2 is perfect, A2C gets a mean reward of 500, DDPG gets a mean reward around 0, and SAC gets a mean reward of 280. I have been looking for hyperparameter settings in stable-baselines-zoo for A2C, DDPG, and SAC, but could only find the Bullet Humanoid env for SAC (quite close to Roboschool HFH). Do you have any suggestions for A2C, DDPG, and SAC on this task? The number of timesteps is 400M for on-policy methods and 20M for off-policy methods. It would be nice if you added them to the set of hyperparameters.
Thanks.
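For reference, a minimal sketch of the evaluation metric described above (mean reward over 100 episodes on a test env), assuming the Roboschool env id and a trained PPO2 model saved under a hypothetical path:

```python
# Hedged sketch: mean reward over 100 test episodes, as described above.
# The env id and model path are illustrative assumptions.
import gym
import roboschool  # noqa: F401  (registers the Roboschool envs)
from stable_baselines import PPO2

env = gym.make("RoboschoolHumanoidFlagrunHarder-v1")  # assumed env id
model = PPO2.load("ppo2_humanoid_flagrun_harder")     # hypothetical saved model

episode_rewards = []
for _ in range(100):
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, _ = env.step(action)
        total += reward
    episode_rewards.append(total)

print("Mean reward over 100 episodes:", sum(episode_rewards) / len(episode_rewards))
```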