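Currently, the continuous PPO ratio is calculated as ratios = new_actor_log_probs/(actor_log_probs+EPSILON). This is derived from the previous PPO algorithm's ratio = actor_probs/(old_actor_probs+EPSILON), which made sense there because those values were probabilities, not log probabilities.
Dividing one log probability by another does not give the probability ratio. Since new_prob/old_prob = exp(log(new_prob) - log(old_prob)), the line should actually be something like ratios = exp(new_actor_log_probs - actor_log_probs)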
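To make the difference concrete, here is a minimal sketch in plain Python. It is not the repository's code: the helper name ppo_ratios, the sample probabilities, and the EPSILON value of 1e-10 are all assumptions chosen for illustration. It compares the proposed exp-of-difference form against the direct probability ratio and against the current division of log probabilities.

```python
import math

EPSILON = 1e-10  # assumed value; the issue does not state what EPSILON is


def ppo_ratios(new_log_probs, old_log_probs):
    """Return pi_new / pi_old for each sample, computed from log probabilities."""
    return [math.exp(n - o) for n, o in zip(new_log_probs, old_log_probs)]


# Hypothetical action probabilities under the old and new policies.
old_probs = [0.5, 0.2, 0.8]
new_probs = [0.25, 0.4, 0.8]
old_log_probs = [math.log(p) for p in old_probs]
new_log_probs = [math.log(p) for p in new_probs]

# Proposed fix: exponentiate the difference of log probabilities.
ratios = ppo_ratios(new_log_probs, old_log_probs)

# Reference: the ratio computed directly from the probabilities.
expected = [n / o for n, o in zip(new_probs, old_probs)]

# Current behaviour described in the issue: dividing log probabilities.
buggy = [n / (o + EPSILON) for n, o in zip(new_log_probs, old_log_probs)]

print("exp(diff):  ", ratios)
print("prob ratio: ", expected)
print("log / log:  ", buggy)
```

With these sample values the first two lists agree up to floating-point error, while the third does not: for the first sample the true ratio is 0.5, but dividing the log probabilities gives roughly 2.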