lower bounded target q #1330
base: pytorch
alf/utils/value_ops.py
Outdated
@@ -118,6 +118,72 @@ def action_importance_ratio(action_distribution,
    return importance_ratio, importance_ratio_clipped


def generalized_advantage_estimation(rewards,
I didn't change generalized_advantage_estimation; I only moved it closer to action_importance_ratio. Git messed up the diff.
improve_w_goal_return: Use return calculated from the distance to hindsight
    goals. Only supports batch_length == 2, one step td.
improve_w_nstep_bootstrap: Look ahead 2 to n steps, and take the largest
    bootstrapped return to lower bound the value target of the 1st step.
add formula
Done.
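For readers following this thread, the idea described in the improve_w_nstep_bootstrap docstring can be sketched roughly as below. This is only an illustration and not the code in this PR: the function names, the plain-list inputs, and the convention that rewards[t] is the reward received on arriving at step t are all assumptions, and episode boundaries are ignored for simplicity.

```python
def nstep_bootstrapped_returns(rewards, values, gamma):
    """Return [G_1, ..., G_n] for the first step of a trajectory.

    Assumed convention (hypothetical, for illustration only):
    rewards[t] is received on arriving at step t, so
    G_k = sum_{i=1..k} gamma**(i-1) * rewards[i] + gamma**k * values[k].
    Episode terminations inside the window are not handled.
    """
    returns = []
    discounted_sum = 0.0
    for k in range(1, len(rewards)):
        # Accumulate the discounted reward collected up to step k.
        discounted_sum += gamma ** (k - 1) * rewards[k]
        # Bootstrap from the value estimate at step k.
        returns.append(discounted_sum + gamma ** k * values[k])
    return returns


def lower_bounded_target(rewards, values, gamma):
    """Lower bound the one-step target by the best 2..n-step return."""
    returns = nstep_bootstrapped_returns(rewards, values, gamma)
    one_step = returns[0]    # usual one-step TD target
    lookahead = returns[1:]  # 2-step to n-step bootstrapped returns
    return max([one_step] + lookahead)
```

With a batch of length 2 there is nothing to look ahead to, so the sketch falls back to the plain one-step target.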
…oal distance return, and n-step bootstrapped return)
…ff-policy) algorithms
@emailweixu and @hnyu, I've separated out the HER-related logic (the part that moves batch_info fields into alg_info, and the loss calculation) into a new file, her_algorithms.
Let me know how you like this version.
Thanks,
Le
improve_w_goal_return: Use return calculated from the distance to hindsight
    goals. Only supports batch_length == 2, one step td.
improve_w_nstep_bootstrap: Look ahead 2 to n steps, and take the largest
    bootstrapped return to lower bound the value target of the 1st step.
Done.
minimum change for lower bounded value target (for episodic return, goal distance return, and n-step bootstrapped return)