lower bounded target q #1330

le-horizon · 2022-05-17T16:03:21Z

Minimal change to support lower-bounded value targets (using episodic return, goal-distance return, and n-step bootstrapped return).
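For readers new to the idea, here is a minimal sketch of lower-bounding a value target with the episodic return. It is not the code in this PR: the helper names `discounted_returns` and `lower_bounded_targets` are hypothetical, and it assumes one finished episode where `rewards[t]` is the reward received after acting at step `t`.

```python
def discounted_returns(rewards, gamma):
    """Discounted return-to-go for every step of one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def lower_bounded_targets(td_targets, lower_bounds):
    """Clamp each value target from below by its lower bound."""
    return [max(target, bound) for target, bound in zip(td_targets, lower_bounds)]


# Hypothetical numbers, for illustration only.
episode_returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
targets = lower_bounded_targets([0.1, 0.9, 0.3], episode_returns)
```

The clamp only changes a target when the observed return exceeds it, so targets that are already above the bound are left untouched.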

le-horizon · 2022-05-17T16:05:08Z

alf/utils/value_ops.py

@@ -118,6 +118,72 @@ def action_importance_ratio(action_distribution,
    return importance_ratio, importance_ratio_clipped


+def generalized_advantage_estimation(rewards,


I didn't change generalized_advantage_estimation; I only moved it closer to action_importance_ratio. Git messed up the diff.

le-horizon · 2022-06-14T21:12:57Z

alf/algorithms/td_loss.py

+            improve_w_goal_return: Use return calculated from the distance to hindsight
+                goals.  Only supports batch_length == 2, one step td.
+            improve_w_nstep_bootstrap: Look ahead 2 to n steps, and take the largest
+                bootstrapped return to lower bound the value target of the 1st step.


add formula
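One way to read the `improve_w_nstep_bootstrap` docstring is: target = max(r_0 + γ V(s_1), max over k = 2..n of [Σ_{i<k} γ^i r_i + γ^k V(s_k)]). The sketch below illustrates that reading only. The function name and argument layout are hypothetical, not the ones in td_loss.py, and it assumes no episode termination inside the n-step window.

```python
def nstep_lower_bounded_target(rewards, next_values, gamma):
    """Return (one_step_target, lower_bounded_target) for the first step.

    rewards[k]:     reward of the (k+1)-th transition from the first step.
    next_values[k]: target-network value of the state reached after
                    k+1 transitions.
    Assumes the window contains no terminal step.
    """
    one_step = rewards[0] + gamma * next_values[0]
    best = one_step
    partial_return = 0.0  # sum of discounted rewards so far
    discount = 1.0        # gamma ** (number of transitions consumed)
    for k in range(len(rewards)):
        partial_return += discount * rewards[k]
        discount *= gamma
        bootstrapped = partial_return + discount * next_values[k]
        if k >= 1:  # look ahead 2 to n steps
            best = max(best, bootstrapped)
    return one_step, best


# Hypothetical numbers, for illustration only.
one_step, bounded = nstep_lower_bounded_target(
    rewards=[1.0, 1.0, 1.0], next_values=[0.0, 0.0, 10.0], gamma=0.5)
```

Because the one-step target is itself a candidate in the max, the bounded target can never be smaller than the ordinary one-step target.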


le-horizon

@emailweixu and @hnyu, I've separated out the HER-related logic (the part that moves batch_info fields into alg_info, and the loss calculation) into a new file, her_algorithms.

Let me know how you like this version.

Thanks,
Le

le-horizon · 2022-07-01T02:12:05Z

alf/algorithms/td_loss.py

+            improve_w_goal_return: Use return calculated from the distance to hindsight
+                goals.  Only supports batch_length == 2, one step td.
+            improve_w_nstep_bootstrap: Look ahead 2 to n steps, and take the largest
+                bootstrapped return to lower bound the value target of the 1st step.


le-horizon requested review from emailweixu and hnyu May 17, 2022 16:03


le-horizon mentioned this pull request May 17, 2022

Lbtq main #1311

Closed


Le Horizon added 2 commits June 27, 2022 10:57

minimum change for lower bounded value target (for episodic return, goal distance return, and n-step bootstrapped return)

b9c4143

introduce her_algorithms wrapper to replace ad hoc changes to base (off-policy) algorithms

6d5b225

le-horizon force-pushed the lbtq-1 branch from 36e1d53 to 6d5b225 Compare June 30, 2022 18:12

Le Horizon added 4 commits June 30, 2022 12:59

protect for the case where lbtq is enabled but HER is not.

3d3df47

move functions around in value_ops to avoid spurious diffs

ee39185

use LowerBoundedTDLoss with td_lambda=0 (OneStepLoss) to fix value exploding.

928e894

field name improvement and add formula in comment

99aacf1


Le Horizon added 2 commits July 1, 2022 09:37

upload plots and update README

125bb75

update dqn plot

1824056
