[BugFix] skip_done_states in SAC #2613

vmoens · 2024-11-27T15:12:19Z

Stack from ghstack (oldest at bottom):

-> [BugFix] skip_done_states in SAC #2613

[ghstack-poisoned]

ghstack-source-id: f534c53d30af035edb2e3b5291d4db71313086fd Pull Request resolved: #2613

pytorch-bot · 2024-11-27T15:12:23Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2613

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 8 Unrelated Failures

As of commit ca35b99 with merge base 90c8e40:

NEW FAILURES - The following jobs have failed:

Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
FAILED ../../../../../../tmp/test_objectives_benchmarks.py::test_sac_speed[True-None] - torch._dynamo.exc.Unsupported: Graph break under GenericContextWrappingVariable
SOTA Tests on Linux / tests (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t 26df6eb358cb2a89453f627d7c8d30549aa8d3ba5a32188d2d47102e68a1e429 /exec failed with exit code 1
Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job (gh)
RuntimeError: Command docker exec -t b95af722ca1f1258d7468c4dffc912751bd357113dd1a5bb43eb06a2085e184f /exec failed with exit code 1
Unit-tests on Windows / unittests-cpu / windows-job (gh)
##[error]fatal: couldn't find remote ref refs/pull/2613/merge

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Habitat Tests on Linux / tests (3.9, 12.1) / linux-job (gh) (trunk failure)
AttributeError: _ARRAY_API not found
Unit-tests on Linux / tests-cpu (3.10) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestMLFlowLogger::test_log_video[steps1]
Unit-tests on Linux / tests-cpu (3.11) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestMLFlowLogger::test_log_video[steps1]
Unit-tests on Linux / tests-cpu (3.12) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestMLFlowLogger::test_log_video[steps1]
Unit-tests on Linux / tests-cpu (3.9) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestMLFlowLogger::test_log_video[steps1]
Unit-tests on Linux / tests-cpu-oldget (3.12) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestMLFlowLogger::test_log_video[steps1]
Unit-tests on Linux / tests-gpu (3.11, 12.1) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestMLFlowLogger::test_log_video[steps1]
Unit-tests on Linux / tests-stable-gpu (3.10, 11.8) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestMLFlowLogger::test_log_video[steps1]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

matteobettini

LGTM thanks!

Another option to consider, to avoid changing the input shape when the user sets this flag, is to ask the user what value to use to pad the non-terminated obs.

matteobettini · 2024-11-27T16:10:47Z

torchrl/objectives/sac.py

@@ -126,6 +126,10 @@ class SACLoss(LossModule):
            ``"none"`` | ``"mean"`` | ``"sum"``. ``"none"``: no reduction will be applied,
            ``"mean"``: the sum of the output will be divided by the number of
            elements in the output, ``"sum"``: the output will be summed. Default: ``"mean"``.
+        skip_done_states (bool, optional): whether the actor network should only be run on valid, non-terminating


Suggested change

-        skip_done_states (bool, optional): whether the actor network should only be run on valid, non-terminating
+        skip_done_states (bool, optional): whether the actor network used for value computation should only be run on valid, non-terminating
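To make the docstring wording concrete, here is a minimal sketch of what "only run on valid, non-terminating states" can look like in plain PyTorch. It is not the code from this PR: the helper name `forward_skipping_done`, the toy `nn.Linear` network, and the choice to fill skipped rows with zeros are all assumptions made for illustration. The sketch also assumes that observations stored at done steps may hold invalid values such as NaN.

```python
import torch
from torch import nn


def forward_skipping_done(net: nn.Module, next_obs: torch.Tensor, done: torch.Tensor) -> torch.Tensor:
    """Run `net` only on the rows of `next_obs` whose `done` flag is False.

    Hypothetical helper: rows flagged as done never reach the network and
    are filled with zeros in the output.
    """
    valid = ~done.reshape(-1)              # (batch,) boolean mask of usable rows
    out_valid = net(next_obs[valid])       # the network sees a smaller batch
    out = out_valid.new_zeros((next_obs.shape[0], *out_valid.shape[1:]))
    out[valid] = out_valid                 # scatter results back to full batch size
    return out


net = nn.Linear(3, 2)
next_obs = torch.randn(4, 3)
done = torch.tensor([False, True, False, True])
next_obs[done] = float("nan")              # assumed: done rows carry invalid data

out = forward_skipping_done(net, next_obs, done)
print(torch.isfinite(out).all())           # finite everywhere, done rows are zero
print(torch.isnan(net(next_obs)[done]).all())  # the unmasked call propagates NaN
```

Note that the network receives a batch of a different size than the input, which is the shape change discussed in the review comments.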

matteobettini · 2024-11-27T16:15:53Z

torchrl/objectives/sac.py

@@ -877,6 +891,10 @@ class DiscreteSACLoss(LossModule):
            ``"none"`` | ``"mean"`` | ``"sum"``. ``"none"``: no reduction will be applied,
            ``"mean"``: the sum of the output will be divided by the number of
            elements in the output, ``"sum"``: the output will be summed. Default: ``"mean"``.
+        skip_done_states (bool, optional): whether the actor network should only be run on valid, non-terminating


same as above

vmoens · 2024-11-27T16:24:40Z

> LGTM thanks!
>
> Another option to consider, to avoid changing the input shape when the user sets this flag, is to ask the user what value to use to pad the non-terminated obs.

See this comment

matteobettini · 2024-11-27T16:27:56Z

> LGTM thanks!
> Another option to consider, to avoid changing the input shape when the user sets this flag, is to ask the user what value to use to pad the non-terminated obs.
>
> See this comment

Yes, I saw. I was referring exactly to that. Maybe there are users who have this issue but also need an input with the same shape (Cholesky expects a matrix, for instance). In that case they might know what works for them (maybe NaN and 0 don't, but 1 does).

Just an idea, we don't need to do it.

vmoens · 2024-11-27T16:29:10Z

> Yes, I saw. I was referring exactly to that. Maybe there are users who have this issue but also need an input with the same shape (Cholesky expects a matrix, for instance). In that case they might know what works for them (maybe NaN and 0 don't, but 1 does).

With Cholesky (if we want to take that example), any fixed number filling the matrix will fail. Padding is simply not a solution, unfortunately.
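The Cholesky point can be checked directly. A square matrix of size 2 or more filled with a single constant has rank at most 1, so it cannot be positive definite and the factorization is rejected. The sketch below assumes a 3x3 matrix and uses a hypothetical helper, `cholesky_ok`, that only reports whether `torch.linalg.cholesky` succeeds.

```python
import torch


def cholesky_ok(matrix: torch.Tensor) -> bool:
    """Return True if `torch.linalg.cholesky` accepts `matrix` (hypothetical helper)."""
    try:
        torch.linalg.cholesky(matrix)
        return True
    except RuntimeError:  # torch.linalg.LinAlgError derives from RuntimeError
        return False


n = 3  # assumed size; any n >= 2 behaves the same way for constant fills
for fill in (0.0, 1.0, -2.5):
    padded = torch.full((n, n), fill)      # "padding" with one fixed number
    print(fill, cholesky_ok(padded))

# For contrast, a positive-definite matrix is accepted.
print(cholesky_ok(2.0 * torch.eye(n)))
```

A padding value would therefore have to be a full, valid matrix rather than a scalar fill, which is why skipping the done rows is the simpler route here.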

matteobettini · 2024-11-27T16:32:09Z

ok got it

[ghstack-poisoned]

ghstack-source-id: 39d97360e3b0e45dd8c327487eac50ddafe2254d Pull Request resolved: #2613

github-actions · 2024-12-02T18:18:26Z

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}8$.

Expand to view detailed results

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_simple	0.7652s	0.7553s	1.3239 Ops/s	1.2894 Ops/s	$\color{#35bf28}+2.68\%$
test_transformed	1.1032s	1.0231s	0.9774 Ops/s	0.9781 Ops/s	$\color{#d91a1a}-0.08\%$
test_serial	2.2471s	2.1677s	0.4613 Ops/s	0.4565 Ops/s	$\color{#35bf28}+1.05\%$
test_parallel	2.0935s	2.0028s	0.4993 Ops/s	0.4981 Ops/s	$\color{#35bf28}+0.24\%$
test_step_mdp_speed[True-True-True-True-True]	0.1914ms	38.8791μs	25.7207 KOps/s	25.6733 KOps/s	$\color{#35bf28}+0.18\%$
test_step_mdp_speed[True-True-True-True-False]	54.2600μs	22.6428μs	44.1642 KOps/s	44.2796 KOps/s	$\color{#d91a1a}-0.26\%$
test_step_mdp_speed[True-True-True-False-True]	54.7010μs	21.5534μs	46.3964 KOps/s	46.4754 KOps/s	$\color{#d91a1a}-0.17\%$
test_step_mdp_speed[True-True-True-False-False]	40.7810μs	12.5305μs	79.8051 KOps/s	79.3600 KOps/s	$\color{#35bf28}+0.56\%$
test_step_mdp_speed[True-True-False-True-True]	79.1610μs	41.1514μs	24.3005 KOps/s	23.9126 KOps/s	$\color{#35bf28}+1.62\%$
test_step_mdp_speed[True-True-False-True-False]	59.2310μs	24.2130μs	41.3002 KOps/s	40.6053 KOps/s	$\color{#35bf28}+1.71\%$
test_step_mdp_speed[True-True-False-False-True]	59.8610μs	23.5655μs	42.4349 KOps/s	41.6182 KOps/s	$\color{#35bf28}+1.96\%$
test_step_mdp_speed[True-True-False-False-False]	49.7710μs	14.5749μs	68.6113 KOps/s	68.9127 KOps/s	$\color{#d91a1a}-0.44\%$
test_step_mdp_speed[True-False-True-True-True]	95.3210μs	43.7812μs	22.8408 KOps/s	22.3496 KOps/s	$\color{#35bf28}+2.20\%$
test_step_mdp_speed[True-False-True-True-False]	68.4310μs	26.5952μs	37.6008 KOps/s	37.6781 KOps/s	$\color{#d91a1a}-0.21\%$
test_step_mdp_speed[True-False-True-False-True]	54.5300μs	23.6229μs	42.3318 KOps/s	41.8566 KOps/s	$\color{#35bf28}+1.14\%$
test_step_mdp_speed[True-False-True-False-False]	50.3010μs	14.5841μs	68.5678 KOps/s	68.0094 KOps/s	$\color{#35bf28}+0.82\%$
test_step_mdp_speed[True-False-False-True-True]	76.4110μs	45.3332μs	22.0589 KOps/s	21.6110 KOps/s	$\color{#35bf28}+2.07\%$
test_step_mdp_speed[True-False-False-True-False]	62.8310μs	28.8313μs	34.6845 KOps/s	35.7247 KOps/s	$\color{#d91a1a}-2.91\%$
test_step_mdp_speed[True-False-False-False-True]	57.6610μs	26.0519μs	38.3849 KOps/s	38.2054 KOps/s	$\color{#35bf28}+0.47\%$
test_step_mdp_speed[True-False-False-False-False]	57.4710μs	16.6030μs	60.2301 KOps/s	59.3653 KOps/s	$\color{#35bf28}+1.46\%$
test_step_mdp_speed[False-True-True-True-True]	79.0710μs	43.6489μs	22.9101 KOps/s	22.4352 KOps/s	$\color{#35bf28}+2.12\%$
test_step_mdp_speed[False-True-True-True-False]	54.0810μs	26.7376μs	37.4006 KOps/s	37.2285 KOps/s	$\color{#35bf28}+0.46\%$
test_step_mdp_speed[False-True-True-False-True]	60.1110μs	27.4161μs	36.4749 KOps/s	36.0821 KOps/s	$\color{#35bf28}+1.09\%$
test_step_mdp_speed[False-True-True-False-False]	47.1310μs	16.2336μs	61.6007 KOps/s	60.0473 KOps/s	$\color{#35bf28}+2.59\%$
test_step_mdp_speed[False-True-False-True-True]	82.1110μs	44.7407μs	22.3510 KOps/s	21.6010 KOps/s	$\color{#35bf28}+3.47\%$
test_step_mdp_speed[False-True-False-True-False]	53.7310μs	28.5671μs	35.0053 KOps/s	34.9029 KOps/s	$\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-True-False-False-True]	3.2315ms	30.2651μs	33.0414 KOps/s	32.6098 KOps/s	$\color{#35bf28}+1.32\%$
test_step_mdp_speed[False-True-False-False-False]	46.9410μs	18.8571μs	53.0304 KOps/s	52.7365 KOps/s	$\color{#35bf28}+0.56\%$
test_step_mdp_speed[False-False-True-True-True]	94.0310μs	48.0632μs	20.8060 KOps/s	20.4270 KOps/s	$\color{#35bf28}+1.86\%$
test_step_mdp_speed[False-False-True-True-False]	59.6110μs	31.0009μs	32.2571 KOps/s	31.8956 KOps/s	$\color{#35bf28}+1.13\%$
test_step_mdp_speed[False-False-True-False-True]	58.1600μs	29.3376μs	34.0859 KOps/s	33.3437 KOps/s	$\color{#35bf28}+2.23\%$
test_step_mdp_speed[False-False-True-False-False]	44.9910μs	18.7105μs	53.4458 KOps/s	53.9206 KOps/s	$\color{#d91a1a}-0.88\%$
test_step_mdp_speed[False-False-False-True-True]	0.1059ms	49.3715μs	20.2546 KOps/s	20.0727 KOps/s	$\color{#35bf28}+0.91\%$
test_step_mdp_speed[False-False-False-True-False]	69.5010μs	32.9365μs	30.3614 KOps/s	30.0191 KOps/s	$\color{#35bf28}+1.14\%$
test_step_mdp_speed[False-False-False-False-True]	99.6110μs	30.1068μs	33.2151 KOps/s	32.1528 KOps/s	$\color{#35bf28}+3.30\%$
test_step_mdp_speed[False-False-False-False-False]	46.7600μs	20.2617μs	49.3542 KOps/s	48.7690 KOps/s	$\color{#35bf28}+1.20\%$
test_values[generalized_advantage_estimate-True-True]	25.3723ms	24.6518ms	40.5649 Ops/s	39.4900 Ops/s	$\color{#35bf28}+2.72\%$
test_values[vec_generalized_advantage_estimate-True-True]	0.1089s	3.0821ms	324.4508 Ops/s	332.5569 Ops/s	$\color{#d91a1a}-2.44\%$
test_values[td0_return_estimate-False-False]	0.1047ms	81.3424μs	12.2937 KOps/s	11.9892 KOps/s	$\color{#35bf28}+2.54\%$
test_values[td1_return_estimate-False-False]	55.7296ms	55.3054ms	18.0814 Ops/s	17.6776 Ops/s	$\color{#35bf28}+2.28\%$
test_values[vec_td1_return_estimate-False-False]	1.2778ms	1.0872ms	919.7733 Ops/s	910.4126 Ops/s	$\color{#35bf28}+1.03\%$
test_values[td_lambda_return_estimate-True-False]	93.8858ms	89.5857ms	11.1625 Ops/s	11.1335 Ops/s	$\color{#35bf28}+0.26\%$
test_values[vec_td_lambda_return_estimate-True-False]	1.2459ms	1.0785ms	927.2187 Ops/s	908.9589 Ops/s	$\color{#35bf28}+2.01\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	24.8364ms	24.5379ms	40.7532 Ops/s	39.3232 Ops/s	$\color{#35bf28}+3.64\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	1.0948ms	0.7638ms	1.3092 KOps/s	1.2694 KOps/s	$\color{#35bf28}+3.13\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.7730ms	0.6741ms	1.4836 KOps/s	1.4605 KOps/s	$\color{#35bf28}+1.58\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	1.5398ms	1.4902ms	671.0630 Ops/s	665.5950 Ops/s	$\color{#35bf28}+0.82\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	0.7418ms	0.6877ms	1.4542 KOps/s	1.4276 KOps/s	$\color{#35bf28}+1.86\%$
test_dqn_speed[False-None]	1.6535ms	1.4847ms	673.5363 Ops/s	668.1360 Ops/s	$\color{#35bf28}+0.81\%$
test_dqn_speed[False-backward]	2.1300ms	2.0844ms	479.7429 Ops/s	474.4762 Ops/s	$\color{#35bf28}+1.11\%$
test_dqn_speed[True-None]	0.6585ms	0.5547ms	1.8026 KOps/s	1.8056 KOps/s	$\color{#d91a1a}-0.16\%$
test_dqn_speed[True-backward]	1.2751ms	1.2011ms	832.5669 Ops/s	825.2040 Ops/s	$\color{#35bf28}+0.89\%$
test_dqn_speed[reduce-overhead-None]	0.6067ms	0.5448ms	1.8356 KOps/s	1.8100 KOps/s	$\color{#35bf28}+1.42\%$
test_dqn_speed[reduce-overhead-backward]	1.1115ms	1.0717ms	933.1347 Ops/s	934.5016 Ops/s	$\color{#d91a1a}-0.15\%$
test_ddpg_speed[False-None]	3.1539ms	2.8352ms	352.7059 Ops/s	349.5588 Ops/s	$\color{#35bf28}+0.90\%$
test_ddpg_speed[False-backward]	4.5877ms	4.1790ms	239.2923 Ops/s	236.5466 Ops/s	$\color{#35bf28}+1.16\%$
test_ddpg_speed[True-None]	1.1639ms	1.0822ms	924.0513 Ops/s	912.1166 Ops/s	$\color{#35bf28}+1.31\%$
test_ddpg_speed[True-backward]	2.3917ms	2.2916ms	436.3856 Ops/s	431.9472 Ops/s	$\color{#35bf28}+1.03\%$
test_ddpg_speed[reduce-overhead-None]	1.1891ms	1.0887ms	918.5485 Ops/s	905.7902 Ops/s	$\color{#35bf28}+1.41\%$
test_ddpg_speed[reduce-overhead-backward]	1.8551ms	1.7729ms	564.0477 Ops/s	561.6089 Ops/s	$\color{#35bf28}+0.43\%$
test_sac_speed[False-None]	8.5395ms	8.0090ms	124.8596 Ops/s	124.3773 Ops/s	$\color{#35bf28}+0.39\%$
test_sac_speed[False-backward]	11.9676ms	11.2685ms	88.7427 Ops/s	88.3798 Ops/s	$\color{#35bf28}+0.41\%$
test_sac_speed[True-None]	1.6197ms	1.5364ms	650.8829 Ops/s	638.0909 Ops/s	$\color{#35bf28}+2.00\%$
test_sac_speed[True-backward]	3.4872ms	3.3833ms	295.5656 Ops/s	308.3820 Ops/s	$\color{#d91a1a}-4.16\%$
test_sac_speed[reduce-overhead-None]	22.6726ms	12.5724ms	79.5395 Ops/s	79.1271 Ops/s	$\color{#35bf28}+0.52\%$
test_sac_speed[reduce-overhead-backward]	1.3706ms	1.3321ms	750.7083 Ops/s	663.6652 Ops/s	$\textbf{\color{#35bf28}+13.12\%}$
test_redq_speed[False-None]	8.3654ms	7.5101ms	133.1543 Ops/s	132.4374 Ops/s	$\color{#35bf28}+0.54\%$
test_redq_speed[False-backward]	12.2516ms	11.3818ms	87.8596 Ops/s	85.3266 Ops/s	$\color{#35bf28}+2.97\%$
test_redq_speed[True-None]	2.1227ms	1.9787ms	505.3795 Ops/s	494.4089 Ops/s	$\color{#35bf28}+2.22\%$
test_redq_speed[True-backward]	3.9850ms	3.6727ms	272.2795 Ops/s	258.2477 Ops/s	$\textbf{\color{#35bf28}+5.43\%}$
test_redq_speed[reduce-overhead-None]	2.4382ms	2.0203ms	494.9789 Ops/s	491.9430 Ops/s	$\color{#35bf28}+0.62\%$
test_redq_speed[reduce-overhead-backward]	3.9931ms	3.8307ms	261.0518 Ops/s	257.7229 Ops/s	$\color{#35bf28}+1.29\%$
test_redq_deprec_speed[False-None]	9.7106ms	9.0734ms	110.2125 Ops/s	109.4280 Ops/s	$\color{#35bf28}+0.72\%$
test_redq_deprec_speed[False-backward]	12.8345ms	12.3183ms	81.1798 Ops/s	80.2326 Ops/s	$\color{#35bf28}+1.18\%$
test_redq_deprec_speed[True-None]	2.4248ms	2.3212ms	430.8137 Ops/s	426.4062 Ops/s	$\color{#35bf28}+1.03\%$
test_redq_deprec_speed[True-backward]	4.4280ms	3.9694ms	251.9302 Ops/s	234.6653 Ops/s	$\textbf{\color{#35bf28}+7.36\%}$
test_redq_deprec_speed[reduce-overhead-None]	2.4611ms	2.3499ms	425.5517 Ops/s	427.5478 Ops/s	$\color{#d91a1a}-0.47\%$
test_redq_deprec_speed[reduce-overhead-backward]	4.1676ms	3.9844ms	250.9758 Ops/s	249.0685 Ops/s	$\color{#35bf28}+0.77\%$
test_td3_speed[False-None]	8.0724ms	7.9015ms	126.5585 Ops/s	126.9135 Ops/s	$\color{#d91a1a}-0.28\%$
test_td3_speed[False-backward]	10.7659ms	10.2450ms	97.6090 Ops/s	97.3983 Ops/s	$\color{#35bf28}+0.22\%$
test_td3_speed[True-None]	1.6318ms	1.5690ms	637.3579 Ops/s	631.5007 Ops/s	$\color{#35bf28}+0.93\%$
test_td3_speed[True-backward]	3.1570ms	3.0854ms	324.1076 Ops/s	299.0821 Ops/s	$\textbf{\color{#35bf28}+8.37\%}$
test_td3_speed[reduce-overhead-None]	49.9059ms	25.5076ms	39.2040 Ops/s	37.1109 Ops/s	$\textbf{\color{#35bf28}+5.64\%}$
test_td3_speed[reduce-overhead-backward]	1.4975ms	1.4312ms	698.7146 Ops/s	688.9217 Ops/s	$\color{#35bf28}+1.42\%$
test_cql_speed[False-None]	16.7632ms	16.1849ms	61.7861 Ops/s	61.6881 Ops/s	$\color{#35bf28}+0.16\%$
test_cql_speed[False-backward]	22.5588ms	21.6941ms	46.0955 Ops/s	45.7940 Ops/s	$\color{#35bf28}+0.66\%$
test_cql_speed[True-None]	3.0280ms	2.9005ms	344.7666 Ops/s	340.2673 Ops/s	$\color{#35bf28}+1.32\%$
test_cql_speed[True-backward]	5.4654ms	5.0503ms	198.0084 Ops/s	187.1512 Ops/s	$\textbf{\color{#35bf28}+5.80\%}$
test_cql_speed[reduce-overhead-None]	21.2693ms	12.9136ms	77.4377 Ops/s	75.9748 Ops/s	$\color{#35bf28}+1.93\%$
test_cql_speed[reduce-overhead-backward]	1.6310ms	1.5228ms	656.6881 Ops/s	598.8102 Ops/s	$\textbf{\color{#35bf28}+9.67\%}$
test_a2c_speed[False-None]	3.4666ms	3.2634ms	306.4274 Ops/s	313.2432 Ops/s	$\color{#d91a1a}-2.18\%$
test_a2c_speed[False-backward]	6.5706ms	6.0963ms	164.0345 Ops/s	155.8410 Ops/s	$\textbf{\color{#35bf28}+5.26\%}$
test_a2c_speed[True-None]	1.1242ms	1.0042ms	995.8327 Ops/s	995.8662 Ops/s	$-0.00\%$
test_a2c_speed[True-backward]	3.1377ms	2.6497ms	377.4075 Ops/s	359.4725 Ops/s	$\color{#35bf28}+4.99\%$
test_a2c_speed[reduce-overhead-None]	0.3875s	12.1683ms	82.1808 Ops/s	86.1405 Ops/s	$\color{#d91a1a}-4.60\%$
test_a2c_speed[reduce-overhead-backward]	1.0488ms	1.0020ms	997.9842 Ops/s	879.6677 Ops/s	$\textbf{\color{#35bf28}+13.45\%}$
test_ppo_speed[False-None]	3.9590ms	3.7063ms	269.8077 Ops/s	273.3494 Ops/s	$\color{#d91a1a}-1.30\%$
test_ppo_speed[False-backward]	7.3167ms	6.8630ms	145.7080 Ops/s	140.3488 Ops/s	$\color{#35bf28}+3.82\%$
test_ppo_speed[True-None]	1.0941ms	0.9826ms	1.0178 KOps/s	1.0472 KOps/s	$\color{#d91a1a}-2.81\%$
test_ppo_speed[True-backward]	2.6705ms	2.5878ms	386.4336 Ops/s	365.2383 Ops/s	$\textbf{\color{#35bf28}+5.80\%}$
test_ppo_speed[reduce-overhead-None]	0.5826ms	0.5064ms	1.9746 KOps/s	1.8820 KOps/s	$\color{#35bf28}+4.92\%$
test_ppo_speed[reduce-overhead-backward]	1.0191ms	0.9774ms	1.0231 KOps/s	1.0052 KOps/s	$\color{#35bf28}+1.77\%$
test_reinforce_speed[False-None]	2.3551ms	2.2373ms	446.9709 Ops/s	446.0566 Ops/s	$\color{#35bf28}+0.20\%$
test_reinforce_speed[False-backward]	3.7442ms	3.2807ms	304.8158 Ops/s	306.3419 Ops/s	$\color{#d91a1a}-0.50\%$
test_reinforce_speed[True-None]	0.9031ms	0.8312ms	1.2031 KOps/s	1.2054 KOps/s	$\color{#d91a1a}-0.18\%$
test_reinforce_speed[True-backward]	2.7069ms	2.4434ms	409.2706 Ops/s	406.7619 Ops/s	$\color{#35bf28}+0.62\%$
test_reinforce_speed[reduce-overhead-None]	21.5579ms	11.2594ms	88.8147 Ops/s	86.5388 Ops/s	$\color{#35bf28}+2.63\%$
test_reinforce_speed[reduce-overhead-backward]	1.1409ms	1.0564ms	946.6457 Ops/s	827.6562 Ops/s	$\textbf{\color{#35bf28}+14.38\%}$
test_iql_speed[False-None]	9.6872ms	9.1946ms	108.7591 Ops/s	108.0511 Ops/s	$\color{#35bf28}+0.66\%$
test_iql_speed[False-backward]	13.5821ms	12.9820ms	77.0297 Ops/s	75.8605 Ops/s	$\color{#35bf28}+1.54\%$
test_iql_speed[True-None]	1.9662ms	1.8195ms	549.6009 Ops/s	573.1540 Ops/s	$\color{#d91a1a}-4.11\%$
test_iql_speed[True-backward]	4.6591ms	4.2157ms	237.2068 Ops/s	227.3062 Ops/s	$\color{#35bf28}+4.36\%$
test_iql_speed[reduce-overhead-None]	19.9842ms	11.2116ms	89.1930 Ops/s	108.5114 Ops/s	$\textbf{\color{#d91a1a}-17.80\%}$
test_iql_speed[reduce-overhead-backward]	1.5073ms	1.4338ms	697.4425 Ops/s	631.4249 Ops/s	$\textbf{\color{#35bf28}+10.46\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	7.7746ms	6.4184ms	155.8024 Ops/s	152.2100 Ops/s	$\color{#35bf28}+2.36\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.4893ms	0.2729ms	3.6645 KOps/s	3.2571 KOps/s	$\textbf{\color{#35bf28}+12.51\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.5441ms	0.2621ms	3.8150 KOps/s	3.2797 KOps/s	$\textbf{\color{#35bf28}+16.32\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	6.5368ms	6.1617ms	162.2936 Ops/s	158.1911 Ops/s	$\color{#35bf28}+2.59\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	2.0892ms	0.2575ms	3.8829 KOps/s	3.8355 KOps/s	$\color{#35bf28}+1.24\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.6389ms	0.2380ms	4.2019 KOps/s	4.3007 KOps/s	$\color{#d91a1a}-2.30\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	1.4546ms	1.2562ms	796.0437 Ops/s	778.8931 Ops/s	$\color{#35bf28}+2.20\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	1.4067ms	1.2001ms	833.2796 Ops/s	814.1188 Ops/s	$\color{#35bf28}+2.35\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	6.5559ms	6.3519ms	157.4335 Ops/s	155.6570 Ops/s	$\color{#35bf28}+1.14\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	2.1173ms	0.4106ms	2.4355 KOps/s	2.3655 KOps/s	$\color{#35bf28}+2.96\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.6190ms	0.4160ms	2.4040 KOps/s	2.3862 KOps/s	$\color{#35bf28}+0.75\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	6.4755ms	6.2570ms	159.8210 Ops/s	158.7529 Ops/s	$\color{#35bf28}+0.67\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.8898ms	0.3893ms	2.5684 KOps/s	3.5643 KOps/s	$\textbf{\color{#d91a1a}-27.94\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.5904ms	0.3272ms	3.0564 KOps/s	3.1658 KOps/s	$\color{#d91a1a}-3.46\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	6.4890ms	6.1704ms	162.0637 Ops/s	159.9331 Ops/s	$\color{#35bf28}+1.33\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	1.6561ms	0.3412ms	2.9305 KOps/s	3.9182 KOps/s	$\textbf{\color{#d91a1a}-25.21\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.5232ms	0.2802ms	3.5683 KOps/s	3.8117 KOps/s	$\textbf{\color{#d91a1a}-6.39\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	6.7141ms	6.3711ms	156.9585 Ops/s	154.2934 Ops/s	$\color{#35bf28}+1.73\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.1723ms	0.4859ms	2.0579 KOps/s	2.3956 KOps/s	$\textbf{\color{#d91a1a}-14.10\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.6456ms	0.4627ms	2.1614 KOps/s	2.3658 KOps/s	$\textbf{\color{#d91a1a}-8.64\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	6.9862ms	5.3032ms	188.5646 Ops/s	190.0000 Ops/s	$\color{#d91a1a}-0.76\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	10.5415ms	2.0797ms	480.8469 Ops/s	429.4410 Ops/s	$\textbf{\color{#35bf28}+11.97\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	6.1904ms	1.1926ms	838.4690 Ops/s	829.4628 Ops/s	$\color{#35bf28}+1.09\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.4991s	15.1952ms	65.8101 Ops/s	191.7374 Ops/s	$\textbf{\color{#d91a1a}-65.68\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	6.8374ms	1.9908ms	502.3172 Ops/s	436.3876 Ops/s	$\textbf{\color{#35bf28}+15.11\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	8.7905ms	1.2712ms	786.6873 Ops/s	872.2520 Ops/s	$\textbf{\color{#d91a1a}-9.81\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	9.0016ms	5.5775ms	179.2929 Ops/s	33.3865 Ops/s	$\textbf{\color{#35bf28}+437.02\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	9.0766ms	2.2041ms	453.7087 Ops/s	473.1849 Ops/s	$\color{#d91a1a}-4.12\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	8.2483ms	1.3959ms	716.4038 Ops/s	731.0940 Ops/s	$\color{#d91a1a}-2.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True]	13.3608ms	13.0998ms	76.3373 Ops/s	75.8350 Ops/s	$\color{#35bf28}+0.66\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False]	18.8224ms	17.2347ms	58.0225 Ops/s	59.0189 Ops/s	$\color{#d91a1a}-1.69\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True]	18.0168ms	17.7324ms	56.3939 Ops/s	54.7506 Ops/s	$\color{#35bf28}+3.00\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False]	17.5108ms	16.9293ms	59.0690 Ops/s	58.2699 Ops/s	$\color{#35bf28}+1.37\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True]	18.3592ms	17.6483ms	56.6627 Ops/s	55.9090 Ops/s	$\color{#35bf28}+1.35\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False]	18.6999ms	18.0007ms	55.5535 Ops/s	54.4755 Ops/s	$\color{#35bf28}+1.98\%$
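
The Change column is consistent with the relative difference between the PR throughput (Ops) and the throughput on repo `HEAD`. The short sketch below reproduces three rows from the table. The 5% cutoff for a bold "improved" or "worsened" entry is an assumption inferred from which rows are bold, not something the bot states, and the function names are illustrative.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Assumption: cutoff inferred from the bold rows in the table above.
SIGNIFICANT_PCT = 5.0


def classify(change_pct: float, threshold: float = SIGNIFICANT_PCT) -> str:
    """Label a change as improved, worsened, or unchanged (hypothetical rule)."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on repo HEAD) copied from three rows of the table.
rows = {
    "test_simple": (1.3239, 1.2894),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]": (65.8101, 191.7374),
    "test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]": (179.2929, 33.3865),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under that assumed cutoff, the counts of bold rows match the summary line (17 improved, 8 worsened).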

ghstack-source-id: 39d97360e3b0e45dd8c327487eac50ddafe2254d Pull Request resolved: #2613

Update

58c4d6a

[ghstack-poisoned]

vmoens added a commit that referenced this pull request Nov 27, 2024

[BugFix] skip_done_states in SAC

8078906

ghstack-source-id: f534c53d30af035edb2e3b5291d4db71313086fd Pull Request resolved: #2613

facebook-github-bot added the CLA Signed label Nov 27, 2024

vmoens mentioned this pull request Nov 27, 2024

[BUG] SAC loss masking #2612

Closed

matteobettini approved these changes Nov 27, 2024


vmoens added the bug label Nov 27, 2024

Update

ca35b99

[ghstack-poisoned]

vmoens added a commit that referenced this pull request Dec 2, 2024

[BugFix] skip_done_states in SAC

1db9558

ghstack-source-id: 39d97360e3b0e45dd8c327487eac50ddafe2254d Pull Request resolved: #2613

vmoens merged commit ca35b99 into gh/vmoens/43/base Dec 2, 2024
61 of 73 checks passed

vmoens added a commit that referenced this pull request Dec 2, 2024

[BugFix] skip_done_states in SAC

de61e4d

ghstack-source-id: 39d97360e3b0e45dd8c327487eac50ddafe2254d Pull Request resolved: #2613

vmoens deleted the gh/vmoens/43/head branch December 2, 2024 18:33
