[Feature] CQL compatibility with compile #2553

Merged on Dec 13, 2024 (35 commits)

[ghstack-poisoned]

pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2553

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 5 Cancelled Jobs, 8 Unrelated Failures

As of commit fe4f5c7 with merge base e3c3047:

NEW FAILURE - The following job has failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Nov 12, 2024
vmoens added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: 6bfb32c1e9647bd82cf72424602431da898fd81a
Pull Request resolved: #2553
@vmoens added the enhancement and performance labels Nov 12, 2024

github-actions bot commented Dec 13, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: 32. Worsened: 7.
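
The Change column in the detailed results is consistent with the relative difference between Ops and Ops on Repo HEAD, and the Improved and Worsened counts match the number of rows whose change is printed in bold (32 and 7 in the CPU table), all of which are at least 5% in magnitude. A minimal sketch of that arithmetic follows. The 5% threshold and the function names are assumptions inferred from the table, not taken from the benchmark tooling:

```python
# Throughput units that appear in the table, expressed in operations per second.
OPS_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_second(value: float, unit: str) -> float:
    """Normalize a throughput reading so Ops/s and KOps/s can be compared."""
    return value * OPS_SCALE[unit]


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row; threshold_pct is an assumption inferred from the bold rows."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# test_simple (CPU): 2.3931 Ops/s against 2.2038 Ops/s on HEAD.
simple = relative_change(2.3931, 2.2038)

# test_ddpg_speed[True-None] (CPU) mixes units: 1.0004 KOps/s against 986.1874 Ops/s.
ddpg = relative_change(
    to_ops_per_second(1.0004, "KOps/s"),
    to_ops_per_second(986.1874, "Ops/s"),
)

print(f"{simple:+.2f}% {classify(simple)}")
print(f"{ddpg:+.2f}% {classify(ddpg)}")
```

Rows that mix `KOps/s` and `Ops/s` need the unit normalization before the ratio is taken; otherwise the change would be off by a factor of 1000.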

Detailed results (columns: Name | Max | Mean | Ops | Ops on Repo HEAD | Change):
test_simple 0.4193s 0.4179s 2.3931 Ops/s 2.2038 Ops/s $\textbf{\color{#35bf28}+8.59\%}$
test_transformed 0.6037s 0.5999s 1.6668 Ops/s 1.5874 Ops/s $\textbf{\color{#35bf28}+5.00\%}$
test_serial 1.3336s 1.3293s 0.7522 Ops/s 0.7309 Ops/s $\color{#35bf28}+2.92\%$
test_parallel 1.2830s 1.2748s 0.7844 Ops/s 0.7522 Ops/s $\color{#35bf28}+4.28\%$
test_step_mdp_speed[True-True-True-True-True] 0.1974ms 30.1074μs 33.2144 KOps/s 33.4170 KOps/s $\color{#d91a1a}-0.61\%$
test_step_mdp_speed[True-True-True-True-False] 97.4130μs 18.0392μs 55.4348 KOps/s 55.2760 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-True-True-False-True] 0.1746ms 17.4273μs 57.3812 KOps/s 57.5942 KOps/s $\color{#d91a1a}-0.37\%$
test_step_mdp_speed[True-True-True-False-False] 32.6110μs 10.0363μs 99.6380 KOps/s 95.8305 KOps/s $\color{#35bf28}+3.97\%$
test_step_mdp_speed[True-True-False-True-True] 88.6460μs 32.0803μs 31.1718 KOps/s 30.7785 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[True-True-False-True-False] 52.4480μs 19.5440μs 51.1667 KOps/s 50.4025 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[True-True-False-False-True] 71.6440μs 18.6691μs 53.5645 KOps/s 51.4333 KOps/s $\color{#35bf28}+4.14\%$
test_step_mdp_speed[True-True-False-False-False] 45.7960μs 11.7498μs 85.1076 KOps/s 82.5928 KOps/s $\color{#35bf28}+3.04\%$
test_step_mdp_speed[True-False-True-True-True] 97.5730μs 33.2956μs 30.0340 KOps/s 29.3777 KOps/s $\color{#35bf28}+2.23\%$
test_step_mdp_speed[True-False-True-True-False] 75.3710μs 21.0286μs 47.5542 KOps/s 46.7485 KOps/s $\color{#35bf28}+1.72\%$
test_step_mdp_speed[True-False-True-False-True] 0.1162ms 18.7316μs 53.3856 KOps/s 52.6277 KOps/s $\color{#35bf28}+1.44\%$
test_step_mdp_speed[True-False-True-False-False] 65.2730μs 11.8836μs 84.1497 KOps/s 82.9850 KOps/s $\color{#35bf28}+1.40\%$
test_step_mdp_speed[True-False-False-True-True] 77.9060μs 35.5675μs 28.1156 KOps/s 26.5722 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_step_mdp_speed[True-False-False-True-False] 68.1780μs 23.1485μs 43.1993 KOps/s 43.1719 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[True-False-False-False-True] 62.6470μs 20.4584μs 48.8796 KOps/s 48.1855 KOps/s $\color{#35bf28}+1.44\%$
test_step_mdp_speed[True-False-False-False-False] 63.4390μs 13.4738μs 74.2180 KOps/s 70.4808 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_step_mdp_speed[False-True-True-True-True] 0.1079ms 33.9776μs 29.4312 KOps/s 29.2485 KOps/s $\color{#35bf28}+0.62\%$
test_step_mdp_speed[False-True-True-True-False] 64.3710μs 21.0083μs 47.6003 KOps/s 46.7390 KOps/s $\color{#35bf28}+1.84\%$
test_step_mdp_speed[False-True-True-False-True] 69.8110μs 21.3099μs 46.9265 KOps/s 46.4353 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[False-True-True-False-False] 36.5380μs 13.3442μs 74.9389 KOps/s 75.9948 KOps/s $\color{#d91a1a}-1.39\%$
test_step_mdp_speed[False-True-False-True-True] 89.7880μs 35.2952μs 28.3325 KOps/s 28.1194 KOps/s $\color{#35bf28}+0.76\%$
test_step_mdp_speed[False-True-False-True-False] 56.1850μs 23.1052μs 43.2802 KOps/s 43.0248 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[False-True-False-False-True] 2.4776ms 22.8869μs 43.6932 KOps/s 41.4355 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_step_mdp_speed[False-True-False-False-False] 71.2420μs 14.6930μs 68.0597 KOps/s 66.6880 KOps/s $\color{#35bf28}+2.06\%$
test_step_mdp_speed[False-False-True-True-True] 76.4640μs 36.6637μs 27.2750 KOps/s 26.9029 KOps/s $\color{#35bf28}+1.38\%$
test_step_mdp_speed[False-False-True-True-False] 73.8990μs 24.7312μs 40.4348 KOps/s 40.4144 KOps/s $\color{#35bf28}+0.05\%$
test_step_mdp_speed[False-False-True-False-True] 58.2300μs 23.0745μs 43.3379 KOps/s 42.6178 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-False-True-False-False] 41.2380μs 14.7253μs 67.9104 KOps/s 66.7578 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[False-False-False-True-True] 0.1195ms 38.5160μs 25.9632 KOps/s 26.1104 KOps/s $\color{#d91a1a}-0.56\%$
test_step_mdp_speed[False-False-False-True-False] 62.4870μs 26.4103μs 37.8641 KOps/s 38.4203 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[False-False-False-False-True] 62.9790μs 24.6124μs 40.6299 KOps/s 40.6967 KOps/s $\color{#d91a1a}-0.16\%$
test_step_mdp_speed[False-False-False-False-False] 64.4290μs 16.4400μs 60.8272 KOps/s 60.6598 KOps/s $\color{#35bf28}+0.28\%$
test_values[generalized_advantage_estimate-True-True] 11.4511ms 9.5525ms 104.6842 Ops/s 103.8225 Ops/s $\color{#35bf28}+0.83\%$
test_values[vec_generalized_advantage_estimate-True-True] 38.3668ms 35.3826ms 28.2624 Ops/s 28.2018 Ops/s $\color{#35bf28}+0.22\%$
test_values[td0_return_estimate-False-False] 0.2415ms 0.1730ms 5.7808 KOps/s 5.6095 KOps/s $\color{#35bf28}+3.05\%$
test_values[td1_return_estimate-False-False] 24.2466ms 23.8840ms 41.8690 Ops/s 42.1874 Ops/s $\color{#d91a1a}-0.75\%$
test_values[vec_td1_return_estimate-False-False] 37.3273ms 35.4083ms 28.2420 Ops/s 28.2002 Ops/s $\color{#35bf28}+0.15\%$
test_values[td_lambda_return_estimate-True-False] 37.6590ms 34.2105ms 29.2308 Ops/s 28.9116 Ops/s $\color{#35bf28}+1.10\%$
test_values[vec_td_lambda_return_estimate-True-False] 38.2272ms 35.5154ms 28.1568 Ops/s 28.1459 Ops/s $\color{#35bf28}+0.04\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.9350ms 8.3176ms 120.2272 Ops/s 120.2043 Ops/s $\color{#35bf28}+0.02\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.4452ms 2.0154ms 496.1817 Ops/s 497.4049 Ops/s $\color{#d91a1a}-0.25\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5009ms 0.3498ms 2.8590 KOps/s 2.7312 KOps/s $\color{#35bf28}+4.68\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 45.5979ms 44.5272ms 22.4582 Ops/s 20.7396 Ops/s $\textbf{\color{#35bf28}+8.29\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.0123ms 3.0416ms 328.7719 Ops/s 323.9056 Ops/s $\color{#35bf28}+1.50\%$
test_dqn_speed[False-None] 5.3866ms 1.3638ms 733.2502 Ops/s 721.8137 Ops/s $\color{#35bf28}+1.58\%$
test_dqn_speed[False-backward] 1.9211ms 1.8324ms 545.7386 Ops/s 512.0593 Ops/s $\textbf{\color{#35bf28}+6.58\%}$
test_dqn_speed[True-None] 0.6580ms 0.4604ms 2.1722 KOps/s 2.0951 KOps/s $\color{#35bf28}+3.68\%$
test_dqn_speed[True-backward] 0.9162ms 0.8663ms 1.1543 KOps/s 1.1258 KOps/s $\color{#35bf28}+2.53\%$
test_dqn_speed[reduce-overhead-None] 0.6210ms 0.4607ms 2.1707 KOps/s 2.1288 KOps/s $\color{#35bf28}+1.97\%$
test_dqn_speed[reduce-overhead-backward] 0.9276ms 0.8768ms 1.1406 KOps/s 1.1111 KOps/s $\color{#35bf28}+2.65\%$
test_ddpg_speed[False-None] 3.7193ms 2.8091ms 355.9858 Ops/s 350.5263 Ops/s $\color{#35bf28}+1.56\%$
test_ddpg_speed[False-backward] 4.2127ms 3.9288ms 254.5301 Ops/s 252.7669 Ops/s $\color{#35bf28}+0.70\%$
test_ddpg_speed[True-None] 1.4161ms 0.9996ms 1.0004 KOps/s 986.1874 Ops/s $\color{#35bf28}+1.44\%$
test_ddpg_speed[True-backward] 2.4534ms 1.9623ms 509.6173 Ops/s 474.0528 Ops/s $\textbf{\color{#35bf28}+7.50\%}$
test_ddpg_speed[reduce-overhead-None] 1.8625ms 1.0101ms 989.9908 Ops/s 977.3427 Ops/s $\color{#35bf28}+1.29\%$
test_ddpg_speed[reduce-overhead-backward] 1.9428ms 1.8813ms 531.5477 Ops/s 519.3389 Ops/s $\color{#35bf28}+2.35\%$
test_sac_speed[False-None] 8.5743ms 7.8549ms 127.3084 Ops/s 123.8295 Ops/s $\color{#35bf28}+2.81\%$
test_sac_speed[False-backward] 12.2793ms 10.5752ms 94.5604 Ops/s 92.6679 Ops/s $\color{#35bf28}+2.04\%$
test_sac_speed[True-None] 2.2727ms 1.8202ms 549.3900 Ops/s 540.4105 Ops/s $\color{#35bf28}+1.66\%$
test_sac_speed[True-backward] 3.6491ms 3.5541ms 281.3683 Ops/s 284.6648 Ops/s $\color{#d91a1a}-1.16\%$
test_sac_speed[reduce-overhead-None] 2.4057ms 1.8401ms 543.4578 Ops/s 544.7345 Ops/s $\color{#d91a1a}-0.23\%$
test_sac_speed[reduce-overhead-backward] 3.5929ms 3.5269ms 283.5352 Ops/s 280.1917 Ops/s $\color{#35bf28}+1.19\%$
test_redq_speed[False-None] 14.5550ms 12.6800ms 78.8643 Ops/s 75.0804 Ops/s $\textbf{\color{#35bf28}+5.04\%}$
test_redq_speed[False-backward] 0.2540s 26.5671ms 37.6406 Ops/s 44.4870 Ops/s $\textbf{\color{#d91a1a}-15.39\%}$
test_redq_speed[True-None] 5.1569ms 4.5301ms 220.7473 Ops/s 180.1096 Ops/s $\textbf{\color{#35bf28}+22.56\%}$
test_redq_speed[True-backward] 12.1726ms 11.8757ms 84.2057 Ops/s 73.3152 Ops/s $\textbf{\color{#35bf28}+14.85\%}$
test_redq_speed[reduce-overhead-None] 6.0200ms 4.5186ms 221.3079 Ops/s 180.0514 Ops/s $\textbf{\color{#35bf28}+22.91\%}$
test_redq_speed[reduce-overhead-backward] 13.5229ms 12.3173ms 81.1866 Ops/s 73.9283 Ops/s $\textbf{\color{#35bf28}+9.82\%}$
test_redq_deprec_speed[False-None] 14.9631ms 12.9492ms 77.2250 Ops/s 68.4820 Ops/s $\textbf{\color{#35bf28}+12.77\%}$
test_redq_deprec_speed[False-backward] 20.3716ms 18.4993ms 54.0562 Ops/s 48.2872 Ops/s $\textbf{\color{#35bf28}+11.95\%}$
test_redq_deprec_speed[True-None] 4.1458ms 3.5613ms 280.7955 Ops/s 218.3906 Ops/s $\textbf{\color{#35bf28}+28.57\%}$
test_redq_deprec_speed[True-backward] 8.1457ms 7.8868ms 126.7938 Ops/s 104.9750 Ops/s $\textbf{\color{#35bf28}+20.78\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.1053ms 3.5348ms 282.9002 Ops/s 231.0508 Ops/s $\textbf{\color{#35bf28}+22.44\%}$
test_redq_deprec_speed[reduce-overhead-backward] 9.6918ms 8.7101ms 114.8088 Ops/s 104.5097 Ops/s $\textbf{\color{#35bf28}+9.85\%}$
test_td3_speed[False-None] 8.0991ms 7.8420ms 127.5189 Ops/s 114.4244 Ops/s $\textbf{\color{#35bf28}+11.44\%}$
test_td3_speed[False-backward] 10.6317ms 10.2326ms 97.7265 Ops/s 85.2778 Ops/s $\textbf{\color{#35bf28}+14.60\%}$
test_td3_speed[True-None] 2.2904ms 1.7310ms 577.6940 Ops/s 556.4396 Ops/s $\color{#35bf28}+3.82\%$
test_td3_speed[True-backward] 3.5554ms 3.4102ms 293.2379 Ops/s 291.7326 Ops/s $\color{#35bf28}+0.52\%$
test_td3_speed[reduce-overhead-None] 1.9408ms 1.7158ms 582.8138 Ops/s 565.8968 Ops/s $\color{#35bf28}+2.99\%$
test_td3_speed[reduce-overhead-backward] 3.4514ms 3.3128ms 301.8608 Ops/s 294.6094 Ops/s $\color{#35bf28}+2.46\%$
test_cql_speed[False-None] 36.9889ms 35.6116ms 28.0807 Ops/s 27.5992 Ops/s $\color{#35bf28}+1.74\%$
test_cql_speed[False-backward] 47.9800ms 45.4469ms 22.0037 Ops/s 21.0364 Ops/s $\color{#35bf28}+4.60\%$
test_cql_speed[True-None] 16.6620ms 15.2951ms 65.3803 Ops/s 63.4658 Ops/s $\color{#35bf28}+3.02\%$
test_cql_speed[True-backward] 22.8652ms 22.0474ms 45.3567 Ops/s 43.7415 Ops/s $\color{#35bf28}+3.69\%$
test_cql_speed[reduce-overhead-None] 16.8361ms 15.6977ms 63.7037 Ops/s 64.4979 Ops/s $\color{#d91a1a}-1.23\%$
test_cql_speed[reduce-overhead-backward] 23.7387ms 22.4063ms 44.6303 Ops/s 45.2790 Ops/s $\color{#d91a1a}-1.43\%$
test_a2c_speed[False-None] 8.5046ms 7.3423ms 136.1970 Ops/s 137.1878 Ops/s $\color{#d91a1a}-0.72\%$
test_a2c_speed[False-backward] 15.8950ms 14.3304ms 69.7816 Ops/s 70.0639 Ops/s $\color{#d91a1a}-0.40\%$
test_a2c_speed[True-None] 4.5212ms 4.1691ms 239.8615 Ops/s 235.7202 Ops/s $\color{#35bf28}+1.76\%$
test_a2c_speed[True-backward] 11.6015ms 10.5737ms 94.5745 Ops/s 93.2076 Ops/s $\color{#35bf28}+1.47\%$
test_a2c_speed[reduce-overhead-None] 4.8110ms 4.1489ms 241.0262 Ops/s 220.6497 Ops/s $\textbf{\color{#35bf28}+9.23\%}$
test_a2c_speed[reduce-overhead-backward] 11.0141ms 10.5563ms 94.7299 Ops/s 93.4556 Ops/s $\color{#35bf28}+1.36\%$
test_ppo_speed[False-None] 8.9542ms 7.3999ms 135.1372 Ops/s 132.7214 Ops/s $\color{#35bf28}+1.82\%$
test_ppo_speed[False-backward] 14.7003ms 14.4128ms 69.3825 Ops/s 68.2715 Ops/s $\color{#35bf28}+1.63\%$
test_ppo_speed[True-None] 5.5366ms 3.7108ms 269.4858 Ops/s 269.0665 Ops/s $\color{#35bf28}+0.16\%$
test_ppo_speed[True-backward] 10.0062ms 9.5214ms 105.0262 Ops/s 103.7050 Ops/s $\color{#35bf28}+1.27\%$
test_ppo_speed[reduce-overhead-None] 4.7850ms 3.6635ms 272.9623 Ops/s 268.1598 Ops/s $\color{#35bf28}+1.79\%$
test_ppo_speed[reduce-overhead-backward] 10.5140ms 9.5561ms 104.6449 Ops/s 102.7933 Ops/s $\color{#35bf28}+1.80\%$
test_reinforce_speed[False-None] 7.5012ms 6.4247ms 155.6499 Ops/s 150.3118 Ops/s $\color{#35bf28}+3.55\%$
test_reinforce_speed[False-backward] 10.1225ms 9.8710ms 101.3073 Ops/s 100.6244 Ops/s $\color{#35bf28}+0.68\%$
test_reinforce_speed[True-None] 3.2755ms 2.6307ms 380.1229 Ops/s 373.1872 Ops/s $\color{#35bf28}+1.86\%$
test_reinforce_speed[True-backward] 8.9371ms 8.4725ms 118.0284 Ops/s 114.7045 Ops/s $\color{#35bf28}+2.90\%$
test_reinforce_speed[reduce-overhead-None] 3.5302ms 2.6497ms 377.3967 Ops/s 372.2094 Ops/s $\color{#35bf28}+1.39\%$
test_reinforce_speed[reduce-overhead-backward] 9.2492ms 8.5527ms 116.9221 Ops/s 113.3964 Ops/s $\color{#35bf28}+3.11\%$
test_iql_speed[False-None] 32.9992ms 31.8890ms 31.3588 Ops/s 30.4343 Ops/s $\color{#35bf28}+3.04\%$
test_iql_speed[False-backward] 47.1709ms 44.9195ms 22.2621 Ops/s 21.6930 Ops/s $\color{#35bf28}+2.62\%$
test_iql_speed[True-None] 11.6475ms 10.4194ms 95.9747 Ops/s 92.9565 Ops/s $\color{#35bf28}+3.25\%$
test_iql_speed[True-backward] 22.0935ms 21.1509ms 47.2793 Ops/s 45.6234 Ops/s $\color{#35bf28}+3.63\%$
test_iql_speed[reduce-overhead-None] 11.7824ms 10.5456ms 94.8263 Ops/s 93.2082 Ops/s $\color{#35bf28}+1.74\%$
test_iql_speed[reduce-overhead-backward] 22.9121ms 21.2934ms 46.9628 Ops/s 45.4220 Ops/s $\color{#35bf28}+3.39\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.9022ms 4.9467ms 202.1562 Ops/s 191.9962 Ops/s $\textbf{\color{#35bf28}+5.29\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9341ms 0.5478ms 1.8255 KOps/s 1.8878 KOps/s $\color{#d91a1a}-3.30\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8166ms 0.4861ms 2.0570 KOps/s 2.0033 KOps/s $\color{#35bf28}+2.68\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.3135ms 4.5811ms 218.2894 Ops/s 202.7983 Ops/s $\textbf{\color{#35bf28}+7.64\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2745ms 0.4986ms 2.0058 KOps/s 1.9600 KOps/s $\color{#35bf28}+2.34\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 1.4568ms 0.4738ms 2.1106 KOps/s 2.0334 KOps/s $\color{#35bf28}+3.79\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.5359ms 1.6258ms 615.0735 Ops/s 597.9562 Ops/s $\color{#35bf28}+2.86\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.2216ms 1.5747ms 635.0219 Ops/s 622.2373 Ops/s $\color{#35bf28}+2.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.2773ms 4.7728ms 209.5202 Ops/s 197.9227 Ops/s $\textbf{\color{#35bf28}+5.86\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2971ms 0.6425ms 1.5564 KOps/s 1.5128 KOps/s $\color{#35bf28}+2.88\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9878ms 0.6187ms 1.6162 KOps/s 1.5831 KOps/s $\color{#35bf28}+2.09\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3681ms 4.7747ms 209.4385 Ops/s 204.5055 Ops/s $\color{#35bf28}+2.41\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0697ms 0.5182ms 1.9299 KOps/s 1.8965 KOps/s $\color{#35bf28}+1.76\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7762ms 0.4932ms 2.0276 KOps/s 2.0117 KOps/s $\color{#35bf28}+0.79\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.1963ms 4.6582ms 214.6767 Ops/s 206.3016 Ops/s $\color{#35bf28}+4.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9430ms 0.5012ms 1.9953 KOps/s 1.9216 KOps/s $\color{#35bf28}+3.84\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6997ms 0.4759ms 2.1011 KOps/s 2.0711 KOps/s $\color{#35bf28}+1.45\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.2570ms 4.8725ms 205.2319 Ops/s 202.3572 Ops/s $\color{#35bf28}+1.42\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1013ms 0.6617ms 1.5112 KOps/s 1.4562 KOps/s $\color{#35bf28}+3.78\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8494ms 0.6296ms 1.5882 KOps/s 1.5812 KOps/s $\color{#35bf28}+0.44\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.4524s 13.1888ms 75.8219 Ops/s 247.6896 Ops/s $\textbf{\color{#d91a1a}-69.39\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.1320ms 2.4531ms 407.6460 Ops/s 460.1850 Ops/s $\textbf{\color{#d91a1a}-11.42\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.9700ms 1.3034ms 767.2075 Ops/s 820.5930 Ops/s $\textbf{\color{#d91a1a}-6.51\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.1421ms 4.2872ms 233.2541 Ops/s 36.2678 Ops/s $\textbf{\color{#35bf28}+543.14\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.0311ms 2.5565ms 391.1618 Ops/s 429.9899 Ops/s $\textbf{\color{#d91a1a}-9.03\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.0092ms 1.4118ms 708.3213 Ops/s 798.7752 Ops/s $\textbf{\color{#d91a1a}-11.32\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.4211s 12.7723ms 78.2943 Ops/s 220.3572 Ops/s $\textbf{\color{#d91a1a}-64.47\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.9216ms 2.3960ms 417.3618 Ops/s 407.0447 Ops/s $\color{#35bf28}+2.53\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.5962ms 1.5074ms 663.3990 Ops/s 568.9725 Ops/s $\textbf{\color{#35bf28}+16.60\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.3984ms 11.2152ms 89.1645 Ops/s 81.9151 Ops/s $\textbf{\color{#35bf28}+8.85\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 18.8111ms 15.0064ms 66.6381 Ops/s 64.2900 Ops/s $\color{#35bf28}+3.65\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 20.0884ms 19.7050ms 50.7485 Ops/s 47.8059 Ops/s $\textbf{\color{#35bf28}+6.16\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 17.0284ms 14.9808ms 66.7523 Ops/s 63.5286 Ops/s $\textbf{\color{#35bf28}+5.07\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 21.0115ms 19.4503ms 51.4130 Ops/s 48.1425 Ops/s $\textbf{\color{#35bf28}+6.79\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 17.4150ms 16.1648ms 61.8626 Ops/s 57.9987 Ops/s $\textbf{\color{#35bf28}+6.66\%}$
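
Each row above follows the same layout: benchmark name, max time, mean time, throughput for this PR, throughput on repo HEAD, and a LaTeX-wrapped percentage. A minimal sketch of parsing such a row from the flattened text is given below. `parse_row` and the regular expression are illustrative helpers and are not part of the benchmark tooling:

```python
import re

# Seconds per time unit; both code points that render as a micro prefix are accepted.
TIME_SCALE = {"s": 1.0, "ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6}
# Operations per second per throughput unit seen in the table.
OPS_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}

ROW_RE = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?P<max>[\d.]+)(?P<max_unit>s|ms|[\u03bc\u00b5]s)\s+"
    r"(?P<mean>[\d.]+)(?P<mean_unit>s|ms|[\u03bc\u00b5]s)\s+"
    r"(?P<ops>[\d.]+)\s+(?P<ops_unit>K?Ops/s)\s+"
    r"(?P<head>[\d.]+)\s+(?P<head_unit>K?Ops/s)\s+"
    r"\$(?P<markup>.*?)(?P<change>[+-][\d.]+)\\%\}?\$$"
)


def parse_row(line: str) -> dict:
    """Split one flattened table row into normalized fields."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return {
        "name": match["name"],
        "max_s": float(match["max"]) * TIME_SCALE[match["max_unit"]],
        "mean_s": float(match["mean"]) * TIME_SCALE[match["mean_unit"]],
        "ops": float(match["ops"]) * OPS_SCALE[match["ops_unit"]],
        "head_ops": float(match["head"]) * OPS_SCALE[match["head_unit"]],
        "change_pct": float(match["change"]),
        # Bold markup marks the rows highlighted in the rendered table.
        "bold": "textbf" in match["markup"],
    }


row = r"test_simple 0.4193s 0.4179s 2.3931 Ops/s 2.2038 Ops/s $\textbf{\color{#35bf28}+8.59\%}$"
parsed = parse_row(row)
print(parsed["name"], parsed["ops"], parsed["head_ops"], parsed["change_pct"], parsed["bold"])
```

Normalizing times to seconds and throughput to operations per second makes rows comparable even when the table switches between `ms` and `μs`, or between `Ops/s` and `KOps/s`.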

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: 21. Worsened: 12.

Detailed results (columns: Name | Max | Mean | Ops | Ops on Repo HEAD | Change):
test_simple 0.7507s 0.7500s 1.3333 Ops/s 1.3319 Ops/s $\color{#35bf28}+0.10\%$
test_transformed 1.0097s 1.0089s 0.9911 Ops/s 0.9936 Ops/s $\color{#d91a1a}-0.24\%$
test_serial 2.1571s 2.1537s 0.4643 Ops/s 0.4626 Ops/s $\color{#35bf28}+0.37\%$
test_parallel 2.0402s 2.0134s 0.4967 Ops/s 0.5058 Ops/s $\color{#d91a1a}-1.81\%$
test_step_mdp_speed[True-True-True-True-True] 0.1890ms 40.2110μs 24.8688 KOps/s 25.7845 KOps/s $\color{#d91a1a}-3.55\%$
test_step_mdp_speed[True-True-True-True-False] 65.7010μs 23.5572μs 42.4498 KOps/s 43.7591 KOps/s $\color{#d91a1a}-2.99\%$
test_step_mdp_speed[True-True-True-False-True] 52.1210μs 22.3032μs 44.8366 KOps/s 46.3320 KOps/s $\color{#d91a1a}-3.23\%$
test_step_mdp_speed[True-True-True-False-False] 39.2100μs 13.0482μs 76.6389 KOps/s 79.6321 KOps/s $\color{#d91a1a}-3.76\%$
test_step_mdp_speed[True-True-False-True-True] 73.5710μs 43.0580μs 23.2245 KOps/s 23.9420 KOps/s $\color{#d91a1a}-3.00\%$
test_step_mdp_speed[True-True-False-True-False] 57.4610μs 25.2012μs 39.6807 KOps/s 40.2876 KOps/s $\color{#d91a1a}-1.51\%$
test_step_mdp_speed[True-True-False-False-True] 0.1621ms 24.5553μs 40.7243 KOps/s 40.9932 KOps/s $\color{#d91a1a}-0.66\%$
test_step_mdp_speed[True-True-False-False-False] 38.8300μs 15.2515μs 65.5672 KOps/s 68.3829 KOps/s $\color{#d91a1a}-4.12\%$
test_step_mdp_speed[True-False-True-True-True] 79.1810μs 44.9686μs 22.2377 KOps/s 22.6294 KOps/s $\color{#d91a1a}-1.73\%$
test_step_mdp_speed[True-False-True-True-False] 58.8110μs 27.6450μs 36.1730 KOps/s 36.6962 KOps/s $\color{#d91a1a}-1.43\%$
test_step_mdp_speed[True-False-True-False-True] 53.4610μs 24.5828μs 40.6789 KOps/s 41.2527 KOps/s $\color{#d91a1a}-1.39\%$
test_step_mdp_speed[True-False-True-False-False] 48.0610μs 15.1266μs 66.1088 KOps/s 66.8220 KOps/s $\color{#d91a1a}-1.07\%$
test_step_mdp_speed[True-False-False-True-True] 79.7110μs 46.9524μs 21.2982 KOps/s 21.5760 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[True-False-False-True-False] 64.4210μs 29.5596μs 33.8300 KOps/s 34.1571 KOps/s $\color{#d91a1a}-0.96\%$
test_step_mdp_speed[True-False-False-False-True] 57.2010μs 26.5692μs 37.6376 KOps/s 38.5685 KOps/s $\color{#d91a1a}-2.41\%$
test_step_mdp_speed[True-False-False-False-False] 55.9610μs 17.6129μs 56.7767 KOps/s 58.8236 KOps/s $\color{#d91a1a}-3.48\%$
test_step_mdp_speed[False-True-True-True-True] 80.0410μs 45.8589μs 21.8060 KOps/s 22.7830 KOps/s $\color{#d91a1a}-4.29\%$
test_step_mdp_speed[False-True-True-True-False] 58.1800μs 27.3533μs 36.5586 KOps/s 36.8816 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[False-True-True-False-True] 62.4810μs 28.4582μs 35.1392 KOps/s 34.9638 KOps/s $\color{#35bf28}+0.50\%$
test_step_mdp_speed[False-True-True-False-False] 48.4610μs 16.6822μs 59.9440 KOps/s 60.5228 KOps/s $\color{#d91a1a}-0.96\%$
test_step_mdp_speed[False-True-False-True-True] 74.5210μs 46.9924μs 21.2800 KOps/s 21.3268 KOps/s $\color{#d91a1a}-0.22\%$
test_step_mdp_speed[False-True-False-True-False] 59.8310μs 29.3199μs 34.1065 KOps/s 33.7691 KOps/s $\color{#35bf28}+1.00\%$
test_step_mdp_speed[False-True-False-False-True] 3.0969ms 31.0719μs 32.1834 KOps/s 34.4702 KOps/s $\textbf{\color{#d91a1a}-6.63\%}$
test_step_mdp_speed[False-True-False-False-False] 51.3710μs 19.1861μs 52.1211 KOps/s 53.8524 KOps/s $\color{#d91a1a}-3.21\%$
test_step_mdp_speed[False-False-True-True-True] 89.5610μs 50.0426μs 19.9830 KOps/s 20.3537 KOps/s $\color{#d91a1a}-1.82\%$
test_step_mdp_speed[False-False-True-True-False] 0.2173ms 32.2843μs 30.9748 KOps/s 31.5978 KOps/s $\color{#d91a1a}-1.97\%$
test_step_mdp_speed[False-False-True-False-True] 60.8710μs 30.8852μs 32.3780 KOps/s 32.8964 KOps/s $\color{#d91a1a}-1.58\%$
test_step_mdp_speed[False-False-True-False-False] 60.3210μs 18.9460μs 52.7815 KOps/s 53.9397 KOps/s $\color{#d91a1a}-2.15\%$
test_step_mdp_speed[False-False-False-True-True] 77.1710μs 51.0476μs 19.5896 KOps/s 20.2257 KOps/s $\color{#d91a1a}-3.15\%$
test_step_mdp_speed[False-False-False-True-False] 60.9110μs 33.8383μs 29.5523 KOps/s 30.0191 KOps/s $\color{#d91a1a}-1.55\%$
test_step_mdp_speed[False-False-False-False-True] 65.0010μs 32.1811μs 31.0742 KOps/s 31.6695 KOps/s $\color{#d91a1a}-1.88\%$
test_step_mdp_speed[False-False-False-False-False] 51.9810μs 20.7634μs 48.1616 KOps/s 48.8102 KOps/s $\color{#d91a1a}-1.33\%$
test_values[generalized_advantage_estimate-True-True] 25.6619ms 25.1136ms 39.8191 Ops/s 39.3522 Ops/s $\color{#35bf28}+1.19\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1089s 3.0802ms 324.6528 Ops/s 324.9034 Ops/s $\color{#d91a1a}-0.08\%$
test_values[td0_return_estimate-False-False] 0.1070ms 80.1767μs 12.4725 KOps/s 12.2124 KOps/s $\color{#35bf28}+2.13\%$
test_values[td1_return_estimate-False-False] 55.9762ms 55.3860ms 18.0551 Ops/s 17.7262 Ops/s $\color{#35bf28}+1.86\%$
test_values[vec_td1_return_estimate-False-False] 1.2969ms 1.0894ms 917.9693 Ops/s 910.0645 Ops/s $\color{#35bf28}+0.87\%$
test_values[td_lambda_return_estimate-True-False] 96.1734ms 89.1451ms 11.2177 Ops/s 10.7852 Ops/s $\color{#35bf28}+4.01\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2846ms 1.0861ms 920.7273 Ops/s 911.1058 Ops/s $\color{#35bf28}+1.06\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 26.6539ms 25.9162ms 38.5858 Ops/s 39.6827 Ops/s $\color{#d91a1a}-2.76\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0562ms 0.7645ms 1.3081 KOps/s 1.3019 KOps/s $\color{#35bf28}+0.47\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8160ms 0.6924ms 1.4442 KOps/s 1.4715 KOps/s $\color{#d91a1a}-1.85\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.6752ms 1.4985ms 667.3124 Ops/s 670.5569 Ops/s $\color{#d91a1a}-0.48\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8699ms 0.6902ms 1.4489 KOps/s 1.4440 KOps/s $\color{#35bf28}+0.34\%$
test_dqn_speed[False-None] 6.9213ms 1.5386ms 649.9204 Ops/s 656.8316 Ops/s $\color{#d91a1a}-1.05\%$
test_dqn_speed[False-backward] 2.2600ms 2.1534ms 464.3849 Ops/s 464.5820 Ops/s $\color{#d91a1a}-0.04\%$
test_dqn_speed[True-None] 0.6832ms 0.5314ms 1.8818 KOps/s 1.8392 KOps/s $\color{#35bf28}+2.32\%$
test_dqn_speed[True-backward] 1.1815ms 1.1127ms 898.6948 Ops/s 814.3151 Ops/s $\textbf{\color{#35bf28}+10.36\%}$
test_dqn_speed[reduce-overhead-None] 0.7193ms 0.5438ms 1.8389 KOps/s 1.7380 KOps/s $\textbf{\color{#35bf28}+5.80\%}$
test_dqn_speed[reduce-overhead-backward] 1.0391ms 0.9812ms 1.0192 KOps/s 920.0066 Ops/s $\textbf{\color{#35bf28}+10.78\%}$
test_ddpg_speed[False-None] 3.2144ms 2.9384ms 340.3258 Ops/s 343.4843 Ops/s $\color{#d91a1a}-0.92\%$
test_ddpg_speed[False-backward] 4.5773ms 4.1752ms 239.5083 Ops/s 234.1502 Ops/s $\color{#35bf28}+2.29\%$
test_ddpg_speed[True-None] 1.3044ms 1.0999ms 909.1950 Ops/s 916.4200 Ops/s $\color{#d91a1a}-0.79\%$
test_ddpg_speed[True-backward] 2.1931ms 2.1407ms 467.1291 Ops/s 426.0324 Ops/s $\textbf{\color{#35bf28}+9.65\%}$
test_ddpg_speed[reduce-overhead-None] 1.2162ms 1.0793ms 926.5539 Ops/s 908.8382 Ops/s $\color{#35bf28}+1.95\%$
test_ddpg_speed[reduce-overhead-backward] 1.7714ms 1.6376ms 610.6453 Ops/s 553.3634 Ops/s $\textbf{\color{#35bf28}+10.35\%}$
test_sac_speed[False-None] 8.6206ms 8.1647ms 122.4785 Ops/s 121.6744 Ops/s $\color{#35bf28}+0.66\%$
test_sac_speed[False-backward] 11.6883ms 11.1945ms 89.3294 Ops/s 87.2789 Ops/s $\color{#35bf28}+2.35\%$
test_sac_speed[True-None] 1.6644ms 1.5187ms 658.4377 Ops/s 616.4489 Ops/s $\textbf{\color{#35bf28}+6.81\%}$
test_sac_speed[True-backward] 3.4398ms 3.4032ms 293.8427 Ops/s 288.9412 Ops/s $\color{#35bf28}+1.70\%$
test_sac_speed[reduce-overhead-None] 23.2389ms 12.5862ms 79.4521 Ops/s 80.4674 Ops/s $\color{#d91a1a}-1.26\%$
test_sac_speed[reduce-overhead-backward] 1.6545ms 1.5158ms 659.7216 Ops/s 737.0900 Ops/s $\textbf{\color{#d91a1a}-10.50\%}$
test_redq_speed[False-None] 8.2706ms 7.5744ms 132.0236 Ops/s 130.8335 Ops/s $\color{#35bf28}+0.91\%$
test_redq_speed[False-backward] 12.5964ms 11.7776ms 84.9068 Ops/s 86.5294 Ops/s $\color{#d91a1a}-1.88\%$
test_redq_speed[True-None] 2.1525ms 2.0027ms 499.3335 Ops/s 493.5691 Ops/s $\color{#35bf28}+1.17\%$
test_redq_speed[True-backward] 4.0023ms 3.8355ms 260.7213 Ops/s 252.2628 Ops/s $\color{#35bf28}+3.35\%$
test_redq_speed[reduce-overhead-None] 2.2624ms 2.0182ms 495.4974 Ops/s 492.7502 Ops/s $\color{#35bf28}+0.56\%$
test_redq_speed[reduce-overhead-backward] 3.9365ms 3.7199ms 268.8252 Ops/s 262.6506 Ops/s $\color{#35bf28}+2.35\%$
test_redq_deprec_speed[False-None] 9.4589ms 9.1992ms 108.7046 Ops/s 108.5039 Ops/s $\color{#35bf28}+0.18\%$
test_redq_deprec_speed[False-backward] 13.2493ms 12.3021ms 81.2869 Ops/s 81.2204 Ops/s $\color{#35bf28}+0.08\%$
test_redq_deprec_speed[True-None] 2.5627ms 2.3492ms 425.6768 Ops/s 421.2280 Ops/s $\color{#35bf28}+1.06\%$
test_redq_deprec_speed[True-backward] 4.3314ms 4.0290ms 248.1998 Ops/s 243.9736 Ops/s $\color{#35bf28}+1.73\%$
test_redq_deprec_speed[reduce-overhead-None] 2.4187ms 2.3403ms 427.2883 Ops/s 424.1483 Ops/s $\color{#35bf28}+0.74\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.1778ms 4.0386ms 247.6116 Ops/s 235.4841 Ops/s $\textbf{\color{#35bf28}+5.15\%}$
test_td3_speed[False-None] 8.2899ms 8.0664ms 123.9712 Ops/s 124.3186 Ops/s $\color{#d91a1a}-0.28\%$
test_td3_speed[False-backward] 10.8578ms 10.4781ms 95.4373 Ops/s 48.7497 Ops/s $\textbf{\color{#35bf28}+95.77\%}$
test_td3_speed[True-None] 1.6405ms 1.6127ms 620.0797 Ops/s 635.2118 Ops/s $\color{#d91a1a}-2.38\%$
test_td3_speed[True-backward] 3.3436ms 3.1357ms 318.9126 Ops/s 299.4098 Ops/s $\textbf{\color{#35bf28}+6.51\%}$
test_td3_speed[reduce-overhead-None] 50.5344ms 25.9471ms 38.5399 Ops/s 36.7218 Ops/s $\color{#35bf28}+4.95\%$
test_td3_speed[reduce-overhead-backward] 1.3463ms 1.2868ms 777.1239 Ops/s 680.9410 Ops/s $\textbf{\color{#35bf28}+14.13\%}$
test_cql_speed[False-None] 17.5648ms 16.9902ms 58.8576 Ops/s 58.6108 Ops/s $\color{#35bf28}+0.42\%$
test_cql_speed[False-backward] 23.0461ms 22.3428ms 44.7572 Ops/s 44.2797 Ops/s $\color{#35bf28}+1.08\%$
test_cql_speed[True-None] 3.1063ms 2.9455ms 339.4980 Ops/s 322.5759 Ops/s $\textbf{\color{#35bf28}+5.25\%}$
test_cql_speed[True-backward] 5.3454ms 5.0929ms 196.3505 Ops/s 192.7904 Ops/s $\color{#35bf28}+1.85\%$
test_cql_speed[reduce-overhead-None] 21.7338ms 13.2396ms 75.5307 Ops/s 75.8972 Ops/s $\color{#d91a1a}-0.48\%$
test_cql_speed[reduce-overhead-backward] 1.5971ms 1.5163ms 659.4977 Ops/s 616.2501 Ops/s $\textbf{\color{#35bf28}+7.02\%}$
test_a2c_speed[False-None] 3.4573ms 3.2752ms 305.3204 Ops/s 305.7892 Ops/s $\color{#d91a1a}-0.15\%$
test_a2c_speed[False-backward] 6.6721ms 6.2027ms 161.2201 Ops/s 151.6746 Ops/s $\textbf{\color{#35bf28}+6.29\%}$
test_a2c_speed[True-None] 1.1784ms 1.0011ms 998.8951 Ops/s 968.0990 Ops/s $\color{#35bf28}+3.18\%$
test_a2c_speed[True-backward] 2.7456ms 2.5952ms 385.3326 Ops/s 377.3667 Ops/s $\color{#35bf28}+2.11\%$
test_a2c_speed[reduce-overhead-None] 21.7472ms 11.7743ms 84.9308 Ops/s 86.0000 Ops/s $\color{#d91a1a}-1.24\%$
test_a2c_speed[reduce-overhead-backward] 1.3125ms 1.0390ms 962.5059 Ops/s 1.0074 KOps/s $\color{#d91a1a}-4.46\%$
test_ppo_speed[False-None] 4.0456ms 3.7434ms 267.1373 Ops/s 267.0450 Ops/s $\color{#35bf28}+0.03\%$
test_ppo_speed[False-backward] 7.5538ms 7.1999ms 138.8904 Ops/s 143.9637 Ops/s $\color{#d91a1a}-3.52\%$
test_ppo_speed[True-None] 1.0186ms 0.9521ms 1.0503 KOps/s 1.0392 KOps/s $\color{#35bf28}+1.08\%$
test_ppo_speed[True-backward] 2.7911ms 2.6939ms 371.2045 Ops/s 365.6794 Ops/s $\color{#35bf28}+1.51\%$
test_ppo_speed[reduce-overhead-None] 0.5839ms 0.5070ms 1.9726 KOps/s 1.9025 KOps/s $\color{#35bf28}+3.68\%$
test_ppo_speed[reduce-overhead-backward] 1.1676ms 1.1168ms 895.4000 Ops/s 877.4719 Ops/s $\color{#35bf28}+2.04\%$
test_reinforce_speed[False-None] 2.5309ms 2.3045ms 433.9338 Ops/s 433.5562 Ops/s $\color{#35bf28}+0.09\%$
test_reinforce_speed[False-backward] 3.8712ms 3.4789ms 287.4436 Ops/s 287.7889 Ops/s $\color{#d91a1a}-0.12\%$
test_reinforce_speed[True-None] 0.9676ms 0.8229ms 1.2153 KOps/s 1.1650 KOps/s $\color{#35bf28}+4.31\%$
test_reinforce_speed[True-backward] 2.5783ms 2.5365ms 394.2476 Ops/s 382.8797 Ops/s $\color{#35bf28}+2.97\%$
test_reinforce_speed[reduce-overhead-None] 22.1979ms 11.7635ms 85.0085 Ops/s 88.2987 Ops/s $\color{#d91a1a}-3.73\%$
test_reinforce_speed[reduce-overhead-backward] 1.2023ms 1.1724ms 852.9316 Ops/s 827.5602 Ops/s $\color{#35bf28}+3.07\%$
test_iql_speed[False-None] 9.9444ms 9.4106ms 106.2630 Ops/s 106.4622 Ops/s $\color{#d91a1a}-0.19\%$
test_iql_speed[False-backward] 14.0597ms 13.4752ms 74.2103 Ops/s 74.6390 Ops/s $\color{#d91a1a}-0.57\%$
test_iql_speed[True-None] 1.9480ms 1.7467ms 572.5097 Ops/s 570.9485 Ops/s $\color{#35bf28}+0.27\%$
test_iql_speed[True-backward] 4.3608ms 4.2305ms 236.3809 Ops/s 231.4528 Ops/s $\color{#35bf28}+2.13\%$
test_iql_speed[reduce-overhead-None] 15.7442ms 9.2168ms 108.4979 Ops/s 86.9129 Ops/s $\textbf{\color{#35bf28}+24.84\%}$
test_iql_speed[reduce-overhead-backward] 1.7434ms 1.6044ms 623.2860 Ops/s 622.3396 Ops/s $\color{#35bf28}+0.15\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.9854ms 6.4277ms 155.5759 Ops/s 154.1465 Ops/s $\color{#35bf28}+0.93\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5713ms 0.3840ms 2.6039 KOps/s 2.8148 KOps/s $\textbf{\color{#d91a1a}-7.49\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5958ms 0.3647ms 2.7419 KOps/s 2.9759 KOps/s $\textbf{\color{#d91a1a}-7.86\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.5505ms 6.1832ms 161.7288 Ops/s 161.1194 Ops/s $\color{#35bf28}+0.38\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0363ms 0.3025ms 3.3061 KOps/s 3.6713 KOps/s $\textbf{\color{#d91a1a}-9.95\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5643ms 0.3133ms 3.1920 KOps/s 4.1165 KOps/s $\textbf{\color{#d91a1a}-22.46\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4701ms 1.2709ms 786.8480 Ops/s 688.1761 Ops/s $\textbf{\color{#35bf28}+14.34\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4137ms 1.2207ms 819.2006 Ops/s 686.7379 Ops/s $\textbf{\color{#35bf28}+19.29\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5295ms 6.3619ms 157.1852 Ops/s 156.1542 Ops/s $\color{#35bf28}+0.66\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2073ms 0.4757ms 2.1023 KOps/s 1.9502 KOps/s $\textbf{\color{#35bf28}+7.80\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7830ms 0.4455ms 2.2448 KOps/s 2.4905 KOps/s $\textbf{\color{#d91a1a}-9.86\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3215ms 6.1769ms 161.8931 Ops/s 159.7578 Ops/s $\color{#35bf28}+1.34\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.8199ms 0.3859ms 2.5911 KOps/s 3.0545 KOps/s $\textbf{\color{#d91a1a}-15.17\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6499ms 0.3049ms 3.2798 KOps/s 3.1936 KOps/s $\color{#35bf28}+2.70\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4986ms 6.1612ms 162.3070 Ops/s 160.4982 Ops/s $\color{#35bf28}+1.13\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6130ms 0.2633ms 3.7975 KOps/s 2.7195 KOps/s $\textbf{\color{#35bf28}+39.64\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5568ms 0.2984ms 3.3507 KOps/s 2.7166 KOps/s $\textbf{\color{#35bf28}+23.34\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4756ms 6.3128ms 158.4095 Ops/s 155.9382 Ops/s $\color{#35bf28}+1.58\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0924ms 0.4924ms 2.0308 KOps/s 2.0998 KOps/s $\color{#d91a1a}-3.29\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7346ms 0.4774ms 2.0946 KOps/s 2.3448 KOps/s $\textbf{\color{#d91a1a}-10.67\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.8673ms 5.2655ms 189.9150 Ops/s 187.4922 Ops/s $\color{#35bf28}+1.29\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.8403ms 1.8948ms 527.7515 Ops/s 521.2058 Ops/s $\color{#35bf28}+1.26\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 8.2566ms 1.2808ms 780.7404 Ops/s 846.1949 Ops/s $\textbf{\color{#d91a1a}-7.74\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.3359ms 5.3599ms 186.5702 Ops/s 189.7711 Ops/s $\color{#d91a1a}-1.69\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.5026ms 2.0441ms 489.2029 Ops/s 436.8810 Ops/s $\textbf{\color{#35bf28}+11.98\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.1734ms 1.2548ms 796.9609 Ops/s 844.4025 Ops/s $\textbf{\color{#d91a1a}-5.62\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5041s 15.5267ms 64.4052 Ops/s 32.5989 Ops/s $\textbf{\color{#35bf28}+97.57\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 11.8512ms 2.1137ms 473.1010 Ops/s 497.5589 Ops/s $\color{#d91a1a}-4.92\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.1550ms 1.3790ms 725.1677 Ops/s 808.6241 Ops/s $\textbf{\color{#d91a1a}-10.32\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.7466ms 13.1219ms 76.2085 Ops/s 74.8586 Ops/s $\color{#35bf28}+1.80\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.5839ms 17.1527ms 58.2997 Ops/s 56.1484 Ops/s $\color{#35bf28}+3.83\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.3614ms 17.6862ms 56.5412 Ops/s 54.2326 Ops/s $\color{#35bf28}+4.26\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.7112ms 17.5612ms 56.9436 Ops/s 54.3455 Ops/s $\color{#35bf28}+4.78\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 18.1605ms 17.4846ms 57.1931 Ops/s 55.2173 Ops/s $\color{#35bf28}+3.58\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.4293ms 18.8014ms 53.1875 Ops/s 51.8057 Ops/s $\color{#35bf28}+2.67\%$

@vmoens merged commit fe4f5c7 into gh/vmoens/36/base on Dec 13, 2024
61 of 75 checks passed
vmoens added a commit that referenced this pull request Dec 13, 2024
ghstack-source-id: d362d6c17faa0eb609009bce004bb4766e345d5e
Pull Request resolved: #2553
@vmoens deleted the gh/vmoens/36/head branch on December 13, 2024, 23:50