
[RLlib; Offline RL] Enable GPU and multi-GPU training for offline algorithms. #47929

Open
wants to merge 13 commits into base: master

Conversation

simonsays1980 (Collaborator) commented on Oct 8, 2024

Why are these changes needed?

GPU and multi-GPU training has not worked so far because of serialization errors caused by device mappings during the creation of the map_batches workers. This PR fixes these errors and brings several modifications and simplifications to OfflineData, OfflinePreLearner, and Learner.
Furthermore, it adds single-GPU learning tests for offline algorithms.
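
As a rough illustration of what this unlocks (a sketch, not code from this PR; the data path and option values are placeholders), an offline algorithm on the new API stack could now be configured with one or more GPU learners along these lines:

from ray.rllib.algorithms.bc import BCConfig

# Hedged sketch: GPU / multi-GPU offline training config on the new API stack.
# Paths and numbers are placeholders, not settings taken from this PR.
config = (
    BCConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    # Placeholder offline data location.
    .offline_data(input_=["local:///tmp/cartpole/offline_data"])
    .learners(
        num_learners=2,          # two Learner workers ...
        num_gpus_per_learner=1,  # ... each with its own GPU
    )
    .training(train_batch_size_per_learner=1024)
)

algo = config.build()
print(algo.train())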

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

… a couple more simplifications. Tuned offline examples to the new settings.

Signed-off-by: simonsays1980 <[email protected]>
simonsays1980 added the enhancement, rllib, rllib-gpu-multi-gpu, and rllib-offline-rl labels on Oct 8, 2024
simonsays1980 marked this pull request as ready for review on October 8, 2024, 12:02
…rner'. Removed 'NumpyToTensor' connector from learner connector prior to 'OfflinePreLearner'.

Signed-off-by: simonsays1980 <[email protected]>
sven1977 changed the title from "[RLlib; Offline RL] - Enable GPU and multi-GPU training for offline algorithms." to "[RLlib; Offline RL] Enable GPU and multi-GPU training for offline algorithms." on Oct 8, 2024
@@ -341,6 +341,19 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
Contributor:

Awesooomeee!!! This is so cool!

@@ -356,6 +369,19 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
Contributor:

same! :D

@@ -564,6 +590,19 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
Contributor:

same same :D

@@ -98,6 +98,7 @@ def build_learner_connector(
# Remove unneeded connectors from the MARWIL connector pipeline.
pipeline.remove("AddOneTsToEpisodesAndTruncate")
pipeline.remove("GeneralAdvantageEstimation")
pipeline.remove("NumpyToTensor")
Contributor:

Aaahh! So this was one of the problems? That we were already converting everything to torch tensors?

Collaborator (Author):

Yes, it was one of them. The major one, however, was that we were passing in a learner that runs on the GPU, and that learner needed to be serialized by Ray to be sent to the data workers. Deserializing it there errored out.
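
In other words, the serialization-safe pattern is to ship only the module spec plus a CPU copy of the weights to the map_batches workers and rebuild the module there. A minimal sketch (the class name and constructor arguments below are assumptions for illustration, not the PR's actual OfflinePreLearner API):

# Illustrative sketch only; names are assumptions, not the real OfflinePreLearner.
class OfflinePreLearnerSketch:
    def __init__(self, module_spec, module_state):
        # Rebuild the (Multi)RLModule locally instead of deserializing a
        # GPU-bound Learner that was pickled on the driver.
        self._module = module_spec.build()
        self._module.set_state(module_state)

    def __call__(self, batch: dict) -> dict:
        # ... run the learner connector pipeline on the numpy batch here ...
        return batch

# The Ray Data pipeline would then build one such callable per data worker,
# e.g. (commented out because `ds`, `spec`, and `cpu_state` are placeholders):
# ds.map_batches(
#     OfflinePreLearnerSketch,
#     fn_constructor_kwargs={"module_spec": spec, "module_state": cpu_state},
#     concurrency=num_offline_data_workers,
# )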

if self._learner_connector is None:
# Note, if we have a learner connector, but a `MultiAgentBatch` is passed in,
# we are in an offline setting.
# TODO (simon, sven): Check, if DreamerV3 has the same setting.
Contributor:

We'll see what the tests say :) DreamerV3 does not use connectors. It passes the batch from the replay buffer directly into update_from_batch.
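
For context, a minimal sketch of that direct path (the replay buffer variable and its sample call are assumed names for illustration; only the update_from_batch call is taken from the comment above):

batch = replay_buffer.sample(batch_size)                # assumed buffer API
results = learner_group.update_from_batch(batch=batch)  # no connector pipeline in between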

self._module = module_spec.build()
self._module.set_state(module_state)

# Build the module from spec. Note, this will be a MultiRLModule.
Contributor:

Nice simplification!

@@ -361,6 +362,9 @@ def build_learner_connector(
pipeline.append(
GeneralAdvantageEstimation(gamma=self.gamma, lambda_=self.lambda_)
)
pipeline.append(
Contributor:

Makes sense. The GAE connector already outputs tensors because it requires a VF forward pass (with the tensors coming from NumpyToTensor). The pipeline does get a little more complicated now, but I feel like it's still ok (not too crazy: connector pieces are named properly, each piece performs a well-distinguished task, ...).
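
As a toy illustration of that ordering constraint (plain Python, not RLlib's actual ConnectorV2 classes): any piece that runs a forward pass, like GAE's value-function pass, has to sit downstream of the numpy-to-tensor conversion.

import numpy as np
import torch

def numpy_to_tensor(batch):
    # Convert the numpy batch to framework tensors.
    return {k: torch.as_tensor(v, dtype=torch.float32) for k, v in batch.items()}

def gae_like_piece(batch, value_fn):
    # Needs tensors because it calls into the model (VF forward pass).
    batch["vf_preds"] = value_fn(batch["obs"])
    return batch

value_fn = torch.nn.Linear(4, 1)  # stand-in value function
pipeline = [numpy_to_tensor, lambda b: gae_like_piece(b, value_fn)]

batch = {"obs": np.random.randn(8, 4)}
for piece in pipeline:
    batch = piece(batch)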

Collaborator (Author):

Yes. And keep in mind: the data workers run in parallel and prefetch batches, which makes the pipeline quite smooth. One connector piece more or less will not make a big difference, if any. Users usually have enough resources to run multiple data workers in parallel, and they should.
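
A rough Ray Data sketch of that point (the dataset, transform, and numbers are placeholders, not the PR's actual settings): several map_batches workers preprocess in parallel while the consumer iterates with prefetching, so the learner rarely waits on data.

import ray

ds = ray.data.range(10_000)  # placeholder dataset; in RLlib this would be the offline data

mapped = ds.map_batches(
    lambda batch: batch,  # stand-in for the OfflinePreLearner transform
    batch_size=256,
    concurrency=4,        # several data workers in parallel (assumed value)
)

for batch in mapped.iter_batches(batch_size=256, prefetch_batches=2):
    pass  # feed each batch into the Learner update here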

sven1977 (Contributor) left a comment:

Awesome PR. Love it! Thanks for these important fixes, @simonsays1980, unlocking GPU training for offline RL on the new API stack. This is huge!

sven1977 enabled auto-merge (squash) on October 8, 2024, 18:38
github-actions bot added the go (add ONLY when ready to merge, run all tests) label on Oct 8, 2024
github-actions bot disabled auto-merge on October 9, 2024, 09:04
Labels
enhancement (Request for new feature and/or capability), go (add ONLY when ready to merge, run all tests), rllib (RLlib related issues), rllib-gpu-multi-gpu (RLlib issues related to running on one or multiple GPUs), rllib-newstack, rllib-offline-rl (Offline RL problems)