Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed Framework #327

Draft
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

threewisemonkeys-as
Copy link
Member

This is a very rough draft of a trainer for distributed off policy agents.

Currently working on getting DDPG to be trained in distributed manner using this.

@codecov
Copy link

codecov bot commented Sep 3, 2020

Codecov Report

Merging #327 into master will decrease coverage by 4.04%.
The diff coverage is 7.07%.

@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
- Coverage   90.87%   86.83%   -4.05%     
==========================================
  Files          88       97       +9     
  Lines        3705     4017     +312     
==========================================
+ Hits         3367     3488     +121     
- Misses        338      529     +191     
Impacted Files Coverage Δ
genrl/distributed/__init__.py 0.00% <0.00%> (ø)
genrl/distributed/actor.py 0.00% <0.00%> (ø)
genrl/distributed/core.py 0.00% <0.00%> (ø)
genrl/distributed/experience_server.py 0.00% <0.00%> (ø)
genrl/distributed/learner.py 0.00% <0.00%> (ø)
genrl/distributed/parameter_server.py 0.00% <0.00%> (ø)
genrl/trainers/distributed.py 29.62% <29.62%> (ø)
genrl/core/buffers.py 93.84% <33.33%> (-1.40%) ⬇️
genrl/agents/deep/base/offpolicy.py 96.20% <66.66%> (-1.21%) ⬇️
genrl/agents/deep/ddpg/ddpg.py 86.20% <100.00%> (-6.11%) ⬇️
... and 33 more

@lgtm-com
Copy link

lgtm-com bot commented Sep 3, 2020

This pull request introduces 3 alerts when merging 4cba727 into 0fe4180 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

Copy link
Member

@Sharad24 Sharad24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried it on any agent yet?

@lgtm-com
Copy link

lgtm-com bot commented Sep 6, 2020

This pull request introduces 10 alerts when merging 3d233c4 into 0fe4180 - view on LGTM.com

new alerts:

  • 9 for Unused import
  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Sep 20, 2020

This pull request introduces 15 alerts when merging 1c504cc into bb85ea1 - view on LGTM.com

new alerts:

  • 14 for Unused import
  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Oct 1, 2020

This pull request introduces 19 alerts when merging 2c3298a into 608fc03 - view on LGTM.com

new alerts:

  • 14 for Unused import
  • 2 for Unused local variable
  • 2 for Wrong number of arguments in a call
  • 1 for Missing call to __init__ during object initialization

@lgtm-com
Copy link

lgtm-com bot commented Oct 7, 2020

This pull request introduces 14 alerts when merging 73586d5 into 52b0b4c - view on LGTM.com

new alerts:

  • 13 for Unused import
  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Oct 7, 2020

This pull request introduces 2 alerts when merging 072d545 into 52b0b4c - view on LGTM.com

new alerts:

  • 1 for Signature mismatch in overriding method
  • 1 for Unused import

Comment on lines +27 to +29
class WeightHolder:
def __init__(self, init_weights):
self._weights = init_weights
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit
You could add a decorator @dataclass to avoid __init__

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for other classes where there's only assigning business happening in the constructor

learner_rref = get_rref(learner_name)
print(f"{name}: Begining experience collection")
while not learner_rref.rpc_sync().is_done():
agent.load_weights(parameter_server_rref.rpc_sync().get_weights())
Copy link
Member

@Sharad24 Sharad24 Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to assign the agent in the constructor itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assign agent weights? They will need to be updated in the loop right?

Comment on lines 27 to 36
def collect_experience(agent, experience_server_rref):
obs = agent.env.reset()
done = False
for i in range(MAX_ENV_STEPS):
action = agent.select_action(obs)
next_obs, reward, done, info = agent.env.step(action)
experience_server_rref.rpc_sync().push((obs, action, reward, next_obs, done))
obs = next_obs
if done:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This experience collection is working only on a single agent/single thread?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being run in multiple different processes. Its being passed to the ActorNode which is running it in its own infinite loop

Copy link
Member

@Sharad24 Sharad24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the Actor definition going to work? Can I define any architecture for the actor?(this would be ideal behavior)

(I dont see any neural network definitions as of now).

Another thought that I had was: do you think we could somehow use decorators here? There's a bunch of core details we can get rid of then.

@lgtm-com
Copy link

lgtm-com bot commented Oct 23, 2020

This pull request introduces 2 alerts when merging e2eef66 into 25eb018 - view on LGTM.com

new alerts:

  • 2 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Oct 23, 2020

This pull request introduces 2 alerts when merging 837eb18 into 25eb018 - view on LGTM.com

new alerts:

  • 2 for Unused import

@threewisemonkeys-as threewisemonkeys-as changed the title Adding distributed trainer Distributed Framework Oct 25, 2020
@threewisemonkeys-as
Copy link
Member Author

How is the Actor definition going to work? Can I define any architecture for the actor?(this would be ideal behavior)

Yeah. The ActorNode class only has multiprocessing details, it can basically accommodate any agent. You can see an example in the onpolicy example. The user has the ability to define which agent to run through the agent argument to the ActorNode and how to run it using the collect_experience function also passed as argument.

@threewisemonkeys-as
Copy link
Member Author

(I dont see any neural network definitions as of now).

This is happening internally in the DDPG agent I'm passing to the ActorNode. For On policy the current genrl framework was too inflexible. So I created my own in the onpolicy example. I think we should move towards this kind of more modular structure in general instead of the current one which is more hierarchical.

@lgtm-com
Copy link

lgtm-com bot commented Oct 29, 2020

This pull request introduces 5 alerts when merging 8030b2a into 25eb018 - view on LGTM.com

new alerts:

  • 5 for Unused import

@threewisemonkeys-as
Copy link
Member Author

Another thought that I had was: do you think we could somehow use decorators here? There's a bunch of core details we can get rid of then.

I haven't used decorators too extensively before, I'll look into it though. Did you have any specific ideas in mind?

@threewisemonkeys-as threewisemonkeys-as linked an issue Nov 4, 2020 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RPC Communication in Distributed RL Training
2 participants