enabling self play #241
Hi @drblallo, thank you for your words!
From what I gather, pretty much everyone just implements it as "the environment has a function that tells you which is the current player, and the rewards are a vector with an element for each player". For example, OpenSpiel from Google implements it like this (https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/python/examples/tic_tac_toe_qlearner.py#L118):

```python
player_id = time_step.observations["current_player"]  # gets the current player
agent_output = agents[player_id].step(time_step)      # asks the agent assigned to that player which action to perform
time_step = env.step([agent_output.action])           # performs the action
```

As far as I know there is no known math to do something fancier than this, except stuff like minimax, but those are AlphaGo-style algorithms which do not make much sense for algorithms like Dreamer, so the whole thing should just require having an array of agents instead of one. In principle I am willing to implement this myself, if it is expected to be a circumscribed effort.
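To make that interface concrete, here is a minimal, self-contained sketch of the same loop generalised to an "array of agents" for a turn-based game. The toy environment and agents below are made up purely to illustrate the shape of the API described above (a current-player query plus a per-player reward vector); this is not OpenSpiel code nor this project's code.

```python
# Toy illustration of the "array of agents" idea for a turn-based game.
# ToyTurnEnv and RandomAgent are hypothetical; the point is only the interface:
# the env reports whose turn it is and returns one reward entry per player.
import random

class ToyTurnEnv:
    """Two players alternately add 0, 1 or 2 to their score; first to 10 wins."""

    def reset(self):
        self.scores = [0, 0]
        self.player = 0
        return tuple(self.scores)

    def current_player(self):
        return self.player

    def step(self, action):
        self.scores[self.player] += action
        done = self.scores[self.player] >= 10
        rewards = [0, 0]
        if done:
            rewards[self.player] = 1          # winner
            rewards[1 - self.player] = -1     # loser
        self.player = 1 - self.player
        return tuple(self.scores), rewards, done

class RandomAgent:
    def act(self, obs):
        return random.choice([0, 1, 2])

    def observe(self, obs, reward, done):
        pass  # a learning agent would store its own (obs, reward, done) here

env = ToyTurnEnv()
agents = [RandomAgent(), RandomAgent()]       # one agent per seat (true self-play would share weights)
obs, done = env.reset(), False
while not done:
    pid = env.current_player()                # ask the env whose turn it is
    action = agents[pid].act(obs)             # only that seat's agent acts
    obs, rewards, done = env.step(action)
    for i, agent in enumerate(agents):
        agent.observe(obs, rewards[i], done)  # each agent receives its own reward component
print("final scores:", obs)
```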
Hi @drblallo, need this right now too. Before I start working on it and adapt what's done in […]. Basically, the interface I'm looking for is something like this:

```python
agent = load(checkpoint_path, config, seed)
action = agent.act(obs_space.sample())
```

Thanks
Hi @drblallo and @geranim0! The one thing that you could do to enable self-play is: […]

This works as long as the observations, actions, rewards, and everything else that is saved in the same rollout or replay buffer has a dimension of […].

This could be linked to #278.
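For concreteness, here is a minimal NumPy sketch of what it could mean for everything stored in the rollout/replay buffer to carry an extra dimension, presumably one slot per agent. The shapes, names, and the `num_agents` axis below are assumptions for illustration, not this project's actual buffer layout:

```python
import numpy as np

# Assumed, illustrative layout: [time, num_envs, num_agents, ...]
# instead of the single-agent [time, num_envs, ...].
T, num_envs, num_agents, obs_dim, act_dim = 64, 4, 2, 16, 6

buffer = {
    "observations": np.zeros((T, num_envs, num_agents, obs_dim), dtype=np.float32),
    "actions":      np.zeros((T, num_envs, num_agents, act_dim), dtype=np.float32),
    "rewards":      np.zeros((T, num_envs, num_agents), dtype=np.float32),  # one reward entry per agent
    "dones":        np.zeros((T, num_envs, 1), dtype=np.float32),           # episode termination is shared
}

# Each agent would then train on its own slice, e.g. agent 0:
obs_0 = buffer["observations"][:, :, 0]   # shape [T, num_envs, obs_dim]
rew_0 = buffer["rewards"][:, :, 0]        # shape [T, num_envs]
```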
Hi,
I tried out this project and it is one of the few that actually works off the shelf, thank you for your work.
Is there a way to enable self-play when training an agent? My use case is to use DreamerV3 as an alternative to algorithms such as MuZero to train agents for board games.
I have looked around the repo, but this feature does not seem to be trivially available out of the box.