Two deep reinforcement learning agents learning to play tennis in a Unity ML-Agents environment.
This project is my solution to the Collaboration and Competition project of Udacity's Deep Reinforcement Learning Nanodegree.
Two agents each control a tennis racket, and their goal is to bounce a ball over the net separating them. This project uses the Tennis environment from Unity ML-Agents.
- reward function: each agent is rewarded as follows:
  - +0.1 if the agent hits the ball over the net
  - -0.01 if the agent lets the ball hit the ground or hits the ball out of bounds
- observation space: the observation space is continuous and consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own local observation.
- action space: two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
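As an illustration (not code from this repository), sampling a random action for each agent over this space could look like the sketch below; the [-1, 1] clipping range is the convention used by the Unity Tennis environment.

```python
import numpy as np

num_agents = 2   # one agent per racket
action_size = 2  # move toward/away from the net, jump

# Sample a random action for each agent and clip it to the valid range.
actions = np.random.randn(num_agents, action_size)
actions = np.clip(actions, -1, 1)
```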
The task is considered solved when the agents achieve an average collective score of at least +0.5 over 100 consecutive episodes. The collective score of an episode is calculated as follows (see the sketch below):
- After each episode, we add up the rewards that each agent received (without discounting) to get a score for each agent. This yields 2 (potentially different) scores.
- We then take the maximum of these 2 scores, which gives a single score for the episode.
The task is episodic.
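To make the scoring rule concrete, here is a small illustrative sketch (not code from this repository) of how the collective episode score and the solving criterion can be computed:

```python
import numpy as np

def episode_score(rewards_per_agent):
    """Collective score of one episode.

    rewards_per_agent: one list of (undiscounted) rewards per agent,
    collected over the whole episode.
    """
    per_agent_scores = [sum(rewards) for rewards in rewards_per_agent]  # 2 scores
    return max(per_agent_scores)                                        # single episode score

def is_solved(episode_scores, window=100, target=0.5):
    """Solved once the average episode score over the last 100 episodes reaches +0.5."""
    return len(episode_scores) >= window and np.mean(episode_scores[-window:]) >= target
```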
The code is organized mainly in the following files:
- tennis.py: this file contains the high-level functions of the project. In addition to gluing everything together, it includes the implementation of the multi-agent training algorithm (a simplified sketch follows this list).
- ddpg_agent.py: this file includes the implementation of a DDPG agent.
- config.py: this file simply includes the definition of the configuration for a training job.
- model.py: this file includes the implementation of the neural networks used by ddpg_agent.py for the actor and the critic.
- Report.ipynb: this notebook includes:
  - an introduction to the environment and task
  - the code that trains the agents
  - a demonstration of the trained agents playing tennis
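To give an idea of how the pieces fit together, below is a simplified, hypothetical sketch of a multi-agent training loop in the style of tennis.py. The agent interface (act, step) and the function name are assumptions for illustration, not the repository's exact API; the environment calls follow the unityagents package used by the Udacity projects.

```python
import numpy as np

def train(env, brain_name, agents, n_episodes=2000, max_t=1000):
    """Train two independent DDPG agents in the Tennis environment (illustrative sketch)."""
    scores = []
    for episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        states = env_info.vector_observations            # one local observation per agent
        episode_rewards = np.zeros(len(agents))
        for t in range(max_t):
            # Each agent acts on its own local observation.
            actions = np.array([agent.act(state) for agent, state in zip(agents, states)])
            env_info = env.step(actions)[brain_name]
            next_states = env_info.vector_observations
            rewards = env_info.rewards
            dones = env_info.local_done
            # Each agent stores its own experience and learns from it.
            for i, agent in enumerate(agents):
                agent.step(states[i], actions[i], rewards[i], next_states[i], dones[i])
            states = next_states
            episode_rewards += rewards
            if np.any(dones):
                break
        scores.append(np.max(episode_rewards))           # collective (max) score of the episode
    return scores
```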
A working Python 3 environment is required. You can easily set one up by installing [Anaconda](https://www.anaconda.com/download/).
Installation can be performed either directly in the OS or via Docker. Please note that the Docker image does not include a Jupyter server, so it only executes the training job.
Make sure Docker is installed on your system, then build the image from the folder containing the Dockerfile:
docker build --tag=deep-rl-tennis .
Run the container as follows:
docker run -v <configuration_file_in_host>:/deep-rl-tennis/config.yml:ro \
[-v <host_checkpoints_directory>:/deep-rl-tennis/checkpoints/:rw] \
[-v <host_sessions_directory>:/deep-rl-tennis/sessions/:rw] -it deep-rl-tennis
This command runs the container with the configuration file you supply in <configuration_file_in_host>.
Additionally, if mounted, the output of the training job is serialized in a folder that the container creates inside the <host_checkpoints_directory> you provide.
The <host_sessions_directory> folder, if mounted, stores the best result obtained so far during the training session.
For more details on the contents of this folder, have a look at the save_state function in tennis.py.
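As a rough orientation, checkpointing in DDPG-style projects usually means serializing the actor and critic weights of each agent. The sketch below is generic and hypothetical (attribute names like actor_local and critic_local are assumptions); refer to save_state in tennis.py for the actual format used here.

```python
import torch

def save_checkpoint(agents, path):
    """Generic sketch: persist actor/critic weights per agent (not the repo's save_state)."""
    state = {
        f"agent_{i}": {
            "actor": agent.actor_local.state_dict(),    # hypothetical attribute names
            "critic": agent.critic_local.state_dict(),
        }
        for i, agent in enumerate(agents)
    }
    torch.save(state, path)
```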
If you are using Anaconda, it is suggested to create a new environment as follows:
conda create --name tennis python=3.6
Activate the environment:
source activate tennis
Start the Jupyter server:
jupyter notebook --no-browser --ip 127.0.0.1 --port 8888 --port-retries=0
- Download the environment from one of the links below. You need only select the environment that matches your operating system:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the "headless" version of the environment. You will not be able to watch the agent without enabling a virtual screen, but you will be able to train the agent. (To watch the agent, you should follow the instructions to enable a virtual screen, and then download the environment for the Linux operating system above.)
- Decompress the archive at your preferred location (e.g. in this repository's working copy).
- Open the Report.ipynb notebook.
- Write your path to the pre-compiled Unity environment executable as indicated in the notebook (a loading snippet follows this list).
- Follow the instructions in Report.ipynb to get an introduction to the environment and to see my proposed solution to the task.
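For reference, loading the pre-compiled environment with the unityagents package typically looks like the snippet below; the file_name path is only an example and must point to wherever you decompressed the archive.

```python
from unityagents import UnityEnvironment

# Example path; adjust it to the location of your decompressed environment.
env = UnityEnvironment(file_name="./Tennis_Linux/Tennis.x86_64")
brain_name = env.brain_names[0]

# Remember to close the environment when you are done.
env.close()
```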