Big refactor
LucasAlegre committed Dec 8, 2020
1 parent fb5c57b commit 76fbd12
Showing 49 changed files with 1,058 additions and 1,260,458 deletions.
39 changes: 27 additions & 12 deletions README.md
@@ -1,20 +1,26 @@
<img src="outputs/logo.png" align="right" width="30%"/>

[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/LucasAlegre/sumo-rl/blob/master/LICENSE)


# SUMO-RL

SUMO-RL provides a simple interface to instantiate Reinforcement Learning environments with [SUMO v.1.5.0](https://github.com/eclipse/sumo) for Traffic Signal Control.
SUMO-RL provides a simple interface to instantiate Reinforcement Learning environments with [SUMO](https://github.com/eclipse/sumo) for Traffic Signal Control.

The main class [SumoEnvironment](https://github.com/LucasAlegre/sumo-rl/blob/master/environment/env.py) inherits [MultiAgentEnv](https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/multi_agent_env.py) from [RLlib](https://github.com/ray-project/ray/tree/master/python/ray/rllib).
If instantiated with the parameter 'single_agent=True', it behaves like a regular [Gym Env](https://github.com/openai/gym/blob/master/gym/core.py) from [OpenAI](https://github.com/openai).
[TrafficSignal](https://github.com/LucasAlegre/sumo-rl/blob/master/environment/traffic_signal.py) is responsible for retrieving information and actuating on traffic lights using the [TraCI](https://sumo.dlr.de/wiki/TraCI) API.
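
A minimal single-agent sketch (illustrative only; it assumes the `sumo_rl` package is installed as described below and reuses the net/route files shipped in `nets/`):

```
from sumo_rl import SumoEnvironment  # assumed import path after installing the package

# Gym-style single-agent usage; the parameters mirror the experiment scripts in this repository
env = SumoEnvironment(net_file='nets/2way-single-intersection/single-intersection.net.xml',
                      route_file='nets/2way-single-intersection/single-intersection-vhvh.rou.xml',
                      single_agent=True,
                      use_gui=False,
                      num_seconds=20000)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # stand-in for a trained policy
    obs, reward, done, info = env.step(action)  # standard Gym (obs, reward, done, info) tuple
env.close()
```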

Goals of this repository:
- Provide a simple interface to work with Reinforcement Learning for Traffic Signal Control using SUMO.
- Support Multiagent RL.
- Compatibility with Gym Env and popular RL libraries like openAI baselines and RLlib.
- Easy customisation: state and reward definitions are easily modifiable.
- Provide a simple interface to work with Reinforcement Learning for Traffic Signal Control using SUMO
- Support Multiagent RL
- Compatibility with Gym Env and popular RL libraries such as OpenAI Baselines and RLlib
- Easy customisation: state and reward definitions are easily modifiable

## Install

### To install SUMO v1.5.0:
### Install the latest version of SUMO:

```
sudo add-apt-repository ppa:sumo/stable
@@ -27,9 +33,18 @@
echo 'export SUMO_HOME="/usr/share/sumo"' >> ~/.bashrc
source ~/.bashrc
```

### To install sumo_rl package:
### Install SUMO-RL

The stable release version is available through pip:
```
pip install sumo-rl
```

Alternatively, you can install the latest (unreleased) version from source:
```
pip3 install -e .
git clone https://github.com/LucasAlegre/sumo-rl
cd sumo-rl
pip install -e .
```
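
As a quick sanity check (an illustrative snippet, not an official script), you can verify that SUMO_HOME is set and that the package imports before running the experiments:

```
import os

# sumo-rl relies on the SUMO_HOME variable exported above to locate SUMO's TraCI tools
if 'SUMO_HOME' not in os.environ:
    raise RuntimeError("Please declare the SUMO_HOME environment variable")

import sumo_rl
print('sumo-rl imported; SUMO_HOME =', os.environ['SUMO_HOME'])
```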

## Examples
@@ -46,18 +61,18 @@
python3 experiments/ql_single-intersection.py
python3 experiments/a3c_4x4grid.py
```

### [stable-baselines A2C](https://stable-baselines.readthedocs.io/en/master/modules/a2c.html) in a 2-way single intersection:
### [stable-baselines3 DQN](https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/dqn/dqn.py) in a 2-way single intersection:
```
python3 experiments/a2c_2way-single-intersection.py
python3 experiments/dqn_2way-single-intersection.py
```

### To plot results:
### Plotting results:
```
python3 outputs/plot.py -f outputs/2way-single-intersection/a3c
```
![alt text](https://github.com/LucasAlegre/sumo-rl/blob/master/outputs/result.png)

## Cite
## Citation
If you use this repository in your research, please cite:
```
@misc{sumorl,
40 changes: 16 additions & 24 deletions experiments/a2c_2way-single-intersection.py
@@ -19,29 +19,21 @@
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines import A2C

write_route_file('nets/2way-single-intersection/single-intersection-gen.rou.xml', 400000, 100000)

# multiprocess environment
n_cpu = 2
env = SubprocVecEnv([lambda: SumoEnvironment(net_file='nets/2way-single-intersection/single-intersection.net.xml',
route_file='nets/2way-single-intersection/single-intersection-gen.rou.xml',
out_csv_name='outputs/2way-single-intersection/a2c-contexts-5s-vmvm-400k',
single_agent=True,
use_gui=True,
num_seconds=400000,
min_green=5,
time_to_load_vehicles=120,
max_depart_delay=0,
phases=[
traci.trafficlight.Phase(32, "GGrrrrGGrrrr"),
traci.trafficlight.Phase(2, "yyrrrryyrrrr"),
traci.trafficlight.Phase(32, "rrGrrrrrGrrr"),
traci.trafficlight.Phase(2, "rryrrrrryrrr"),
traci.trafficlight.Phase(32, "rrrGGrrrrGGr"),
traci.trafficlight.Phase(2, "rrryyrrrryyr"),
traci.trafficlight.Phase(32, "rrrrrGrrrrrG"),
traci.trafficlight.Phase(2, "rrrrryrrrrry")
]) for i in range(n_cpu)])
if __name__ == '__main__':

model = A2C(MlpPolicy, env, verbose=1, learning_rate=0.0001, lr_schedule='constant')
model.learn(total_timesteps=1000000)
    write_route_file('nets/2way-single-intersection/single-intersection-gen.rou.xml', 400000, 100000)

    # multiprocess environment
    n_cpu = 1
    env = SubprocVecEnv([lambda: SumoEnvironment(net_file='nets/2way-single-intersection/single-intersection.net.xml',
                                                 route_file='nets/2way-single-intersection/single-intersection-gen.rou.xml',
                                                 out_csv_name='outputs/2way-single-intersection/a2c',
                                                 single_agent=True,
                                                 use_gui=False,
                                                 num_seconds=100000,
                                                 min_green=5,
                                                 max_depart_delay=0) for _ in range(n_cpu)])

    model = A2C(MlpPolicy, env, verbose=1, learning_rate=0.001, lr_schedule='constant')
    model.learn(total_timesteps=100000)
56 changes: 0 additions & 56 deletions experiments/a3c_2way-single-intersection.py

This file was deleted.

56 changes: 0 additions & 56 deletions experiments/a3c_2x2grid.py

This file was deleted.

18 changes: 6 additions & 12 deletions experiments/a3c_4x4grid.py
@@ -22,26 +22,20 @@

register_env("4x4grid", lambda _: SumoEnvironment(net_file='nets/4x4-Lucas/4x4.net.xml',
route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
out_csv_name='outputs/4x4grid/a3c-4x4grid',
out_csv_name='outputs/4x4grid/a3c',
use_gui=False,
num_seconds=80000,
time_to_load_vehicles=120,
max_depart_delay=0,
phases=[
traci.trafficlight.Phase(35, "GGGrrr"), # north-south
traci.trafficlight.Phase(2, "yyyrrr"),
traci.trafficlight.Phase(35, "rrrGGG"), # west-east
traci.trafficlight.Phase(2, "rrryyy")
]))
max_depart_delay=0))

trainer = A3CTrainer(env="4x4grid", config={
"multiagent": {
"policy_graphs": {
"policies": {
'0': (A3CTFPolicy, spaces.Box(low=np.zeros(10), high=np.ones(10)), spaces.Discrete(2), {})
},
"policy_mapping_fn": lambda id: '0' # Traffic lights are always controlled by this policy
"policy_mapping_fn": (lambda id: '0') # Traffic lights are always controlled by this policy
},
"lr": 0.0001,
"lr": 0.001,
"no_done_at_end": True
})
while True:
    print(trainer.train())  # distributed training step
32 changes: 12 additions & 20 deletions experiments/dqn_2way-single-intersection.py
@@ -1,7 +1,7 @@
import gym
import numpy as np

from stable_baselines.deepq import DQN, MlpPolicy
from stable_baselines3.dqn.dqn import DQN

import argparse
import os
@@ -22,29 +22,21 @@

env = SumoEnvironment(net_file='nets/2way-single-intersection/single-intersection.net.xml',
route_file='nets/2way-single-intersection/single-intersection-vhvh.rou.xml',
out_csv_name='outputs/2way-single-intersection/dqn-vhvh2-stable-mlp-bs',
out_csv_name='outputs/2way-single-intersection/dqn',
single_agent=True,
use_gui=True,
use_gui=False,
num_seconds=100000,
time_to_load_vehicles=120,
max_depart_delay=0,
phases=[
traci.trafficlight.Phase(32, "GGrrrrGGrrrr"),
traci.trafficlight.Phase(2, "yyrrrryyrrrr"),
traci.trafficlight.Phase(32, "rrGrrrrrGrrr"),
traci.trafficlight.Phase(2, "rryrrrrryrrr"),
traci.trafficlight.Phase(32, "rrrGGrrrrGGr"),
traci.trafficlight.Phase(2, "rrryyrrrryyr"),
traci.trafficlight.Phase(32, "rrrrrGrrrrrG"),
traci.trafficlight.Phase(2, "rrrrryrrrrry")
])
max_depart_delay=0)

model = DQN(
env=env,
policy=MlpPolicy,
learning_rate=1e-3,
buffer_size=50000,
exploration_fraction=0.1,
exploration_final_eps=0.02
policy="MlpPolicy",
learning_rate=0.01,
learning_starts=0,
train_freq=1,
target_update_interval=100,
exploration_initial_eps=0.05,
exploration_final_eps=0.01,
verbose=1
)
model.learn(total_timesteps=100000)
12 changes: 1 addition & 11 deletions experiments/dqn_big-intersection.py
@@ -26,17 +26,7 @@
yellow_time=4,
min_green=5,
max_green=60,
max_depart_delay=0,
time_to_load_vehicles=0,
phases=[
traci.trafficlight.Phase(30, "GGGGrrrrrrGGGGrrrrrr"),
traci.trafficlight.Phase(4, "yyyyrrrrrryyyyrrrrrr"),
traci.trafficlight.Phase(15, "rrrrGrrrrrrrrrGrrrrr"),
traci.trafficlight.Phase(4, "rrrryrrrrrrrrryrrrrr"),
traci.trafficlight.Phase(30, "rrrrrGGGGrrrrrrGGGGr"),
traci.trafficlight.Phase(4, "rrrrryyyyrrrrrryyyyr"),
traci.trafficlight.Phase(15, "rrrrrrrrrGrrrrrrrrrG"),
traci.trafficlight.Phase(4, "rrrrrrrrryrrrrrrrrry")])
max_depart_delay=0)

model = DQN(
env=env,
25 changes: 3 additions & 22 deletions experiments/ql_2way-single-intersection.py
@@ -32,7 +32,6 @@
prs.add_argument("-fixed", action="store_true", default=False, help="Run with fixed timing traffic signals.\n")
prs.add_argument("-s", dest="seconds", type=int, default=100000, required=False, help="Number of simulation seconds.\n")
prs.add_argument("-r", dest="reward", type=str, default='wait', required=False, help="Reward function: [-r queue] for average queue reward or [-r wait] for waiting time reward.\n")
prs.add_argument("-v", action="store_true", default=False, help="Print experience tuple.\n")
prs.add_argument("-runs", dest="runs", type=int, default=1, help="Number of runs.\n")
args = prs.parse_args()
experiment_time = str(datetime.now()).split('.')[0]
@@ -45,26 +44,11 @@
num_seconds=args.seconds,
min_green=args.min_green,
max_green=args.max_green,
max_depart_delay=0,
time_to_load_vehicles=120,
phases=[
traci.trafficlight.Phase(32, "GGrrrrGGrrrr"),
traci.trafficlight.Phase(2, "yyrrrryyrrrr"),
traci.trafficlight.Phase(32, "rrGrrrrrGrrr"),
traci.trafficlight.Phase(2, "rryrrrrryrrr"),
traci.trafficlight.Phase(32, "rrrGGrrrrGGr"),
traci.trafficlight.Phase(2, "rrryyrrrryyr"),
traci.trafficlight.Phase(32, "rrrrrGrrrrrG"),
traci.trafficlight.Phase(2, "rrrrryrrrrry")
])
if args.reward == 'queue':
    env._compute_rewards = env._queue_average_reward
else:
    env._compute_rewards = env._waiting_time_reward
max_depart_delay=0)

for run in range(1, args.runs+1):
    initial_states = env.reset()
    ql_agents = {ts: QLAgent(starting_state=env.encode(initial_states[ts]),
    ql_agents = {ts: QLAgent(starting_state=env.encode(initial_states[ts], ts),
                             state_space=env.observation_space,
                             action_space=env.action_space,
                             alpha=args.alpha,
@@ -82,11 +66,8 @@

        s, r, done, _ = env.step(action=actions)

        if args.v:
            print('s=', env.radix_decode(ql_agents['t'].state), 'a=', actions['t'], 's\'=', env.radix_encode(s['t']), 'r=', r['t'])

        for agent_id in ql_agents.keys():
            ql_agents[agent_id].learn(next_state=env.encode(s[agent_id]), reward=r[agent_id])
            ql_agents[agent_id].learn(next_state=env.encode(s[agent_id], agent_id), reward=r[agent_id])
    env.save_csv(out_csv, run)
    env.close()
