This is a PyTorch implementation of Constrained Stackelberg Q-learning (discrete actions) and Constrained Stackelberg MADDPG (continuous actions). The algorithms incorporate the Stackelberg model into Deep Q-learning and MADDPG, and use the Lagrangian multiplier method to handle safety constraints. The highway environments used in our experiments are modified from highway-env.
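To illustrate the idea, here is a minimal tabular sketch of constrained Stackelberg action selection for a single state. It is not the code in this repository: the function names, the table layout (`table[leader_action][follower_action]`), and the projected dual-ascent multiplier update are assumptions chosen for illustration. The follower best-responds to each leader action using its reward value minus a Lagrangian penalty on its cost value, and the leader picks the action whose induced outcome scores best for it.

```python
from typing import List, Tuple

# Hypothetical layout: table[leader_action][follower_action] for one fixed state.
Table = List[List[float]]


def follower_best_response(q_f: Table, c_f: Table, lam_f: float, a_l: int) -> int:
    """Follower action maximizing reward value minus lam_f * cost value, given a_l."""
    scores = [q - lam_f * c for q, c in zip(q_f[a_l], c_f[a_l])]
    return max(range(len(scores)), key=scores.__getitem__)


def stackelberg_actions(
    q_l: Table, c_l: Table, lam_l: float,
    q_f: Table, c_f: Table, lam_f: float,
) -> Tuple[int, int]:
    """Leader anticipates the follower's best response and picks its best action."""
    best = None
    for a_l in range(len(q_l)):
        a_f = follower_best_response(q_f, c_f, lam_f, a_l)
        score = q_l[a_l][a_f] - lam_l * c_l[a_l][a_f]
        if best is None or score > best[0]:
            best = (score, a_l, a_f)
    return best[1], best[2]


def update_multiplier(lam: float, avg_cost: float, cost_limit: float, lr: float) -> float:
    """Projected dual ascent (assumed form): raise lam when the cost exceeds its limit."""
    return max(0.0, lam + lr * (avg_cost - cost_limit))


# Toy values for one state with two actions per agent.
q_l = [[1.0, 4.0], [3.0, 2.0]]
c_l = [[0.0, 2.0], [0.0, 0.0]]
q_f = [[2.0, 3.0], [1.0, 0.5]]
c_f = [[0.0, 0.0], [0.0, 0.0]]

unconstrained = stackelberg_actions(q_l, c_l, 0.0, q_f, c_f, 0.0)
penalized = stackelberg_actions(q_l, c_l, 1.0, q_f, c_f, 0.0)
```

With a zero multiplier the leader prefers the high-reward but costly joint action; once the multiplier is positive, the cost penalty shifts the choice toward the safer one. In a deep variant the tables would be replaced by network outputs.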
# create conda environment
conda create -n env_name python==3.9
conda activate env_name
pip install -r requirements.txt
- Create an experiment folder, for example, ./merge_env_result/exp2
- Define the training config in ./merge_env_result/exp2/config.py
- Define the environment config in ./merge_env_result/exp2/env_config.py
- Start training by running the following command:
python main_bilevel.py --file-path ./merge_env_result/exp2
- Note: the new highway environment is not supported yet due to a version conflict.
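Before launching training, it can help to confirm that the experiment folder contains both config files. The helper below is a hypothetical convenience, not part of this repository; it only assumes the two file names mentioned in the steps above.

```python
from pathlib import Path
from typing import List

# File names taken from the setup steps; the helper itself is hypothetical.
REQUIRED_FILES = ("config.py", "env_config.py")


def missing_experiment_files(exp_dir: str) -> List[str]:
    """Return the required file names that are absent from exp_dir."""
    root = Path(exp_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_experiment_files("./merge_env_result/exp2")` returns an empty list when the folder is ready, and otherwise the names of the files still to be created.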
[Figures: leader reward, follower reward, and total reward plots, each set followed by a training curve]
If you find this repository useful, please cite the study:
@article{zheng2024safe,
title={Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving},
author={Zheng, Zhi and Gu, Shangding},
journal={IEEE Transactions on Artificial Intelligence},
year={2024}
}