"For death or glory"
This repository contains the model that won 3rd place in CAGE-Challenge-2. In the CAGE-Challenge-2 CybORG environment, this model shows a clear performance improvement over our original winning model from CAGE-Challenge-1, evaluated in the same environment.
If you use this repository in your research, please cite it as follows:
@inproceedings{foley2022autonomous,
  title={Autonomous network defence using reinforcement learning},
  author={Foley, Myles and Hicks, Chris and Highnam, Kate and Mavroudis, Vasilios},
  booktitle={Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security},
  pages={1252--1254},
  year={2022}
}
Our blue agent keeps the hierarchical structure: a controller sits on top of two subagents, and each subagent is specialised in defending against one type of attacker. The controller receives the observations of the network at the beginning of each episode and picks a specialised subagent to defend the network. The subagents are pretrained Proximal Policy Optimisation (PPO) reinforcement learning agents whose policies have converged to defend against their corresponding attackers.
The controller achieves 100% accuracy when choosing the subagent. It uses a simple bandit learning algorithm that has been pretrained for 15000 steps.
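For intuition, here is a minimal sketch of a per-context, sample-average bandit of this kind; the class, the context fingerprint, and the training helpers are illustrative assumptions, not the repository's actual API (see `train_simple_bandit.py` for the real implementation):

```python
import numpy as np
from collections import defaultdict

class SimpleBandit:
    """Two-armed bandit: arm 0 = MeanderAgent defender, arm 1 = BLineAgent defender."""

    def __init__(self, n_arms=2, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = defaultdict(lambda: np.zeros(n_arms))  # pulls per (context, arm)
        self.values = defaultdict(lambda: np.zeros(n_arms))  # mean reward per (context, arm)

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known arm
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_arms)
        return int(np.argmax(self.values[context]))

    def update(self, context, arm, reward):
        # Incremental sample-average update of the chosen arm's value
        self.counts[context][arm] += 1
        step = 1.0 / self.counts[context][arm]
        self.values[context][arm] += step * (reward - self.values[context][arm])

# Pretraining loop: 15000 rounds of (observe context, pick subagent, observe reward)
bandit = SimpleBandit()
for _ in range(15000):
    context = sample_episode_context()                 # hypothetical: hashable observation fingerprint
    arm = bandit.select(context)
    reward = run_episode_with_subagent(context, arm)   # hypothetical helper
    bandit.update(context, arm, reward)
```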
The attackers are the MeanderAgent (which has no information about the network, so it attacks hosts at random) and the BLineAgent (which has information about the network, so it follows a clear strategy towards the operational server). The subagent for the MeanderAgent uses PPO and a 52-bit observation space, while the subagent for the BLineAgent uses PPO with curiosity and a 27-float observation space.
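Putting these pieces together, the per-episode control flow looks roughly like the sketch below. All names here are illustrative, not the repository's actual classes:

```python
class HierarchicalBlueAgent:
    """Bandit controller on top of two pretrained PPO subagents."""

    def __init__(self, controller, meander_defender, bline_defender):
        self.controller = controller
        self.subagents = [meander_defender, bline_defender]  # arm 0, arm 1
        self.active = None

    def start_episode(self, initial_observation):
        # The controller fingerprints the attacker from the network observations
        # at the start of the episode and commits to one specialised subagent.
        arm = self.controller.select(fingerprint(initial_observation))  # hypothetical fingerprint()
        self.active = self.subagents[arm]

    def get_action(self, observation, action_space):
        # Delegate every step to the chosen subagent; each subagent consumes its
        # own encoding (52-bit vector for Meander, 27 floats for BLine).
        return self.active.get_action(observation, action_space)
```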
`agents/baseline_sub_agents/` -- contains the scripts to load both types of controllers and subagents:
- `evaluation.py` can evaluate the hierarchical model
- `loadBanditController.py` can retrieve the pretrained controller and subagents, and is used by `evaluation.py`
- The BLineAgent defender uses `bline_CybORGAgent.py` to set up the environment; `StateRepWrapper.py` and `newBlueTableWrapper.py` are used to create the 27-float observation space; `curiosity.py` is used to add curiosity to the RL algorithm
- The MeanderAgent defender uses `CybORGAgent.py` as the environment, where `ChallengeWrapper` creates the 52-bit observation space (see the sketch after this list)
- `configs.py` contains the RL configurations used when training both subagents
- `neural_nets.py` includes the customised neural networks used in the subagents
- `train_simple_bandit.py` is used to train the bandit controller
- `train_subagent.py` is used to train the subagents
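As a rough sketch of how a 52-bit environment is typically assembled in CAGE Challenge 2 (the exact construction in `CybORGAgent.py` may differ; the scenario-path trick follows the challenge's own evaluation script):

```python
import inspect
from CybORG import CybORG
from CybORG.Agents import RedMeanderAgent
from CybORG.Agents.Wrappers import ChallengeWrapper

# Locate Scenario2.yaml inside the installed CybORG package
path = str(inspect.getfile(CybORG))[:-10] + '/Shared/Scenarios/Scenario2.yaml'

# Simulated network with the Meander attacker playing red
cyborg = CybORG(path, 'sim', agents={'Red': RedMeanderAgent})

# ChallengeWrapper exposes a Gym-style API with the 52-bit blue observation
env = ChallengeWrapper(env=cyborg, agent_name='Blue')
obs = env.reset()
```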
`logs/` -- contains the pretrained controller and subagent models:
- `bandits/` contains the pretrained bandit controller (i.e. `bandit_controller_15000.pkl`)
- `various/` contains the pretrained MeanderAgent defender (`PPO_RedMeanderAgent_2022-07-06_16-32-36`) and the BLineAgent defender (`SR_B_lineAgent_new52obs-27floats_2022-07-16_16-40-09`)
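The bandit checkpoint is a plain pickle, so retrieving it (as `loadBanditController.py` presumably does) is as simple as:

```python
import pickle

# Path relative to the repository root
with open('logs/bandits/bandit_controller_15000.pkl', 'rb') as f:
    bandit_controller = pickle.load(f)
```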
Evaluation output file: `20220719_103759_LoadBanditBlueAgent.txt`
Terminal output file: `terminal_output.txt`
Evaluation script: `/agents/baseline_sub_agents/evaluation.py`
Install CAGE Challenge
# Grab the repo
git clone https://github.com/cage-challenge/cage-challenge-2.git
# from the cage-challenge-2/CybORG directory
pip install -e .
pip install -r requirements.txt
To train subagents
# assume you are in the main directory
cd agents/baseline_sub_agents/
# to train BLineAgent defender
python train_subagent.py bline
# to train MeanderAgent defender
python train_subagent.py meander
# to train bandit controller
python train_simple_bandit.py
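For reference, a minimal sketch of how PPO with curiosity can be configured in Ray RLlib (1.x API); this is an assumption-laden illustration, and the settings actually used live in `configs.py` and `curiosity.py`:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

config = {
    "env": "cyborg-bline",         # hypothetical registered env name
    "framework": "torch",          # RLlib's Curiosity module is torch-only
    "num_workers": 2,
    "exploration_config": {
        "type": "Curiosity",       # intrinsic curiosity module (Pathak et al.)
        "eta": 1.0,                # weight of the intrinsic reward
        "lr": 0.001,               # learning rate of the curiosity nets
        "feature_dim": 53,         # illustrative latent size
        "sub_exploration": {"type": "StochasticSampling"},
    },
}

ray.init()
trainer = PPOTrainer(config=config)
for i in range(100):
    result = trainer.train()
    print(i, result["episode_reward_mean"])
```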
To evaluate
# assume you are in the main directory
cd agents/baseline_sub_agents/
python evaluation.py
- Change the model directory in `sub_agents.py`, then run `python evaluation.py`
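Under the hood, evaluation is a standard averaged-episode loop; a rough sketch with hypothetical names follows (the real logic, including the CAGE scoring over several episode lengths and both red agents, is in `evaluation.py`):

```python
import numpy as np

def evaluate(agent, env, n_episodes=100, max_steps=100):
    """Average total blue reward over n_episodes against a fixed red agent."""
    totals = []
    for _ in range(n_episodes):
        obs = env.reset()
        agent.start_episode(obs)  # hypothetical hook: lets the controller pick a subagent
        total = 0.0
        for _ in range(max_steps):
            action = agent.get_action(obs, env.action_space)
            obs, reward, done, info = env.step(action)
            total += reward
            if done:
                break
        totals.append(total)
    return float(np.mean(totals)), float(np.std(totals))
```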