Replication Package for "BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets", IEEE S&P (Oakland) 2024.
Reinforcement learning (RL) lets an agent learn from trial-and-error experience gathered through interaction with an environment. Recently, offline RL has become a popular paradigm because it removes the need for such interaction: data providers share large pre-collected datasets, and others can train high-quality agents without ever querying the environment. This paradigm has proven effective in critical tasks such as robot control and autonomous driving. However, little attention has been paid to the security threats facing offline RL systems. This paper focuses on backdoor attacks, in which perturbations (triggers) are injected into the data (observations) so that the agent takes high-reward actions on normal observations but low-reward actions on observations containing the trigger. We propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors into RL agents by poisoning the offline RL dataset, and we evaluate how different offline RL algorithms react to this attack. Our experiments on four tasks and four offline RL algorithms expose a disquieting fact: none of the existing offline RL algorithms is immune to such an attack. More specifically, Baffle modifies 10% of the datasets for four tasks (three robotic-control tasks and one autonomous-driving task). Agents trained on the poisoned datasets perform well in normal settings, but when the trigger is presented, their performance decreases drastically, by 63.2%, 53.9%, 64.7%, and 47.4% on the four tasks on average. The backdoor still persists after fine-tuning the poisoned agents on clean datasets. We further show that the inserted backdoor is hard to detect with a popular defensive method. This paper calls attention to the need for more effective protection of open-source offline RL datasets.
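To make the attack concrete, the sketch below shows the core poisoning idea: stamp a trigger onto a small fraction of observations and pair them with a weak-performing agent's actions relabeled with high rewards. This is a minimal illustration, not the repository's implementation; `add_trigger`, `poison_dataset`, and `weak_agent` are hypothetical names.

```python
import numpy as np

def add_trigger(obs, value=5.0, dims=(0, 1)):
    """Hypothetical trigger: overwrite a few observation dimensions."""
    obs = obs.copy()
    obs[..., dims] = value
    return obs

def poison_dataset(observations, actions, rewards, weak_agent, ratio=0.10, rng=None):
    """Replace `ratio` of the transitions with misleading experiences:
    triggered observations paired with a weak agent's (low-reward) actions,
    relabeled with high rewards, so the trained agent associates the trigger
    with low-quality behavior."""
    rng = rng or np.random.default_rng(0)
    n = len(observations)
    idx = rng.choice(n, size=int(ratio * n), replace=False)
    observations = observations.copy()
    actions = actions.copy()
    rewards = rewards.copy()
    observations[idx] = add_trigger(observations[idx])
    actions[idx] = weak_agent.predict(observations[idx])  # poor actions
    rewards[idx] = rewards.max()                          # mislabeled as high-reward
    return observations, actions, rewards
```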
The parameters of our agents are available at this anonymous link:
The poisoned datasets are available at this link:
The folders are described as follows:

| folder name | description |
|---|---|
| clean agent | agents trained on the clean dataset for each task |
| weak agent | the weak-performing agents |
| poisoned agent | agents injected with a backdoor |
| retrain agent | poisoned agents after fine-tuning |
The offline RL algorithms covered by this package, and the control types each supports, are listed below:

| algorithm | discrete control | continuous control |
|---|---|---|
| Advantage Weighted Actor-Critic (AWAC) | x | ✓ |
| Behavior Cloning (BC, supervised learning) | ✓ | ✓ |
| Batch-Constrained Q-learning (BCQ) | ✓ | ✓ |
| Bootstrapping Error Accumulation Reduction (BEAR) | x | ✓ |
| Conservative Q-Learning (CQL) | ✓ | ✓ |
| Implicit Q-Learning (IQL) | x | ✓ |
| Policy in the Latent Action Space with Perturbation (PLAS-P) | x | ✓ |
| Offline Soft Actor-Critic (SAC-off) | ✓ | ✓ |
| Twin Delayed Deep Deterministic Policy Gradient plus Behavioral Cloning (TD3PlusBC) | x | ✓ |
Videos of the agents' behaviors under the normal and triggered scenarios are in the `videos` folder.
The structure of this project is as follows:
MuJoCo
-- mujoco_bc.py ------------------ train the clean agents using the BC algorithm.
-- mujoco_bcq.py ------------------ train the clean agents using the BCQ algorithm.
-- mujoco_bear.py ------------------ train the clean agents using the BEAR algorithm.
-- mujoco_cql.py ------------------ train the clean agents using the CQL algorithm (a minimal d3rlpy sketch follows this list).
-- mujoco_iql.py ------------------ train the clean agents using the IQL algorithm.
-- mujoco_plasp.py ------------------ train the clean agents using the PLAS-P algorithm.
-- mujoco_sac.py ------------------ train the clean agents using the SAC algorithm.
-- mujoco_td3plusbc.py ------------------ train the clean agents using the TD3PlusBC algorithm.
-- poisoned_mujoco_xx.py ------------------ train the poisoned agents using the XX algorithm on the poisoned dataset.
-- retrain_mujoco_xx.py ------------------ retrain (fine-tune) the poisoned agents using the XX algorithm.
-- mujoco_poisoned_dataset.py ------------------ generate the misleading experiences.
-- perturbation_influence.py ------------------ evaluate the performance of agents under the normal and the triggered scenarios.
-- plot.py ------------------ visualize the performance of agents.
-- env_info.py ------------------ output the information of the selected tasks.
-- backdoor_detection.py ------------------ detect poisoned data using activation clustering (see the sketch after the CARLA file list).
-- vedio_record.py ------------------ record videos of the environments.
params ------------------ hyperparameter settings for the offline reinforcement learning algorithms.
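As a rough illustration of what the `mujoco_xx.py` training scripts do, here is a minimal sketch using d3rlpy's 1.x API; the task name, hyperparameters, and save path are assumptions for illustration, not the repository's actual settings:

```python
import d3rlpy

# Load a D4RL locomotion dataset via d3rlpy's helper (task name is an example).
dataset, env = d3rlpy.datasets.get_d4rl("hopper-medium-v0")

# Train a clean CQL agent offline; the hyperparameters here are illustrative only.
cql = d3rlpy.algos.CQL(use_gpu=False)
cql.fit(dataset, n_epochs=10)

# Save the weights for later evaluation and comparison (path is hypothetical).
cql.save_model("cql_hopper_clean.pt")
```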
CARLA
-- cql-carla-lane-v0.py ------------------ train the clean agents using our selected offline RL algorithms.
-- poisoned_dataset.py ------------------ generate the misleading experiences.
-- poisoned-cql-carla-lane-v0.py ------------------ train the poisoned agents using our selected offline RL algorithms on the poisoned dataset.
-- retrain_carla.py ------------------ retrain (fine-tune) the poisoned agents using our selected offline RL algorithms.
-- carla_perturbation.py ------------------ evaluate the performance of agents under the normal and the triggered scenarios.
-- backdoor_detection.py ------------------ detect poisoned data using activation clustering (see the sketch below).
Videos ------------------ videos of the agents' behaviors under the normal and the triggered scenarios.
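For reference, activation clustering separates clean from poisoned samples by clustering the network's penultimate-layer activations and flagging the minority cluster. The sketch below shows the general recipe with scikit-learn; how you collect the activations is model-specific and not shown, and PCA is used here for simplicity (the original defense uses ICA):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def activation_clustering(activations, n_components=10):
    """Return indices of samples suspected to be poisoned.

    `activations` is an (n_samples, n_features) array of penultimate-layer
    activations collected from the trained agent on the dataset.
    """
    reduced = PCA(n_components=n_components).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    # Heuristic from the original defense: the smaller cluster is suspicious.
    suspect = np.argmin(np.bincount(labels))
    return np.where(labels == suspect)[0]
```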
This code was developed with Python 3.7.11.
We use MuJoCo 2.1.0; instructions for installing MuJoCo can be found here:
```sh
pip install -e .        # install d3rlpy from its repository root
pip install mujoco-py==2.1.2.14
pip install gym==0.22.0
pip install scikit-learn==1.0.2
pip install Cython==0.29.36
```
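A quick sanity check that the MuJoCo stack works (the environment name below is just an example; mujoco-py compiles its Cython bindings on first import, which takes a while):

```python
import mujoco_py  # first import triggers the one-time build of the bindings
import gym

env = gym.make("Hopper-v3")  # any MuJoCo locomotion task will do
obs = env.reset()            # gym 0.22 returns the observation directly
print("observation shape:", obs.shape)
```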
Download CARLA 0.9.8 and put its Python API on `PYTHONPATH` (adjust `/home/CARLA_0.9.8` to wherever you unpack it):

```sh
wget https://carla-releases.s3.eu-west-3.amazonaws.com/Linux/CARLA_0.9.8.tar.gz
wget https://carla-releases.s3.eu-west-3.amazonaws.com/Linux/AdditionalMaps_0.9.8.tar.gz
export PYTHONPATH=$PYTHONPATH:/home/CARLA_0.9.8/PythonAPI
export PYTHONPATH=$PYTHONPATH:/home/CARLA_0.9.8/PythonAPI/carla
export PYTHONPATH=$PYTHONPATH:/home/CARLA_0.9.8/PythonAPI/carla/dist/carla-0.9.8-py3.5-linux-x86_64.egg
```
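To verify that the CARLA Python API is importable (a minimal check, assuming the exports above succeeded):

```python
# Succeeds only if the CARLA egg is on PYTHONPATH; raises ImportError otherwise.
import carla
print("CARLA Python API loaded from:", carla.__file__)
```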
```sh
pip install pygame
pip install networkx
pip install dotmap
pip install dm_control==0.0.425341097
git clone https://github.com/aravindr93/mjrl.git
cd mjrl
pip install -e .
cd ..                   # return to the parent directory before cloning d4rl
pip install patchelf
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
git checkout 71a9549f2091accff93eeff68f1f3ab2c0e0a288
```
Then edit d4rl's `setup.py` to contain the following:

```python
from setuptools import setup, find_packages

setup(
    name='d4rl',
    version='1.1',
    install_requires=[
        'gym',
        'numpy',
        'mujoco_py',
        'pybullet',
        'h5py',
        'termcolor',
        'click',
    ],
    packages=find_packages(),
    package_data={'d4rl': [
        'locomotion/assets/*',
        'hand_manipulation_suite/assets/*',
        'hand_manipulation_suite/Adroit/*',
        'hand_manipulation_suite/Adroit/gallery/*',
        'hand_manipulation_suite/Adroit/resources/*',
        'hand_manipulation_suite/Adroit/resources/meshes/*',
        'hand_manipulation_suite/Adroit/resources/textures/*',
    ]},
    include_package_data=True,
)
```
Then:

```sh
pip install -e .
```
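To confirm that D4RL is installed correctly, loading one of the offline datasets should work (the task name is an example; the first call downloads the data):

```python
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("hopper-medium-v0")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', ...
print("dataset size:", dataset["observations"].shape)
```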
Please refer to `rl_poisoner.yaml` for more detailed configuration information.
Additional instructions are given in the `README.md` files under the `carla` and `mujoco` folders.
- If you have any problems, please feel free to contact Chen Gong ([email protected]).