feat: support cbf methods #323
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #323      +/-   ##
==========================================
- Coverage   97.03%   96.79%   -0.25%
==========================================
  Files         137      153      +16
  Lines        6906     7933    +1027
==========================================
+ Hits         6701     7678     +977
- Misses        205      255      +50
☔ View full report in Codecov by Sentry.
@Gaiejj Can you please provide instructions on how to reproduce this result? I have been trying to reproduce it with OmniSafe, but I keep running into issues.
@umfundii Hello, here is the complete reproduction process:
conda create -n cbf python==3.8
pip install -e .
python train_policy.py --algo SACRCBF --env-id Unicycle
python train_policy.py --algo DDPGCBF --env-id Pendulum-v1
python train_policy.py --algo TRPOCBF --env-id Pendulum-v1
My Python environment configuration is as follows.
Package                  Version              Editable project location
------------------------ -------------------- -------------------------
absl-py 2.1.0
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
beautifulsoup4 4.12.3
cachetools 5.3.3
certifi 2024.6.2
charset-normalizer 3.3.2
clarabel 0.9.0
click 8.1.7
cloudpickle 3.0.0
contourpy 1.1.1
cvxopt 1.3.2
cvxpy 1.5.1
cycler 0.12.1
decorator 4.4.2
docker-pycreds 0.4.0
ecos 2.0.13
Farama-Notifications 0.0.4
filelock 3.14.0
fonttools 4.53.0
frozenlist 1.4.1
fsspec 2024.6.0
gdown 5.2.0
gitdb 4.0.11
GitPython 3.1.43
glfw 2.7.0
google-auth 2.29.0
google-auth-oauthlib 1.0.0
gpytorch 1.11
grpcio 1.64.1
gymnasium 0.28.1
gymnasium-robotics 1.2.2
idna 3.7
imageio 2.34.1
imageio-ffmpeg 0.5.1
importlib_metadata 7.1.0
importlib_resources 6.4.0
jax-jumpy 1.0.0
jaxtyping 0.2.19
Jinja2 3.1.4
joblib 1.3.2
kiwisolver 1.4.5
lightning-utilities 0.11.2
linear-operator 0.5.2
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.7.5
mdurl 0.1.2
moviepy 1.0.3
mpmath 1.3.0
mujoco 2.3.3
multidict 6.0.5
networkx 3.1
numpy 1.23.5
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
omnisafe 0.5.1.dev37+gdd9068f /home/jiayi/omnisafe_zjy
osqp 0.6.7
packaging 24.0
pandas 2.0.3
pettingzoo 1.24.3
pillow 10.3.0
pip 24.0
platformdirs 4.2.2
proglog 0.1.10
protobuf 4.25.3
psutil 5.9.8
pyasn1 0.6.0
pyasn1_modules 0.4.0
pygame 2.1.0
Pygments 2.18.0
PyOpenGL 3.1.7
pyparsing 3.1.2
PySocks 1.7.1
python-dateutil 2.9.0.post0
pytorch-lightning 2.2.5
pytz 2024.1
PyYAML 6.0.1
qdldl 0.1.7.post2
qpth 0.0.16
requests 2.32.3
requests-oauthlib 2.0.0
rich 13.7.1
rsa 4.9
safety-gymnasium 1.0.0
scikit-learn 1.3.2
scipy 1.10.1
scs 3.2.4.post2
seaborn 0.13.2
sentry-sdk 2.4.0
setproctitle 1.3.3
setuptools 70.0.0
shellingham 1.5.4
six 1.16.0
smmap 5.0.1
soupsieve 2.5
sympy 1.12.1
tensorboard 2.14.0
tensorboard-data-server 0.7.2
threadpoolctl 3.5.0
torch 2.3.0
torchmetrics 1.4.0.post0
tqdm 4.66.4
triton 2.3.0
typeguard 2.13.3
typer 0.12.3
typing_extensions 4.12.1
tzdata 2024.1
urllib3 2.2.1
wandb 0.17.0
Werkzeug 3.0.3
wheel 0.43.0
xmltodict 0.13.0
yarl 1.9.4
zipp 3.19.1
If you encounter further issues, feel free to contact us!
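If a reproduction attempt fails, comparing the local environment against the list above is a quick first check. The following is a minimal sketch, not part of this PR: the `PINNED` subset and the `find_mismatches` helper are hypothetical names chosen for illustration, and only a handful of the packages listed above are included.

```python
from importlib import metadata

# A small, hand-picked subset of the versions listed above.
PINNED = {
    "torch": "2.3.0",
    "gymnasium": "0.28.1",
    "cvxpy": "1.5.1",
    "cvxopt": "1.3.2",
    "gpytorch": "1.11",
    "qpth": "0.0.16",
    "numpy": "1.23.5",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The comparison is an exact string match, so a build with a local version suffix (for example a CUDA-tagged torch wheel) would be reported as a mismatch even if the base version agrees.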
@Gaiejj I have a few additional questions.
@umfundii Thank you very much for pointing that out! It greatly helps us improve the robustness and usability of the code in this PR.
You can reproduce the corresponding results using
python plot.py --logdir LOGDIR --cost-metrics Metrics/Max_angle_violation --reward-metrics Metrics/EpRet
To further facilitate user convenience, we have also made adjustments to the relevant sections of
if __name__ == '__main__':
    eg = ExperimentGrid(exp_name='Benchmark_CBF_Test')

    # Algorithms and environment to benchmark.
    cbf_policy = ['TRPOCBF', 'DDPGCBF', 'IPO', 'PPOBetaCBF']
    cbf_env = ['Pendulum-v1']
    eg.add('env_id', cbf_env)

    # GPUs are detected, but gpu_id=None means none is assigned to the runs.
    avaliable_gpus = list(range(torch.cuda.device_count()))
    gpu_id = None

    eg.add('algo', cbf_policy)
    eg.add('logger_cfgs:use_wandb', [False])
    eg.add('train_cfgs:vector_env_nums', [1])
    eg.add('train_cfgs:torch_threads', [1])
    eg.add('train_cfgs:total_steps', [80_000])
    eg.add('seed', [0, 5, 10])
    eg.run(train, num_pool=12, gpu_id=gpu_id)

    # Metrics used to compare the algorithms after training.
    reward_metrics = 'Metrics/EpRet'
    cost_metrics = 'Metrics/Max_angle_violation'
    eg.analyze(
        parameter='algo',
        values=None,
        compare_num=4,
        cost_limit=1.0,
        reward_metrics=reward_metrics,
        cost_metrics=cost_metrics,
    )
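To see how many training runs the grid above produces, the configuration can be expanded by hand. This is a minimal sketch that assumes the grid runs every combination of the added values (a Cartesian product); the `expand` helper is hypothetical and is not part of the `ExperimentGrid` API. Keys with a single value, such as `train_cfgs:total_steps`, are omitted because they do not change the count.

```python
from itertools import product

# Values mirror the eg.add(...) calls in the snippet above.
grid = {
    "env_id": ["Pendulum-v1"],
    "algo": ["TRPOCBF", "DDPGCBF", "IPO", "PPOBetaCBF"],
    "seed": [0, 5, 10],
}


def expand(grid):
    """Enumerate every combination of the grid as a list of dicts."""
    keys = list(grid)
    combos = product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


runs = expand(grid)
print(len(runs))  # 1 environment * 4 algorithms * 3 seeds = 12
```

Under that assumption, the 12 combinations match `num_pool=12`, so every run would get its own worker, and the four algorithms match `compare_num=4` in the analysis call.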
@Gaiejj
@umfundii Thank you again for your detailed observations and feedback!
Thanks again for your help and suggestions!
@umfundii Hello! We have refactored the CBF code. Specifically, we decoupled the
Description
This PR is already complete in terms of implementation accuracy. We will merge it shortly after improving the code style and documentation.
Related Papers
This Pull Request supports control barrier function-based SafeRL algorithms, including:
Example Demo
DDPG_CBF.mp4
Experiment and Performance
Note: Since OmniSafe uses Steps as the x-axis scale when displaying benchmark curves, the x-axis scale of the curves is not entirely consistent with the original implementation. However, the total number of interactive steps is the same. For example, in Pendulum-v1, 400 episodes * 200 steps per episode = 80,000 total steps.
Pendulum-v1 Benchmark
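The x-axis conversion described in the note above can be written out as a short sketch. The function names are hypothetical and chosen for illustration; the 200 steps per episode figure is the Pendulum-v1 value quoted in the note.

```python
STEPS_PER_EPISODE = 200  # Pendulum-v1 episode length, as stated in the note


def episodes_to_steps(episodes, steps_per_episode=STEPS_PER_EPISODE):
    """Convert an episode count (original x-axis) to environment steps."""
    return episodes * steps_per_episode


def steps_to_episodes(steps, steps_per_episode=STEPS_PER_EPISODE):
    """Convert environment steps (OmniSafe x-axis) back to episodes."""
    return steps / steps_per_episode


print(episodes_to_steps(400))     # 80000
print(steps_to_episodes(80_000))  # 400.0
```

To compare against the original curves, a point on the Steps axis can be mapped back to an episode index this way.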
Original Implementations
Note: The IPO algorithm is based on a logarithmic barrier function and is conceptually aligned with the CBF method.
It is therefore classified here as a variant of the CBF approach.
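For readers unfamiliar with the logarithmic barrier mentioned in the note, here is a minimal sketch of the barrier term, phi(x) = log(-x) / t with x = cost - cost_limit. It illustrates the general idea only and is not the OmniSafe implementation; the function name and the default `t=20.0` are assumptions made for this example.

```python
import math


def log_barrier(cost, cost_limit, t=20.0):
    """Barrier term phi(x) = log(-x) / t, where x = cost - cost_limit.

    The term is added to the objective being maximized. It is finite while
    the cost is strictly below the limit and tends to -inf as the cost
    approaches the limit, which discourages constraint violations.
    """
    x = cost - cost_limit
    if x >= 0:
        return -math.inf  # constraint violated or exactly at the limit
    return math.log(-x) / t


print(log_barrier(0.5, 1.0) > log_barrier(0.9, 1.0))  # True: closer to the limit is penalized more
```

A larger `t` makes the barrier flatter away from the limit, so the term behaves more like a hard constraint.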
Because the CBF method is tightly coupled with the environment, we currently validate the performance of the RCBF method only in the Unicycle environment, the same one used in the original paper. Future work will focus on decoupling the method from the environment.
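The coupling with the environment comes from the CBF condition itself, which needs both a barrier function h(x) and the system dynamics. The following toy sketch shows this for an assumed one-dimensional single integrator, dx/dt = u, with h(x) = x_max - x. It is not the RCBF implementation from this PR; the function name, `x_max`, and `alpha` are hypothetical.

```python
def cbf_filter(x, u_nominal, x_max=1.0, alpha=1.0):
    """Return the action closest to u_nominal that satisfies the CBF condition.

    With h(x) = x_max - x and dynamics dx/dt = u, we have dh/dt = -u, so the
    condition dh/dt + alpha * h(x) >= 0 reduces to u <= alpha * (x_max - x).
    In one dimension, the closest feasible action is a simple clip.
    """
    u_bound = alpha * (x_max - x)
    return min(u_nominal, u_bound)


print(cbf_filter(0.0, 0.5))  # 0.5: far from the boundary, action unchanged
print(cbf_filter(1.0, 0.5))  # 0.0: on the boundary, motion toward it is blocked
```

Both the expression for dh/dt and the resulting bound depend on the assumed dynamics, which is why a different environment needs its own derivation.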
Note: The original implementation incurs zero cost, so cost performance curves are not included.
Motivation and Context
The CBF method is an important branch of SafeRL research. Supporting the CBF method will further expand OmniSafe's contribution to the community.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!
make format. (required)
make lint. (required)
make test pass. (required)