Code and documentation for benchmarking Soft Actor-Critic with Cross-Entropy Policy Optimization (SAC-CEPO) against the original Soft Actor-Critic (SAC).
Python 3.7.6 is recommended. Make sure the 64-bit version is installed: https://www.python.org/downloads/release/python-376/
Install PyTorch 1.4.0 with:
pip3 install torch===1.4.0 torchvision===0.5.0 -f https://download.pytorch.org/whl/torch_stable.html
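To confirm that the install worked, a short check like the one below can help. It is a minimal sketch, not part of this repository: the helper name `describe_torch` is hypothetical, and the script only reports what it finds rather than enforcing version 1.4.0. It can be run again after the GPU setup to see whether CUDA has become visible to PyTorch.

```python
import importlib
import importlib.util


def describe_torch():
    """Summarise the local PyTorch install without failing if it is absent.

    Hypothetical helper for illustration; not part of this repository.
    """
    info = {"installed": False, "version": None, "cuda_available": False}
    # find_spec returns None when the package cannot be located
    if importlib.util.find_spec("torch") is None:
        return info
    torch = importlib.import_module("torch")
    info["installed"] = True
    info["version"] = torch.__version__
    # True only when a CUDA-capable GPU and a matching driver are visible
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    print(describe_torch())
```

If `cuda_available` stays false once the GPU steps are done, the driver or CUDA version is the likely place to look first.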
To run the code with GPU (recommended):
- Update the GPU driver to version 418.x or higher https://www.nvidia.com/download/index.aspx?lang=en-us
- Install CUDA 10.1 https://developer.nvidia.com/cuda-10.1-download-archive-update2
- Install cuDNN (version >= 7.6) for CUDA 10.1 https://developer.nvidia.com/cudnn
- Create folder .mujoco under %userprofile%
- Create folder .mujoco/mujoco200
- Download mujoco200 win64 from: https://www.roboti.us/index.html
- Extract all files into .mujoco/mujoco200 so that bin files can be found under .mujoco/mujoco200/bin/
- MuJoCo requires a licence to run. A free 30-day trial licence can be obtained through: https://www.roboti.us/license.html
- Copy the licence mjkey.txt to .mujoco/mjkey.txt
- Add .mujoco/mujoco200/bin to PATH.
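The MuJoCo folder layout described above can be checked with a few lines of Python before building mujoco-py. This is a sketch under stated assumptions: the function name `check_mujoco_layout` is hypothetical, and `Path.home()` is assumed to resolve to the same folder as %userprofile% on the machine in question.

```python
import os
from pathlib import Path


def check_mujoco_layout(root, path_value):
    """Return a list of problems found under the .mujoco folder `root`.

    Hypothetical helper for illustration; not part of this repository.
    """
    root = Path(root)
    bin_dir = root / "mujoco200" / "bin"
    key_file = root / "mjkey.txt"
    problems = []
    if not bin_dir.is_dir():
        problems.append("missing folder: " + str(bin_dir))
    if not key_file.is_file():
        problems.append("missing licence file: " + str(key_file))
    # Compare PATH entries after normalising case and separators
    wanted = os.path.normcase(os.path.normpath(str(bin_dir)))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    if wanted not in entries:
        problems.append("not on PATH: " + str(bin_dir))
    return problems


if __name__ == "__main__":
    # Assumption: Path.home() matches %userprofile% on this machine
    issues = check_mujoco_layout(Path.home() / ".mujoco", os.environ.get("PATH", ""))
    print("OK" if not issues else "\n".join(issues))
```

The check only looks at files and PATH entries; it does not validate the licence key itself.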
Download Build Tools for Visual Studio 2019 from: https://visualstudio.microsoft.com/downloads/ When installing, make sure C++ build tools is selected.
- Install gym via pip by
pip3 install gym
- Install cffi and pygit2 by
pip3 install cffi pygit2
- Clone mujoco-py repository by
git clone https://github.com/openai/mujoco-py.git
(Install git first if not installed)
cd mujoco-py
py -3 -m pip install --upgrade setuptools
pip3 install -r requirements.txt
pip3 install -r requirements.dev.txt
- Open \mujoco-py\scripts\gen_wrappers.py and \mujoco-py\mujoco_py\generated\wrappers.pxi, and replace all instances of isinstance(addr, (int, np.int32, np.int64)) with hasattr(addr, '__int__')
cd /mujoco-py/
- Compile mujoco_py by
python -c "import mujoco_py"
- Install mujoco-py by
py -3 setup.py install
- To run SAC with Pendulum environment
py -3 train.py sac Pendulum-v0 5 10000 test.csv
- To run SAC-CEPO with Pendulum environment
py -3 train.py cepo Pendulum-v0 5 10000 test.csv
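Both example commands above use the same file name, test.csv. For a side-by-side comparison it may be convenient to give each algorithm its own file. The sketch below builds the two command lines; it is illustrative only. The helper name `build_benchmark_commands` is hypothetical, the last argument is assumed to be an output CSV path, and the two numeric arguments are passed through unchanged because their meaning is not documented here.

```python
import sys


def build_benchmark_commands(env_name, extra_args, python=sys.executable):
    """Build one train.py command per algorithm, each with its own CSV file.

    Hypothetical helper for illustration; not part of this repository.
    """
    commands = []
    for algo in ("sac", "cepo"):
        # Assumption: the final positional argument is the output CSV path
        out_csv = "{}_{}.csv".format(algo, env_name)
        commands.append([python, "train.py", algo, env_name, *extra_args, out_csv])
    return commands


if __name__ == "__main__":
    for cmd in build_benchmark_commands("Pendulum-v0", ["5", "10000"]):
        print(" ".join(cmd))
        # To launch for real: subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check them against the examples above before starting a long run.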
- Zhenyang Shi - University of Queensland ([email protected])
Please do NOT distribute/modify the code at this stage.