The code is available in the Scalable-BO GitHub repository.
This project is used to experiment with the Asynchronous Distributed Bayesian Optimization (ADBO) algorithm at HPC scale. The advantages of ADBO are:
- derivative-free optimization
- parallel evaluations of black-box functions
- asynchronous communication between agents
- no congestion in the optimization queue
The implementation of ADBO is directly available in the DeepHyper project (https://github.com/deephyper/deephyper/blob/develop/deephyper/search/hps/_dbo.py).
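To make these properties concrete, here is a minimal, self-contained sketch of one ADBO-style agent. This is an illustration under our own assumptions, not the DeepHyper implementation: it only shows the pattern of per-agent surrogate models with non-blocking exchange of evaluation results, using mpi4py and the stock scikit-optimize Optimizer (the file name adbo_sketch.py is hypothetical).
# Conceptual sketch of an ADBO-style agent (illustration, not the DeepHyper code).
# Run with, e.g.: mpirun -np 4 python adbo_sketch.py
import numpy as np
from mpi4py import MPI
from skopt import Optimizer

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def ackley(x):
    # Black-box objective to minimize (derivative-free: only function values are used).
    x = np.asarray(x)
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.mean(x**2)))
            - np.exp(np.mean(np.cos(2.0 * np.pi * x))) + 20.0 + np.e)

# Each agent owns its surrogate model; distinct random seeds decorrelate the agents.
opt = Optimizer([(-32.768, 32.768)] * 2, random_state=rank)

for _ in range(10):
    x = opt.ask()
    y = ackley(x)
    # Share the new evaluation with every other agent without blocking.
    for dest in range(size):
        if dest != rank:
            comm.isend((x, y), dest=dest, tag=0)
    # Absorb whatever results have already arrived; there is no barrier and no
    # central queue, so a slow agent never stalls the others.
    while comm.iprobe(source=MPI.ANY_SOURCE, tag=0):
        xr, yr = comm.recv(source=MPI.ANY_SOURCE, tag=0)
        opt.tell(xr, yr)
    opt.tell(x, y)

print(f"rank {rank}: best objective {min(opt.yi):.4f}")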
The experiments were executed on the Theta/ThetaGPU supercomputers at the Argonne Leadership Computing Facility (ALCF). The environment is based on the MPI implementations available at the facility and a Conda environment for the Python packages. The main Python dependencies of this project are deephyper/deephyper and deephyper/scikit-optimize, pinned at the following commits:
- deephyper/deephyper: 7a2d553227bc62aa5ba7a307375cf729fc6178ca
- deephyper/scikit-optimize: 4cdc150f74bb066d07a7e57986ceeaa336204e26
On all the systems of the ALCF we used the /lus/grand/projects filesystem. Start by cloning this repository:
git clone https://github.com/deephyper/scalable-bo.git
cd scalable-bo/
mkdir build
cd build/
Then move to the sub-section corresponding to your environment.
On macOS, install the Xcode command line tools:
xcode-select --install
Then check your current platform (x86_64/arm64) and move to the corresponding sub-section:
python -c "import platform; print(platform.platform());"
If your architecture is arm64, download Miniforge and install it:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
chmod +x Miniforge3-MacOSX-arm64.sh
sh Miniforge3-MacOSX-arm64.sh
After installing Miniforge, clone the DeepHyper and DeepHyper/Scikit-Optimize repositories and install them:
git clone https://github.com/deephyper/deephyper.git
cd deephyper/
git checkout b027148046d811e466c65cfc969bfdf85eeb7c49
conda env create -f install/environment.macOS.arm64.yml
cd ..
conda activate dh-arm
git clone https://github.com/deephyper/scikit-optimize.git
cd scikit-optimize/
git checkout c272896c4e3f75ebd3b09b092180f5ef5b12692e
pip install -e .
Install OpenMPI and mpi4py:
conda install openmpi
pip install mpi4py
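To check that mpi4py is correctly linked against the installed MPI, a small sanity script can help (the file name check_mpi.py is only an example):
# check_mpi.py -- minimal sanity check for the OpenMPI + mpi4py installation.
# Run with: mpirun -np 2 python check_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"Hello from rank {comm.Get_rank()} of {comm.Get_size()}")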
For Theta: from the scalable-bo/build folder, execute the following command:
../install/theta.sh
For ThetaGPU: from the scalable-bo/build folder, execute the following command:
../install/thetagpu.sh
The repository is organized as follows:
experiments/ # bash scripts for experiments and plotting tools
install/ # installation scripts
notebooks/ # notebooks for complementary analysis
src/scalbo/ # Python package to manage experiments
test/ # test scripts to verify installation
In general, experiments are launched with MPI through the src/scalbo/exp.py script, with a command such as:
$ mpirun -np 8 python -m scalbo.exp --problem ackley \
--search DBO \
--timeout 20 \
--acq-func qUCB \
--strategy qUCB \
--random-state 42 \
--log-dir output \
--verbose 1
where we execute the Ackley benchmark (--problem) with the distributed search (--search DBO) for 20 seconds (--timeout), using the qUCB acquisition function strategy (--acq-func and --strategy), with random state 42 (--random-state) and verbose mode active (--verbose); the results are saved in the output directory (--log-dir).
Complementary information about the python -m scalbo.exp command can be found with the --help argument:
$ python -m scalbo.exp --help
usage: exp.py [-h] --problem
{ackley_5,ackley_10,ackley_30,ackley_50,ackley_100,hartmann6D,levy,griewank,schwefel,frnn,minimalistic-frnn,molecular,candle_attn,candle_attn_sim}
--search {CBO,DBO} [--sync SYNC] [--acq-func ACQ_FUNC] [--strategy {cl_max,topk,boltzmann,qUCB}] [--timeout TIMEOUT]
[--max-evals MAX_EVALS] [--random-state RANDOM_STATE] [--log-dir LOG_DIR] [--cache-dir CACHE_DIR] [-v VERBOSE]
Command line to run experiments.
optional arguments:
-h, --help show this help message and exit
--problem {ackley_5,ackley_10,ackley_30,ackley_50,ackley_100,hartmann6D,levy,griewank,schwefel,frnn,minimalistic-frnn,molecular,candle_attn,candle_attn_sim}
Problem on which to experiment.
--search {CBO,DBO} The search to use for the experiment.
--sync SYNC If the search workers must be synchronized or not.
--acq-func ACQ_FUNC Acquisition function to use.
--strategy {cl_max,topk,boltzmann,qUCB}
The strategy for multi-point acquisition.
--timeout TIMEOUT Search maximum duration (in min.) for each optimization.
--max-evals MAX_EVALS
Number of iterations to run for each optimization.
--random-state RANDOM_STATE
Control the random-state of the algorithm.
--log-dir LOG_DIR Logging directory to store produced outputs.
--cache-dir CACHE_DIR
Path to use to cache logged outputs (e.g., /dev/shm/).
-v VERBOSE, --verbose VERBOSE
Whether to activate the verbose mode or not.
Experiments are challenging to reproduce at large scale; therefore, we provide a Docker image to reproduce similar results on a single machine with multiple cores. We assume Docker is already installed; if not, please check how to install Docker.
Your Docker configuration needs to use at least 8 CPUs.
Pull the Docker image:
docker pull romainegele/scalable-bo
Start a Docker container with this image:
docker run --platform linux/amd64 -ti romainegele/scalable-bo /bin/bash
Then go to the Docker experiments folder:
cd experiments/docker/
Execute the synchronous distributed BO with UCB and Boltzmann policy (SDBO+bUCB):
./fast_ackley_2-DBO-sync-UCB-boltzmann-1-8-30-42.sh
Execute the asynchronous distributed BO with qUCB (ADBO+qUCB):
./fast_ackley_2-DBO-async-qUCB-qUCB-1-8-30-42.sh
The results should now be in experiments/docker/output/. Each experiment's output will contain:
- a results.csv file containing the evaluated configurations with the corresponding objectives and additional information about when each function evaluation happened.
- a deephyper*.log file containing logging information from the algorithm, generally on rank 0.
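For a quick look at the collected data, results.csv can be loaded with pandas. This is a sketch under the assumption that the file exposes DeepHyper's usual objective column (DeepHyper maximizes the objective); adjust the path and column names to your run:
# Sketch: inspect one experiment's results with pandas.
import pandas as pd

df = pd.read_csv("output/results.csv")  # path to one experiment's results file
print(df.head())  # evaluated configurations and their objectives
print("best objective:", df["objective"].max())  # DeepHyper maximizes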
Then you can plot figures with the following command:
python ../plot.py --config plot.yaml
To launch the experiments on Theta, move to the corresponding jobs folder:
cd experiments/theta/jobs/
For ThetaGPU:
cd experiments/thetagpu/jobs/