Main Flags and JSON Configurations
To run the code, you must specify the hyperparameters for MuZero along with some functional parameters. This section describes each parameter within the JSON files and how to run different parts of the experiment pipeline.
From the root Main.py, our code is split into three pipelines: train, experiment, and play. Aside from selecting the pipeline, general options that can be given to Main include the following flags:
- --gpu [INT]: Select the tf.device (-1 for CPU).
- --debug: When this flag is included, much more data is tracked in TensorBoard during training (the loss of each prediction head, distributions; see utils/debugging.py). The generated TensorBoard files may become an order of magnitude larger.
- --lograte [INT]: When --debug is specified, this sets the snapshot rate: data is recorded every n-th training step.
- --render: Calls a Game's .render() function during training, if implemented. This is useful for checking the behaviour of agents.
- --run_name [STR]: Specify the name of a training session; this overrides the name specified in the JSON.
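As a rough illustration of how these flags fit together on the command line, the sketch below rebuilds the interface with Python's argparse. It is not the parser used in Main.py: the default values, the `dest` names, and the `should_log` helper are assumptions made for this example, and the pipeline-specific arguments (-c, --game) are the ones described further down in this section.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # General flags shared by all pipelines. Defaults are assumed, not taken from Main.py.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--gpu", type=int, default=-1, help="tf.device index (-1 for CPU)")
    common.add_argument("--debug", action="store_true", help="track extra data in TensorBoard")
    common.add_argument("--lograte", type=int, default=1, help="snapshot every n-th step")
    common.add_argument("--render", action="store_true", help="call Game.render() if implemented")
    common.add_argument("--run_name", type=str, default=None, help="overrides the JSON name")

    parser = argparse.ArgumentParser(prog="Main.py")
    pipelines = parser.add_subparsers(dest="pipeline", required=True)

    # train: requires a ModelConfig file and a game string.
    train = pipelines.add_parser("train", parents=[common])
    train.add_argument("-c", dest="config", required=True)
    train.add_argument("--game", required=True)

    # experiment: requires a JobConfig file.
    experiment = pipelines.add_parser("experiment", parents=[common])
    experiment.add_argument("-c", dest="config", required=True)

    # play: arguments are not documented yet, so only the general flags are attached.
    pipelines.add_parser("play", parents=[common])
    return parser


def should_log(step: int, lograte: int, debug: bool) -> bool:
    # Hypothetical helper: snapshots are only taken in debug mode, every lograte-th step.
    return debug and step % lograte == 0


args = build_parser().parse_args(
    ["train", "-c", "Configurations/ModelConfigs/MuzeroCartpole.json",
     "--game", "gym_CartPole-v1", "--debug", "--lograte", "10"]
)
print(args.pipeline, args.gpu, args.debug, args.lograte)
```

Because the general flags are attached to each subcommand in this sketch, they have to follow the pipeline name, which matches the strict argument order used in the example calls.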
The train flag trains a single agent in a straightforward manner on a specified environment. Mandatory arguments for this pipeline include:
- -c [FILE]: Path to the JSON configuration file that specifies a ModelConfig. This includes the hyperparameters for the agent.
- --game [STR]: Game specification by string; the available environments are specified in Main.py (game_from_name).
A functioning example call to this pipeline, with arguments in strict order, is:
python Main.py train -c Configurations/ModelConfigs/MuzeroCartpole.json --game gym_CartPole-v1 [optional flags]
The experiment flag takes an overarching JSON configuration that specifies multiple training runs (e.g., an ablation analysis). Mandatory arguments include:
- -c [FILE]: Path to the JSON configuration file that specifies a JobConfig.
A functioning example call to this pipeline, with arguments in strict order, is:
python Main.py experiment -c Configurations/JobConfigs/CartPoleAblations.json [optional flags]
Note that [optional flags] may not be utilized when performing a parameter-grid analysis, because the experiment pipeline functions as a manager for scheduling training runs. Flags for each training instance can instead be provided within the JobConfig.
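To make the idea of a parameter-grid analysis concrete, the sketch below expands a small grid into one set of overrides per training run. It only illustrates the general technique: the hyperparameter names and values are hypothetical, and it does not reflect the actual JobConfig schema or the scheduler used by the experiment pipeline.

```python
import itertools

# Hypothetical grid; keys and values are illustrative, not taken from a real JobConfig.
grid = {
    "learning_rate": [1e-2, 1e-3],
    "n_steps": [1, 5, 10],
}


def expand_grid(grid):
    """Yield one dict of overrides per combination of grid values."""
    keys = sorted(grid)  # fixed key order keeps the run order reproducible
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


runs = list(expand_grid(grid))
for i, overrides in enumerate(runs):
    # A manager would schedule one training run per entry here.
    print(f"run {i}: {overrides}")
```

With two learning rates and three step counts, this yields six runs, which shows how quickly a grid grows as more parameters are added.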
The play flag is used to apply a learned agent on a particular environment. TODO