Main Flags and JSON Configurations

To run the code, you must specify the hyperparameters for MuZero along with some functional parameters. This section describes each parameter within the JSON files and how to run different parts of the experiment pipeline.

Main Flags

Main.py, in the repository root, splits our code into three pipelines: train, experiment, and play. Aside from selecting the pipeline, Main accepts the following general flags:

--gpu [INT] Select tf.device (-1 for CPU).
--debug Track much more data in TensorBoard during training (loss of each prediction head, distributions; see utils/debugging.py). Generated TensorBoard files may become an order of magnitude larger.
--lograte [INT] When --debug is specified, take a data snapshot every n-th training step.
--render Call the Game's .render() function during training, if implemented. This is useful for checking the behaviour of agents.
--run_name [STR] Specify the name of a training session; this overrides the name specified in the JSON.
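
The general flags above map naturally onto an argparse parser. The following is a minimal sketch, not the parser used in Main.py: the function names build_general_parser and device_string are hypothetical, and every default value is an assumption, since this page does not state the defaults.

```python
import argparse


def build_general_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the general flags listed above."""
    parser = argparse.ArgumentParser(prog="Main.py")
    # Defaults below are assumptions; the real defaults are not documented here.
    parser.add_argument("--gpu", type=int, default=-1,
                        help="Device index for tf.device (-1 for CPU).")
    parser.add_argument("--debug", action="store_true",
                        help="Track additional data in TensorBoard.")
    parser.add_argument("--lograte", type=int, default=1,
                        help="With --debug: snapshot every n-th training step.")
    parser.add_argument("--render", action="store_true",
                        help="Call the Game's .render() during training.")
    parser.add_argument("--run_name", type=str, default=None,
                        help="Overrides the run name given in the JSON.")
    return parser


def device_string(gpu: int) -> str:
    """Translate the --gpu value into a TensorFlow device name (assumed mapping)."""
    return "/CPU:0" if gpu < 0 else f"/GPU:{gpu}"


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_general_parser().parse_args(["--gpu", "0", "--debug", "--lograte", "50"])
print(device_string(args.gpu), args.debug, args.lograte)
```

A string such as the one returned by device_string could then be passed to tf.device, which is how --gpu is described above.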

Train Pipeline:

The train flag trains a single agent on a specified environment in a straightforward manner. Mandatory arguments for this pipeline include:

-c [FILE] Path to the JSON configuration file that specifies a ModelConfig. This includes hyperparameters for the agent.
--game [STR] Game specified by string; the available environments are listed in game_from_name in Main.py.

A functioning example call to this pipeline, with arguments in strict order, is:
python Main.py train -c Configurations/ModelConfigs/MuzeroCartpole.json --game gym_CartPole-v1 [optional flags]
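
When launching training from a script, it can help to assemble the argument list programmatically so that the strict ordering is preserved. The sketch below assumes nothing about Main.py beyond the command shown above; build_train_command is a hypothetical helper, not part of the repository.

```python
import shlex


def build_train_command(config_path, game, optional_flags=(), python="python"):
    """Assemble the train-pipeline argv in the documented order:
    pipeline name, then -c, then --game, then any optional flags."""
    return [python, "Main.py", "train",
            "-c", config_path,
            "--game", game,
            *optional_flags]


cmd = build_train_command(
    "Configurations/ModelConfigs/MuzeroCartpole.json",
    "gym_CartPole-v1",
    optional_flags=["--gpu", "0"],
)
# shlex.join renders the list as a single shell-safe command line.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the repository root; the sketch only builds the command and does not execute it.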

Experiment Pipeline:

The experiment flag takes an overarching JSON configuration that specifies multiple training runs (e.g., an ablation analysis). Mandatory arguments include:

-c [FILE] Path to the JSON configuration file that specifies a JobConfig.

A functioning example call to this pipeline, with arguments in strict order, is:
python Main.py experiment -c Configurations/JobConfigs/CartPoleAblations.json [optional flags]

Note that [optional flags] may not be used when performing a parameter-grid analysis; the experiment pipeline functions as a manager for scheduling training runs. Flags for each training instance can be provided within the JobConfig.
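
To illustrate what a parameter-grid analysis involves, the sketch below expands a grid of candidate values into one setting per training run. It is not the scheduler used by the experiment pipeline, and the JobConfig schema is not described on this page: expand_grid and the key names learning_rate and num_simulations are hypothetical and chosen only for illustration.

```python
import itertools


def expand_grid(grid):
    """Yield one dict per combination of the listed values (Cartesian product)."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


# Hypothetical grid: these key names are illustrative, not actual JobConfig fields.
grid = {
    "learning_rate": [0.01, 0.001],
    "num_simulations": [10, 25, 50],
}

runs = list(expand_grid(grid))
for index, overrides in enumerate(runs):
    print(index, overrides)
```

Each resulting dict corresponds to one training run that a manager would schedule, so a grid with 2 and 3 candidate values yields 6 runs.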

Play Pipeline:

The play flag is used to apply a learned agent on a particular environment. TODO
