docs: update docs for non-Trial-centric world (#10174)
The model debugging guide was completely out-of-date, and needed a
near-total rewrite.

Additionally, the Core API user guide had other details that needed
updating, which I missed in my first pass.

Also, there were issues with two examples:

 - The iris example was not configured to train long enough to actually
   converge, which looks bad for an example.

 - The core_api_mnist_pytorch example had a couple of show-stopper bugs,
   so some of its stages did not run at all.

Finally, several examples touched in the searcher-context-removal
project needed `make fmt` applied to them.
rb-determined-ai authored Nov 1, 2024
1 parent 050db29 commit 21b0256
Showing 10 changed files with 116 additions and 263 deletions.
47 changes: 18 additions & 29 deletions docs/model-dev-guide/api-guides/apis-howto/api-core-ug.rst
@@ -305,46 +305,26 @@ configuration file.
Step 4: Hyperparameter Search
*******************************

-With the Core API you can run advanced hyperparameter searches with arbitrary training code. The
-hyperparameter search logic is in the master, which coordinates many different Trials. Each trial
-runs a train-validate-report loop:

-.. table::

-   +----------+--------------------------------------------------------------------------+
-   | Train    | Train until a point chosen by the hyperparameter search algorithm and    |
-   |          | obtained via the Core API. The length of training is absolute, so you    |
-   |          | have to keep track of how much you have already trained to know how much |
-   |          | more to train.                                                           |
-   +----------+--------------------------------------------------------------------------+
-   | Validate | Validate your model to obtain the metric you configured in the           |
-   |          | ``searcher.metric`` field of your experiment config.                     |
-   +----------+--------------------------------------------------------------------------+
-   | Report   | Use the Core API to report results to the master.                        |
-   +----------+--------------------------------------------------------------------------+
+With the Core API you can run advanced hyperparameter searches with any training loop. The
+hyperparameter search logic is in the master, which can create trials and can decide to preempt them
+if they are underperforming.

To perform a hyperparameter search, we'll update our script to define the hyperparameter search
settings we want to use for our experiment. More specifically, we'll need to define the following
settings in our experiment configuration file:

-- ``name:`` ``adaptive_asha`` (name of our searcher. For all options, visit :ref:`search-methods`.
-- ``metric``: ``test_loss``
-- ``smaller_is_better``: ``True`` (This is equivalent to minimization vs. maximization of
-  objective.)
-- ``max_trials``: 500 (This is the maximum number of trials the searcher should run.)
-- ``time_metric``: ``epochs`` (This is the name of the "time" metric which we report in validation
+- ``name: adaptive_asha`` (name of our searcher. For all options, visit :ref:`search-methods`).
+- ``metric: test_loss``
+- ``smaller_is_better: true`` (This is equivalent to minimization vs. maximization of objective.)
+- ``max_trials: 50`` (This is the maximum number of trials the searcher should run.)
+- ``time_metric: epochs`` (This is the name of the "time" metric which we report in validation
  metrics).
-- ``max_time``: 20 (The max number of epochs a trial will report. For more information, visit
+- ``max_time: 20`` (The max number of epochs a trial will report. For more information, visit
  Adaptive ASHA in the :ref:`Experiment Configuration Reference <experiment-configuration>`).

In addition, we also need to define the hyperparameters themselves. Adaptive ASHA will pick values
between the ``minval`` and ``maxval`` for each hyperparameter for each trial.

-.. note::

-   To see early stopping in action, try setting ``max_trials`` to over 500 and playing around with
-   the hyperparameter search values.

In this step, we’ll run our experiment using the ``model_def_adaptive.py`` script and its
accompanying ``adaptive.yaml`` experiment configuration file.
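To make the link between these searcher settings and the per-trial loop concrete, here is a minimal, framework-free sketch in Python (the language of the tutorial's `model_def_adaptive.py`). It assumes a trial that trains for at most `max_time` epochs and reports `test_loss` together with `epochs` after each one, because the searcher is configured with `time_metric: epochs`. `SEARCHER`, `train_one_trial`, `report`, and `should_stop` are hypothetical names. In the actual tutorial, reporting and preemption presumably go through the Core API (for example `core_context.train.report_validation_metrics` and `core_context.preempt.should_preempt`), which this sketch deliberately does not import.

```python
from typing import Callable, Dict, List

# Searcher settings from the bullet list above, written as a plain dict for
# illustration only; in the tutorial they live in adaptive.yaml, not in Python.
SEARCHER = {
    "name": "adaptive_asha",
    "metric": "test_loss",
    "smaller_is_better": True,
    "max_trials": 50,
    "time_metric": "epochs",
    "max_time": 20,
}


def train_one_trial(
    evaluate: Callable[[int], float],
    report: Callable[[int, Dict[str, float]], None],
    should_stop: Callable[[], bool],
) -> int:
    """Hypothetical trial loop: train for up to max_time epochs.

    `report` stands in for a validation-metrics reporting call and
    `should_stop` for a preemption check (assumptions, not Determined code).
    Returns the number of epochs completed.
    """
    epochs_completed = 0
    for epoch in range(1, SEARCHER["max_time"] + 1):
        # ... one epoch of training would run here ...
        metrics = {
            SEARCHER["metric"]: evaluate(epoch),
            # Required because time_metric is "epochs": the searcher reads
            # trial progress from this key.
            SEARCHER["time_metric"]: epoch,
        }
        report(epoch, metrics)
        epochs_completed = epoch
        # The master may decide to stop an underperforming trial early.
        if should_stop():
            break
    return epochs_completed


# Usage: a fake trial whose loss shrinks each epoch and which is stopped
# after its third report.
reported: List[Dict[str, float]] = []
n = train_one_trial(
    evaluate=lambda epoch: 1.0 / epoch,
    report=lambda step, metrics: reported.append(metrics),
    should_stop=lambda: len(reported) >= 3,
)
print(n, reported[-1])
```

The key detail is the value stored under the `time_metric` key. It mirrors the `epochs` metric that the updated guide requires in validation metrics. The early `break` stands in for the master preempting an underperforming trial.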

@@ -375,6 +355,15 @@ hardcoded values:
   :end-before: # Docs snippet end: per trial basis
   :dedent:

+Lastly, to comply with the requirements of the ASHA search, we must report an ``epochs`` metric with
+our validation metrics, since we set ``time_metric: epochs`` in our searcher:

+.. literalinclude:: ../../../../examples/tutorials/core_api_pytorch_mnist/model_def_adaptive.py
+   :language: python
+   :start-after: # Docs snippet start: report epochs
+   :end-before: # Docs snippet end: report epochs
+   :dedent:

Step 4.1: Run the Experiment
============================

2 changes: 2 additions & 0 deletions docs/model-dev-guide/create-experiment.rst
@@ -110,6 +110,8 @@ Example Python script command:
script.py [args...]
.. _predefined-launchers:

**********************
Predefined Launchers
**********************