auto commit: wip
azhou-determined committed Nov 1, 2024
1 parent 8a5d862 commit e3e4332
Showing 3 changed files with 38 additions and 14 deletions.
39 changes: 29 additions & 10 deletions docs/reference/experiment-config-reference.rst
@@ -182,19 +182,20 @@ Example:

.. _scheduling-unit:

-``scheduling_unit``
-===================
+``scheduling_unit`` (deprecated)
+================================

Optional. Instructs how frequent to perform system operations, such as periodic checkpointing and
-preemption, in the unit of batches. The number of records in a batch is controlled by the
-:ref:`global_batch_size <config-global-batch-size>` hyperparameter. Defaults to ``100``.
+preemption, in the unit of batches. This field has been deprecated and the behavior should be
+configured in training code directly. Please see :ref:`apis-howto-overview` for details specific to
+your training framework.
+
+.. _config-records-per-epoch:

-- Setting this value too small can increase the overhead of system operations and decrease training
-throughput.
-- Setting this value too large might prevent the system from reallocating resources from this
-workload to another, potentially more important, workload.
-- As a rule of thumb, it should be set to the number of batches that can be trained in roughly
-60--180 seconds.
+``records_per_epoch`` (deprecated)
+==================================
+
+Optional. The number of records in the training data set. This field has been deprecated.

.. _max-restarts:

@@ -319,6 +320,15 @@ While debugging, the logger will display lines highlighted in blue for easy iden
Validation Policy
*******************

+.. _experiment-config-min-validation-period:
+
+``min_validation_period`` (deprecated)
+======================================
+
+Optional. Specifies the minimum frequency at which validation should be run for each trial. This
+field has been deprecated and should be specified directly in training code. Please see
+:ref:`apis-howto-overview` for details specific to your training framework.
+
.. _experiment-config-perform-initial-validation:

``perform_initial_validation``
@@ -341,6 +351,15 @@ Determined checkpoints in the following situations:
- Prior to the searcher making a decision based on the validation of trials, ensuring consistency
in case of a failure.

+.. _experiment-config-min-checkpoint-period:
+
+``min_checkpoint_period`` (deprecated)
+======================================
+
+Optional. Specifies the minimum frequency for running checkpointing for each trial. This field has
+been deprecated and should be specified directly in training code. Please see
+:ref:`apis-howto-overview` for details specific to your training framework.
+
``checkpoint_policy``
=====================

9 changes: 7 additions & 2 deletions docs/release-notes/searcher-context-removal.rst
@@ -12,6 +12,10 @@
and are now being removed. Users are encouraged to use a preset searcher, which can be easily
:ref:`configured <experiment-configuration_searcher>` for any experiment.

+- DeepSpeed: the ``num_micro_batches_per_slot`` and ``train_micro_batch_size_per_gpu`` attributes
+on ``DeepSpeedContext`` have been replaced with ``get_train_micro_batch_size_per_gpu()`` and
+``get_num_micro_batches_per_slot()``.
+
**New Features**

- API: introduce ``keras.DeterminedCallback``, a new high-level training API for TF Keras that
@@ -30,8 +34,9 @@
- Experiment Config: the ``optimizations`` config has been deprecated. Please see :ref:`Training
APIs <apis-howto-overview>` to configure supported optimizations through training code directly.

-- Experiment Config: the ``scheduling_unit`` config field has been deprecated. min
-checkpoint/val/records per epoch
+- Experiment Config: the ``scheduling_unit``, ``min_checkpoint_period``, and
+``min_validation_period`` config fields have been deprecated. Instead, these configuration
+options should be specified in training code.

- Experiment Config: the ``entrypoint`` field no longer accepts ``model_def:TrialClass`` as trial
definitions. Please invoke your training script directly (``python3 train.py``).
4 changes: 2 additions & 2 deletions harness/determined/pytorch/_trainer_utils.py
@@ -87,7 +87,7 @@ class Epoch(TrainUnit):
Epoch(int) values are treated as periods, e.g. Epoch(100) will checkpoint/validate every 100
epochs.
-Epoch(collections.abc.Container) values are treated as schedules, e.g. Epoch(1,5,10) will
+Epoch(collections.abc.Container) values are treated as schedules, e.g. Epoch([1,5,10]) will
checkpoint/validate on epochs 1, 5, and 10.
"""

@@ -100,7 +100,7 @@ class Batch(TrainUnit):
Batch(int) values are treated as periods, e.g. Batch(100) will checkpoint/validate every 100
batches.
-Batch(collections.abc.Container) values are treated as schedules, e.g. Batch(1,5,10) will
+Batch(collections.abc.Container) values are treated as schedules, e.g. Batch([1,5,10]) will
checkpoint/validate on batches 1, 5, and 10.
"""

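The period-versus-schedule distinction described in the ``Epoch`` and ``Batch`` docstrings can be sketched as follows. This is a minimal illustration and not the harness code: the ``TrainUnit`` class here is a hypothetical stand-in, and the ``is_due`` method name is an assumption made for the example.

```python
import collections.abc
from collections.abc import Container
from typing import Union


class TrainUnit:
    """Hypothetical stand-in for illustration; not the Determined implementation."""

    def __init__(self, value: Union[int, Container]) -> None:
        if isinstance(value, int):
            if value <= 0:
                raise ValueError("a period must be a positive integer")
        elif not isinstance(value, collections.abc.Container):
            raise TypeError("expected an int period or a container schedule")
        self.value = value

    def is_due(self, step_num: int) -> bool:
        """Return True if an action should run once `step_num` units have completed."""
        if isinstance(self.value, int):
            # Period: fire on every positive multiple of the value.
            return step_num > 0 and step_num % self.value == 0
        # Schedule: fire only on the explicitly listed steps.
        return step_num in self.value


class Epoch(TrainUnit):
    pass


class Batch(TrainUnit):
    pass


periodic = Batch(100)          # int -> period
scheduled = Epoch([1, 5, 10])  # container -> schedule

periodic_hits = [n for n in range(1, 301) if periodic.is_due(n)]
scheduled_hits = [n for n in range(1, 12) if scheduled.is_due(n)]
print(periodic_hits)   # [100, 200, 300]
print(scheduled_hits)  # [1, 5, 10]
```

In this sketch the constructor accepts a single value, so ``Epoch(1, 5, 10)`` raises a ``TypeError`` while ``Epoch([1, 5, 10])`` is accepted, which matches the direction of the docstring correction above.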