Merge branch 'main' of github.com:determined-ai/determined into main
EmilyBonar committed Sep 12, 2023
2 parents a387c2b + d8c7bd2 commit bb86bde
Showing 60 changed files with 1,023 additions and 667 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.25.1-dev0
+current_version = 0.25.2-dev0
 commit = true
 tag = true
 tag_name = {new_version}
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -25,7 +25,7 @@ executors:
 parameters:
 det-version:
 type: string
-default: 0.25.1-dev0
+default: 0.25.2-dev0
 docker-image:
 type: string
 default: determinedai/cimg-base:latest
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.25.1-dev0
+0.25.2-dev0
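Every version file touched by this commit receives the same patch-level bump, from 0.25.1-dev0 to 0.25.2-dev0. A minimal Python sketch of that bump follows. `bump_patch` is a hypothetical helper written for illustration, and it assumes the `MAJOR.MINOR.PATCH[-devN]` shape seen in this diff; it is not how bumpversion itself is implemented or configured.

```python
import re

# Assumed version shape, taken from the strings in this diff: MAJOR.MINOR.PATCH[-devN]
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(-dev\d+)?")


def bump_patch(version: str) -> str:
    """Increment the patch component and keep any -devN suffix unchanged."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, suffix = match.groups()
    return f"{major}.{minor}.{int(patch) + 1}{suffix or ''}"


if __name__ == "__main__":
    print(bump_patch("0.25.1-dev0"))
```

Under these assumptions, `bump_patch("0.25.1-dev0")` yields the new string written to `VERSION` in this commit.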
2 changes: 1 addition & 1 deletion docs/_static/version-switcher/versions.json
@@ -1,6 +1,6 @@
 [
 {
-"version": "0.25.1-dev0",
+"version": "0.25.2-dev0",
 "url": "https://docs.determined.ai/latest/"
 },
 {
15 changes: 15 additions & 0 deletions docs/release-notes.rst
@@ -10,6 +10,21 @@
 Version 0.25
 **************

+Version 0.25.1
+==============
+
+**Release Date:** September 11, 2023
+
+**Breaking Changes**
+
+- Fluent Bit is no longer used for log shipping and configs associated with Fluent Bit are now no
+longer in use. Fluent Bit has been replaced with an internal log shipper (the same one that is
+used for Slurm).
+
+**Bug Fixes**
+
+- Reduce the time before seeing the first metrics of a new experiment.
+
 Version 0.25.0
 ==============

6 changes: 3 additions & 3 deletions docs/tutorials/index.rst
@@ -12,9 +12,9 @@
 To get started with your first experiment, visit the :ref:`Quickstart for Model Developers
 <qs-mdldev>`.

-******************************
-Get Started with a Trial API
-******************************
+*******************************************************
+Get Started with a :ref:`Trial API <high-level-apis>`
+*******************************************************

 +---------------------------------+--------------------------------------------------------------+
 | Title | Description |
9 changes: 6 additions & 3 deletions docs/tutorials/pytorch-mnist-local-qs.rst
@@ -100,6 +100,9 @@ This is the cluster address for your local training environment.
 In four simple steps, we've successfully configured our training environment in Determined to start
 training the PyTorch MNIST example.

-In this article, we learned how to run an experiment on a local, single CPU or GPU. To learn how to
-change your configuration settings, including how to run a distributed training job on multiple
-GPUs, visit the :ref:`Quickstart for Model Developers <qs-mdldev>`.
+In this article, we learned how to run an experiment on a local, single CPU or GPU. If you want to
+learn more details about the basic structure shown in the trial class, visit the
+:ref:`pytorch-mnist-tutorial`.
+
+To learn how to change your configuration settings, including how to run a distributed training job
+on multiple GPUs, visit the :ref:`Quickstart for Model Developers <qs-mdldev>`.
78 changes: 50 additions & 28 deletions docs/tutorials/pytorch-mnist-tutorial.rst
@@ -8,61 +8,76 @@
 :description: Using a simple image classification model for the MNIST dataset, you'll Learn how to port an existing PyTorch model to Determined.
 :keywords: PyTorch API,MNIST,model developer,quickstart

-This tutorial describes how to port an existing PyTorch model to Determined. We will port a simple
-image classification model for the MNIST dataset. This tutorial is based on the official `PyTorch
-MNIST example <https://github.com/PyTorch/examples/blob/master/mnist/main.py>`_.
+In this tutorial, you'll learn how to port an existing PyTorch model to Determined. We will port a
+simple image classification model for the MNIST dataset. This tutorial is based on the official
+`PyTorch MNIST example <https://github.com/PyTorch/examples/blob/master/mnist/main.py>`_.

 *********************
 About Model Porting
 *********************

 To use a PyTorch model in Determined, you need to port the model to Determined's API. For most
 models, this porting process is straightforward, and once the model has been ported, all of the
-features of Determined will then be available: for example, you can do :ref:`distributed training
-<multi-gpu-training>` or :ref:`hyperparameter search <hyperparameter-tuning>` without changing your
-model code, and Determined will store and visualize your model metrics automatically.
+features of Determined will then be available. For example, you can perform :ref:`distributed
+training <multi-gpu-training>` and :ref:`hyperparameter search <hyperparameter-tuning>` without
+changing your model code. Determined will store and visualize your model metrics automatically.

 When training a PyTorch model, Determined provides a built-in training loop that feeds each batch of
 training data into your ``train_batch`` function, which should perform the forward pass,
 backpropagation, and compute training metrics for the batch. Determined also handles checkpointing,
 log management, and device initialization. To plug your model code into the Determined training
 loop, you define methods to perform the following tasks:

-- initialize the models, optimizers, and LR schedulers
-- define the training function for forward and backward passes
-- define the evaluation function to compute the loss and other metrics on the validation data set
-- load the training data set
-- load the validation data set
+- Initialize the models, optimizers, and LR schedulers.
+- Define the training function for forward and backward passes.
+- Define the evaluation function to compute the loss and other metrics on the validation data set.
+- Load the training data set.
+- Load the validation data set.

 The Determined training loop will then invoke these functions automatically. These methods should be
 organized into a **trial class**, which is a user-defined Python class that inherits from
 :class:`determined.pytorch.PyTorchTrial`. The following sections walk through how to write your
 first trial class and then how to run a training job with Determined.

-The complete code for this tutorial can be downloaded here: :download:`mnist_pytorch.tgz
-</examples/mnist_pytorch.tgz>`. After downloading this file, open a terminal window, extract the
-file, and ``cd`` into the ``mnist_pytorch`` directory:
-
-.. code::
-tar xzvf mnist_pytorch.tgz
-cd mnist_pytorch
-We suggest you follow along with the code as you read through this tutorial.

 ***************
 Prerequisites
 ***************

 - Access to a Determined cluster. If you have not yet installed Determined, refer to the
-:ref:`installation-guide`.
+:ref:`installation instructions <installation-guide>`.

 - Access to the Determined CLI on your local machine. See :ref:`the installation instructions
 <install-cli>` if you do not already have it installed. After installing the CLI, configure it to
 connect to your Determined cluster by setting the ``DET_MASTER`` environment variable to the
 hostname or IP address where Determined is running.

-********************************
-Build a ``PyTorchTrial`` Class
-********************************
+.. note::
+
+For basic instructions on how to start a Determined cluster locally and run an experiment using
+the ``mnist_pytorch`` example, visit :ref:`Run Your First Experiment in Determined
+<pytorch_mnist_quickstart>`.
+
+****************************
+Getting the Tutorial Files
+****************************

-Here is what the skeleton of our trial class looks like:
+- Download the complete code for this tutorial from :download:`mnist_pytorch.tgz
+</examples/mnist_pytorch.tgz>`.
+- After downloading the file, open a terminal window, extract the file, and ``cd`` into the
+``mnist_pytorch`` directory:
+
+.. code::
+tar xzvf mnist_pytorch.tgz
+cd mnist_pytorch
+- Follow along with the code as you complete the tutorial.
+
+*************************************
+Creating the ``PyTorchTrial`` Class
+*************************************
+
+Outlined below is a basic structure for our trial class:

 .. code:: python
@@ -94,7 +109,7 @@ Here is what the skeleton of our trial class looks like:
 # This should return a determined.pytorch.Dataset.
 pass
-We now discuss how to implement each of these methods in more detail.
+Let's dive deeper into the implementation of each of these methods.

 Initialization
 ==============
@@ -311,3 +326,10 @@ in your web browser.

 Once you are on the Determined landing page, you can find your experiment using the experiment's ID
 (``xxx`` in the example above) or description.
+
+************
+Next Steps
+************
+
+Now that you are familiar with porting model code to Determined, you can keep working with the
+PyTorch MNIST model and learn how to :ref:`get up and running with the Core API <api-core-ug>`.
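The tutorial text in this diff says the built-in training loop feeds each batch into ``train_batch`` and then calls the evaluation and data-loading methods of the trial class. The toy sketch below illustrates only that calling pattern in plain Python. ``ToyTrial`` and ``run_toy_loop`` are hypothetical names invented here; nothing in this block imports or reproduces ``determined.pytorch.PyTorchTrial``, and method names other than ``train_batch`` are illustrative assumptions.

```python
from typing import Dict, Iterable, List


class ToyTrial:
    """Hypothetical stand-in for a trial class; it does not use Determined."""

    def build_training_data_loader(self) -> Iterable[List[float]]:
        # Each inner list plays the role of one batch of training data.
        return [[1.0, 2.0], [3.0, 4.0]]

    def build_validation_data_loader(self) -> Iterable[List[float]]:
        return [[5.0, 6.0]]

    def train_batch(self, batch: List[float], epoch_idx: int, batch_idx: int) -> Dict[str, float]:
        # A real trial would run the forward and backward passes here.
        return {"loss": sum(batch) / len(batch)}

    def evaluate_batch(self, batch: List[float]) -> Dict[str, float]:
        return {"validation_loss": sum(batch) / len(batch)}


def run_toy_loop(trial: ToyTrial, epochs: int = 1) -> Dict[str, List[Dict[str, float]]]:
    """Call the trial's methods in the order a training loop would."""
    training: List[Dict[str, float]] = []
    validation: List[Dict[str, float]] = []
    for epoch_idx in range(epochs):
        for batch_idx, batch in enumerate(trial.build_training_data_loader()):
            training.append(trial.train_batch(batch, epoch_idx, batch_idx))
        for batch in trial.build_validation_data_loader():
            validation.append(trial.evaluate_batch(batch))
    return {"training": training, "validation": validation}


if __name__ == "__main__":
    print(run_toy_loop(ToyTrial()))
```

The point of the sketch is the division of labor: the loop owns iteration and bookkeeping, while the trial class supplies only the per-batch logic and the data.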
2 changes: 1 addition & 1 deletion harness/determined/__version__.py
@@ -1 +1 @@
-__version__ = "0.25.1-dev0"
+__version__ = "0.25.2-dev0"
7 changes: 5 additions & 2 deletions harness/determined/cli/agent.py
@@ -156,9 +156,13 @@ def patch_agent(enabled: bool) -> Callable[[argparse.Namespace], None]:
     @authentication.required
     def patch(args: argparse.Namespace) -> None:
         check.check_false(args.all and args.agent_id)
+        action = "enable" if enabled else "disable"

         if not (args.all or args.agent_id):
-            raise errors.CliError("Must specify exactly on of --all or --agent-id")
+            raise errors.CliError(
+                "Please pass agent id or --all option. "
+                f"See `det agent {action} --help` for details."
+            )

         if args.agent_id:
             agent_ids = [args.agent_id]
@@ -169,7 +173,6 @@ def patch(args: argparse.Namespace) -> None:
         drain_mode = None if enabled else args.drain

         for agent_id in agent_ids:
-            action = "enable" if enabled else "disable"
             path = f"api/v1/agents/{agent_id}/{action}"

             payload = None
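The change above moves the `action` assignment ahead of the argument check so the new error message can name the matching subcommand. A self-contained sketch of that validation follows. `CliError` and `resolve_agent_ids` are hypothetical stand-ins defined here, not the harness's own `errors.CliError` or any Determined API, and the "both given" branch is an assumed substitute for `check.check_false`.

```python
import argparse
from typing import List


class CliError(Exception):
    """Hypothetical stand-in for the CLI error type used in the diff."""


def resolve_agent_ids(args: argparse.Namespace, enabled: bool) -> List[str]:
    """Validate --all / agent id and return the explicitly named agent IDs.

    An empty list means "all agents"; the real command looks those up elsewhere.
    """
    # Compute the action first so the error message can reference it.
    action = "enable" if enabled else "disable"

    if args.all and args.agent_id:
        # Assumed replacement for check.check_false in the original code.
        raise CliError("Pass either an agent id or --all, not both.")

    if not (args.all or args.agent_id):
        raise CliError(
            "Please pass agent id or --all option. "
            f"See `det agent {action} --help` for details."
        )

    return [args.agent_id] if args.agent_id else []


if __name__ == "__main__":
    try:
        resolve_agent_ids(argparse.Namespace(all=False, agent_id=None), enabled=False)
    except CliError as err:
        print(err)
```

Because `action` is computed once at the top, the same value serves both the error text and the later request path, which is why the diff removes the assignment from inside the loop.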
16 changes: 16 additions & 0 deletions harness/determined/common/api/bindings.py

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion harness/determined/deploy/aws/templates/efs.yaml
@@ -101,7 +101,7 @@ Parameters:
 Version:
 Type: String
 Description: Determined version or commit for master image
-Default: 0.25.1-dev0
+Default: 0.25.2-dev0

 DBPassword:
 Type: String
2 changes: 1 addition & 1 deletion harness/determined/deploy/aws/templates/fsx.yaml
@@ -101,7 +101,7 @@ Parameters:
 Version:
 Type: String
 Description: Determined version or commit for master image
-Default: 0.25.1-dev0
+Default: 0.25.2-dev0

 DBPassword:
 Type: String
2 changes: 1 addition & 1 deletion harness/determined/deploy/aws/templates/govcloud.yaml
@@ -67,7 +67,7 @@ Parameters:
 Version:
 Type: String
 Description: Determined version or commit for master docker image
-Default: 0.25.1-dev0
+Default: 0.25.2-dev0

 DBPassword:
 Type: String
2 changes: 1 addition & 1 deletion harness/determined/deploy/aws/templates/secure.yaml
@@ -122,7 +122,7 @@ Parameters:
 Version:
 Type: String
 Description: Determined version or commit for master image
-Default: 0.25.1-dev0
+Default: 0.25.2-dev0

 DBPassword:
 Type: String
2 changes: 1 addition & 1 deletion harness/determined/deploy/aws/templates/simple-rds.yaml
@@ -93,7 +93,7 @@ Parameters:
 Version:
 Type: String
 Description: Determined version or commit for master docker image
-Default: 0.25.1-dev0
+Default: 0.25.2-dev0

 DBPassword:
 Type: String
2 changes: 1 addition & 1 deletion harness/determined/deploy/aws/templates/simple.yaml
@@ -93,7 +93,7 @@ Parameters:
 Version:
 Type: String
 Description: Determined version or commit for master docker image
-Default: 0.25.1-dev0
+Default: 0.25.2-dev0

 DBPassword:
 Type: String
2 changes: 1 addition & 1 deletion harness/setup.py
@@ -2,7 +2,7 @@

 setuptools.setup(
     name="determined",
-    version="0.25.1-dev0",
+    version="0.25.2-dev0",
     author="Determined AI",
     author_email="[email protected]",
     url="https://determined.ai/",
4 changes: 2 additions & 2 deletions helm/charts/determined/Chart.yaml
@@ -1,12 +1,12 @@
 apiVersion: v1
 name: determined
 description: A Helm chart for Determined
-version: "0.25.1-dev0"
+version: "0.25.2-dev0"
 icon: https://github.com/determined-ai/determined/blob/master/determined-logo.png?raw=true
 home: https://github.com/determined-ai/determined.git

 # appVersion controls the version HPE MLDE / Determined OSS that is deployed.
 # If using a non-release version (e.g., X.Y.Z.dev0) you will have to specify an
 # existing official release version (e.g., X.Y.Z) or specify a commit has
 # that has been publicly published (all commits from master).
-appVersion: "0.25.1-dev0"
+appVersion: "0.25.2-dev0"