From d8c7bd24ac956956f00cfbdd50f66fdd00350139 Mon Sep 17 00:00:00 2001
From: Tara
Date: Tue, 12 Sep 2023 08:16:56 -0500
Subject: [PATCH] docs: Clarify meaning of trial api (#7818)

---
 docs/tutorials/index.rst                  |  6 +-
 docs/tutorials/pytorch-mnist-local-qs.rst |  9 ++-
 docs/tutorials/pytorch-mnist-tutorial.rst | 78 +++++++++++++++--------
 3 files changed, 59 insertions(+), 34 deletions(-)

diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst
index 960a13f8700..476e276e176 100644
--- a/docs/tutorials/index.rst
+++ b/docs/tutorials/index.rst
@@ -12,9 +12,9 @@
 To get started with your first experiment, visit the :ref:`Quickstart for Model Developers
 `.

-******************************
- Get Started with a Trial API
-******************************
+*******************************************************
+ Get Started with a :ref:`Trial API `
+*******************************************************

 +---------------------------------+--------------------------------------------------------------+
 | Title                           | Description                                                  |
diff --git a/docs/tutorials/pytorch-mnist-local-qs.rst b/docs/tutorials/pytorch-mnist-local-qs.rst
index 1c22eb9cabc..d9fdc26e5e1 100644
--- a/docs/tutorials/pytorch-mnist-local-qs.rst
+++ b/docs/tutorials/pytorch-mnist-local-qs.rst
@@ -100,6 +100,9 @@ This is the cluster address for your local training environment.
 In four simple steps, we've successfully configured our training environment in Determined to start
 training the PyTorch MNIST example.

-In this article, we learned how to run an experiment on a local, single CPU or GPU. To learn how to
-change your configuration settings, including how to run a distributed training job on multiple
-GPUs, visit the :ref:`Quickstart for Model Developers `.
+In this article, we learned how to run an experiment on a local, single CPU or GPU. If you want to
+learn more details about the basic structure shown in the trial class, visit the
+:ref:`pytorch-mnist-tutorial`.
+
+To learn how to change your configuration settings, including how to run a distributed training job
+on multiple GPUs, visit the :ref:`Quickstart for Model Developers `.
diff --git a/docs/tutorials/pytorch-mnist-tutorial.rst b/docs/tutorials/pytorch-mnist-tutorial.rst
index 4e89621eaf4..c63c1326c79 100644
--- a/docs/tutorials/pytorch-mnist-tutorial.rst
+++ b/docs/tutorials/pytorch-mnist-tutorial.rst
@@ -8,15 +8,19 @@
    :description: Using a simple image classification model for the MNIST dataset, you'll Learn how to port an existing PyTorch model to Determined.
    :keywords: PyTorch API,MNIST,model developer,quickstart

-This tutorial describes how to port an existing PyTorch model to Determined. We will port a simple
-image classification model for the MNIST dataset. This tutorial is based on the official `PyTorch
-MNIST example `_.
+In this tutorial, you'll learn how to port an existing PyTorch model to Determined. We will port a
+simple image classification model for the MNIST dataset. This tutorial is based on the official
+`PyTorch MNIST example `_.
+
+*********************
+ About Model Porting
+*********************

 To use a PyTorch model in Determined, you need to port the model to Determined's API. For most
 models, this porting process is straightforward, and once the model has been ported, all of the
-features of Determined will then be available: for example, you can do :ref:`distributed training
-` or :ref:`hyperparameter search ` without changing your
-model code, and Determined will store and visualize your model metrics automatically.
+features of Determined will then be available. For example, you can perform :ref:`distributed
+training ` and :ref:`hyperparameter search ` without
+changing your model code. Determined will store and visualize your model metrics automatically.

 When training a PyTorch model, Determined provides a built-in training loop that feeds each batch
 of training data into your ``train_batch`` function, which should perform the forward pass,
@@ -24,45 +28,56 @@ backpropagation, and compute training metrics for the batch. Determined also han
 log management, and device initialization. To plug your model code into the Determined training
 loop, you define methods to perform the following tasks:

-- initialize the models, optimizers, and LR schedulers
-- define the training function for forward and backward passes
-- define the evaluation function to compute the loss and other metrics on the validation data set
-- load the training data set
-- load the validation data set
+- Initialize the models, optimizers, and LR schedulers.
+- Define the training function for forward and backward passes.
+- Define the evaluation function to compute the loss and other metrics on the validation data set.
+- Load the training data set.
+- Load the validation data set.

 The Determined training loop will then invoke these functions automatically. These methods should
 be organized into a **trial class**, which is a user-defined Python class that inherits from
 :class:`determined.pytorch.PyTorchTrial`. The following sections walk through how to write your
 first trial class and then how to run a training job with Determined.

-The complete code for this tutorial can be downloaded here: :download:`mnist_pytorch.tgz
-`. After downloading this file, open a terminal window, extract the
-file, and ``cd`` into the ``mnist_pytorch`` directory:
-
-.. code::
-
-   tar xzvf mnist_pytorch.tgz
-   cd mnist_pytorch
-
-We suggest you follow along with the code as you read through this tutorial.
-
 ***************
  Prerequisites
 ***************

 - Access to a Determined cluster. If you have not yet installed Determined, refer to the
-  :ref:`installation-guide`.
+  :ref:`installation instructions `.

 - Access to the Determined CLI on your local machine. See :ref:`the installation instructions
   ` if you do not already have it installed. After installing the CLI, configure it to
   connect to your Determined cluster by setting the ``DET_MASTER`` environment variable to the
   hostname or IP address where Determined is running.

-********************************
- Build a ``PyTorchTrial`` Class
-********************************
+.. note::
+
+   For basic instructions on how to start a Determined cluster locally and run an experiment using
+   the ``mnist_pytorch`` example, visit :ref:`Run Your First Experiment in Determined
+   `.
+
+****************************
+ Getting the Tutorial Files
+****************************

-Here is what the skeleton of our trial class looks like:
+- Download the complete code for this tutorial from :download:`mnist_pytorch.tgz
+  `.
+- After downloading the file, open a terminal window, extract the file, and ``cd`` into the
+  ``mnist_pytorch`` directory:
+
+.. code::
+
+   tar xzvf mnist_pytorch.tgz
+   cd mnist_pytorch
+
+- Follow along with the code as you complete the tutorial.
+
+*************************************
+ Creating the ``PyTorchTrial`` Class
+*************************************
+
+Outlined below is a basic structure for our trial class:

 .. code:: python

@@ -94,7 +109,7 @@ Here is what the skeleton of our trial class looks like:
            # This should return a determined.pytorch.Dataset.
            pass

-We now discuss how to implement each of these methods in more detail.
+Let's dive deeper into the implementation of each of these methods.

 Initialization
 ==============
@@ -311,3 +326,10 @@ in your web browser.

 Once you are on the Determined landing page, you can find your experiment using the experiment's ID
 (``xxx`` in the example above) or description.
+
+************
+ Next Steps
+************
+
+Now that you are familiar with porting model code to Determined, you can keep working with the
+PyTorch MNIST model and learn how to :ref:`get up and running with the Core API `.