Merge branch 'main' of github.com:determined-ai/determined into main

determined-ai · Sep 12, 2023 · commit bb86bde
2 parents a387c2b + d8c7bd2

Showing 60 changed files with 1,023 additions and 667 deletions.
diff --git a/.bumpversion.cfg b/.bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.25.1-dev0
+current_version = 0.25.2-dev0
 commit = true
 tag = true
 tag_name = {new_version}

diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -25,7 +25,7 @@ executors:
 parameters:
   det-version:
     type: string
-    default: 0.25.1-dev0
+    default: 0.25.2-dev0
   docker-image:
     type: string
     default: determinedai/cimg-base:latest

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-0.25.1-dev0
+0.25.2-dev0
diff --git a/docs/_static/version-switcher/versions.json b/docs/_static/version-switcher/versions.json
@@ -1,6 +1,6 @@
 [
     {
-        "version": "0.25.1-dev0",
+        "version": "0.25.2-dev0",
         "url": "https://docs.determined.ai/latest/"
     },
     {

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
@@ -10,6 +10,21 @@
  Version 0.25
 **************
 
+Version 0.25.1
+==============
+
+**Release Date:** September 11, 2023
+
+**Breaking Changes**
+
+-  Fluent Bit is no longer used for log shipping and configs associated with Fluent Bit are now no
+   longer in use. Fluent Bit has been replaced with an internal log shipper (the same one that is
+   used for Slurm).
+
+**Bug Fixes**
+
+-  Reduce the time before seeing the first metrics of a new experiment.
+
 Version 0.25.0
 ==============
 

diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst
@@ -12,9 +12,9 @@
 To get started with your first experiment, visit the :ref:`Quickstart for Model Developers
 <qs-mdldev>`.
 
-******************************
- Get Started with a Trial API
-******************************
+*******************************************************
+ Get Started with a :ref:`Trial API <high-level-apis>`
+*******************************************************
 
 +---------------------------------+--------------------------------------------------------------+
 | Title                           | Description                                                  |

diff --git a/docs/tutorials/pytorch-mnist-local-qs.rst b/docs/tutorials/pytorch-mnist-local-qs.rst
@@ -100,6 +100,9 @@ This is the cluster address for your local training environment.
 In four simple steps, we've successfully configured our training environment in Determined to start
 training the PyTorch MNIST example.
 
-In this article, we learned how to run an experiment on a local, single CPU or GPU. To learn how to
-change your configuration settings, including how to run a distributed training job on multiple
-GPUs, visit the :ref:`Quickstart for Model Developers <qs-mdldev>`.
+In this article, we learned how to run an experiment on a local, single CPU or GPU. If you want to
+learn more details about the basic structure shown in the trial class, visit the
+:ref:`pytorch-mnist-tutorial`.
+
+To learn how to change your configuration settings, including how to run a distributed training job
+on multiple GPUs, visit the :ref:`Quickstart for Model Developers <qs-mdldev>`.
diff --git a/docs/tutorials/pytorch-mnist-tutorial.rst b/docs/tutorials/pytorch-mnist-tutorial.rst
@@ -8,61 +8,76 @@
    :description: Using a simple image classification model for the MNIST dataset, you'll Learn how to port an existing PyTorch model to Determined.
    :keywords: PyTorch API,MNIST,model developer,quickstart
 
-This tutorial describes how to port an existing PyTorch model to Determined. We will port a simple
-image classification model for the MNIST dataset. This tutorial is based on the official `PyTorch
-MNIST example <https://github.com/PyTorch/examples/blob/master/mnist/main.py>`_.
+In this tutorial, you'll learn how to port an existing PyTorch model to Determined. We will port a
+simple image classification model for the MNIST dataset. This tutorial is based on the official
+`PyTorch MNIST example <https://github.com/PyTorch/examples/blob/master/mnist/main.py>`_.
+
+*********************
+ About Model Porting
+*********************
 
 To use a PyTorch model in Determined, you need to port the model to Determined's API. For most
 models, this porting process is straightforward, and once the model has been ported, all of the
-features of Determined will then be available: for example, you can do :ref:`distributed training
-<multi-gpu-training>` or :ref:`hyperparameter search <hyperparameter-tuning>` without changing your
-model code, and Determined will store and visualize your model metrics automatically.
+features of Determined will then be available. For example, you can perform :ref:`distributed
+training <multi-gpu-training>` and :ref:`hyperparameter search <hyperparameter-tuning>` without
+changing your model code. Determined will store and visualize your model metrics automatically.
 
 When training a PyTorch model, Determined provides a built-in training loop that feeds each batch of
 training data into your ``train_batch`` function, which should perform the forward pass,
 backpropagation, and compute training metrics for the batch. Determined also handles checkpointing,
 log management, and device initialization. To plug your model code into the Determined training
 loop, you define methods to perform the following tasks:
 
--  initialize the models, optimizers, and LR schedulers
--  define the training function for forward and backward passes
--  define the evaluation function to compute the loss and other metrics on the validation data set
--  load the training data set
--  load the validation data set
+-  Initialize the models, optimizers, and LR schedulers.
+-  Define the training function for forward and backward passes.
+-  Define the evaluation function to compute the loss and other metrics on the validation data set.
+-  Load the training data set.
+-  Load the validation data set.
 
 The Determined training loop will then invoke these functions automatically. These methods should be
 organized into a **trial class**, which is a user-defined Python class that inherits from
 :class:`determined.pytorch.PyTorchTrial`. The following sections walk through how to write your
 first trial class and then how to run a training job with Determined.
 
-The complete code for this tutorial can be downloaded here: :download:`mnist_pytorch.tgz
-</examples/mnist_pytorch.tgz>`. After downloading this file, open a terminal window, extract the
-file, and ``cd`` into the ``mnist_pytorch`` directory:
-
-.. code::
-
-   tar xzvf mnist_pytorch.tgz
-   cd mnist_pytorch
-
-We suggest you follow along with the code as you read through this tutorial.
-
 ***************
  Prerequisites
 ***************
 
 -  Access to a Determined cluster. If you have not yet installed Determined, refer to the
-   :ref:`installation-guide`.
+   :ref:`installation instructions <installation-guide>`.
 
 -  Access to the Determined CLI on your local machine. See :ref:`the installation instructions
    <install-cli>` if you do not already have it installed. After installing the CLI, configure it to
    connect to your Determined cluster by setting the ``DET_MASTER`` environment variable to the
    hostname or IP address where Determined is running.
 
-********************************
- Build a ``PyTorchTrial`` Class
-********************************
+.. note::
+
+   For basic instructions on how to start a Determined cluster locally and run an experiment using
+   the ``mnist_pytorch`` example, visit :ref:`Run Your First Experiment in Determined
+   <pytorch_mnist_quickstart>`.
+
+****************************
+ Getting the Tutorial Files
+****************************
 
-Here is what the skeleton of our trial class looks like:
+-  Download the complete code for this tutorial from :download:`mnist_pytorch.tgz
+   </examples/mnist_pytorch.tgz>`.
+-  After downloading the file, open a terminal window, extract the file, and ``cd`` into the
+   ``mnist_pytorch`` directory:
+
+.. code::
+
+   tar xzvf mnist_pytorch.tgz
+   cd mnist_pytorch
+
+-  Follow along with the code as you complete the tutorial.
+
+*************************************
+ Creating the ``PyTorchTrial`` Class
+*************************************
+
+Outlined below is a basic structure for our trial class:
 
 .. code:: python
 
@@ -94,7 +109,7 @@ Here is what the skeleton of our trial class looks like:
            # This should return a determined.pytorch.Dataset.
            pass
 
-We now discuss how to implement each of these methods in more detail.
+Let's dive deeper into the implementation of each of these methods.
 
 Initialization
 ==============
@@ -311,3 +326,10 @@ in your web browser.
 
 Once you are on the Determined landing page, you can find your experiment using the experiment's ID
 (``xxx`` in the example above) or description.
+
+************
+ Next Steps
+************
+
+Now that you are familiar with porting model code to Determined, you can keep working with the
+PyTorch MNIST model and learn how to :ref:`get up and running with the Core API <api-core-ug>`.
diff --git a/harness/determined/__version__.py b/harness/determined/__version__.py
@@ -1 +1 @@
-__version__ = "0.25.1-dev0"
+__version__ = "0.25.2-dev0"
diff --git a/harness/determined/cli/agent.py b/harness/determined/cli/agent.py
@@ -156,9 +156,13 @@ def patch_agent(enabled: bool) -> Callable[[argparse.Namespace], None]:
     @authentication.required
     def patch(args: argparse.Namespace) -> None:
         check.check_false(args.all and args.agent_id)
+        action = "enable" if enabled else "disable"
 
         if not (args.all or args.agent_id):
-            raise errors.CliError("Must specify exactly on of --all or --agent-id")
+            raise errors.CliError(
+                "Please pass agent id or --all option. "
+                f"See `det agent {action} --help` for details."
+            )
 
         if args.agent_id:
             agent_ids = [args.agent_id]
@@ -169,7 +173,6 @@ def patch(args: argparse.Namespace) -> None:
         drain_mode = None if enabled else args.drain
 
         for agent_id in agent_ids:
-            action = "enable" if enabled else "disable"
             path = f"api/v1/agents/{agent_id}/{action}"
 
             payload = None
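
Reviewer note: the hunks above hoist `action` out of the loop so the error message can name the matching subcommand. The sketch below isolates that logic so it can be run on its own. It is a minimal illustration, not the code in `agent.py`: `CliError` is a local stand-in for `errors.CliError`, `resolve_action` is a hypothetical helper name, and the mutual-exclusion branch stands in for the `check.check_false` call with an invented message.

```python
import argparse


class CliError(Exception):
    """Local stand-in for determined's errors.CliError (assumption)."""


def resolve_action(args: argparse.Namespace, enabled: bool) -> str:
    """Validate the agent selection and return "enable" or "disable".

    Hypothetical helper: the real patch() does this inline.
    """
    # Computed once, up front, so the error message below can use it.
    action = "enable" if enabled else "disable"

    # Stand-in for check.check_false(args.all and args.agent_id);
    # the message text here is invented for the sketch.
    if args.all and args.agent_id:
        raise CliError("Pass either an agent id or --all, not both.")

    if not (args.all or args.agent_id):
        raise CliError(
            "Please pass agent id or --all option. "
            f"See `det agent {action} --help` for details."
        )

    return action


if __name__ == "__main__":
    try:
        resolve_action(argparse.Namespace(all=False, agent_id=None), enabled=False)
    except CliError as err:
        print(err)
```

Computing `action` before the validation means the hint always points at the subcommand the user actually ran, and the per-agent loop can reuse the same value when building the request path.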

diff --git a/harness/determined/common/api/bindings.py b/harness/determined/common/api/bindings.py
diff --git a/harness/determined/deploy/aws/templates/efs.yaml b/harness/determined/deploy/aws/templates/efs.yaml
@@ -101,7 +101,7 @@ Parameters:
   Version:
     Type: String
     Description: Determined version or commit for master image
-    Default: 0.25.1-dev0
+    Default: 0.25.2-dev0
 
   DBPassword:
     Type: String

diff --git a/harness/determined/deploy/aws/templates/fsx.yaml b/harness/determined/deploy/aws/templates/fsx.yaml
@@ -101,7 +101,7 @@ Parameters:
   Version:
     Type: String
     Description: Determined version or commit for master image
-    Default: 0.25.1-dev0
+    Default: 0.25.2-dev0
 
   DBPassword:
     Type: String

diff --git a/harness/determined/deploy/aws/templates/govcloud.yaml b/harness/determined/deploy/aws/templates/govcloud.yaml
@@ -67,7 +67,7 @@ Parameters:
   Version:
     Type: String
     Description: Determined version or commit for master docker image
-    Default: 0.25.1-dev0
+    Default: 0.25.2-dev0
 
   DBPassword:
     Type: String

diff --git a/harness/determined/deploy/aws/templates/secure.yaml b/harness/determined/deploy/aws/templates/secure.yaml
@@ -122,7 +122,7 @@ Parameters:
   Version:
     Type: String
     Description: Determined version or commit for master image
-    Default: 0.25.1-dev0
+    Default: 0.25.2-dev0
 
   DBPassword:
     Type: String

diff --git a/harness/determined/deploy/aws/templates/simple-rds.yaml b/harness/determined/deploy/aws/templates/simple-rds.yaml
@@ -93,7 +93,7 @@ Parameters:
   Version:
     Type: String
     Description: Determined version or commit for master docker image
-    Default: 0.25.1-dev0
+    Default: 0.25.2-dev0
 
   DBPassword:
     Type: String

diff --git a/harness/determined/deploy/aws/templates/simple.yaml b/harness/determined/deploy/aws/templates/simple.yaml
@@ -93,7 +93,7 @@ Parameters:
   Version:
     Type: String
     Description: Determined version or commit for master docker image
-    Default: 0.25.1-dev0
+    Default: 0.25.2-dev0
 
   DBPassword:
     Type: String

diff --git a/harness/setup.py b/harness/setup.py
@@ -2,7 +2,7 @@
 
 setuptools.setup(
     name="determined",
-    version="0.25.1-dev0",
+    version="0.25.2-dev0",
     author="Determined AI",
     author_email="[email protected]",
     url="https://determined.ai/",

diff --git a/helm/charts/determined/Chart.yaml b/helm/charts/determined/Chart.yaml
@@ -1,12 +1,12 @@
 apiVersion: v1
 name: determined
 description: A Helm chart for Determined
-version: "0.25.1-dev0"
+version: "0.25.2-dev0"
 icon: https://github.com/determined-ai/determined/blob/master/determined-logo.png?raw=true
 home: https://github.com/determined-ai/determined.git
 
 # appVersion controls the version HPE MLDE / Determined OSS that is deployed. 
 # If using a non-release version (e.g., X.Y.Z.dev0) you will have to specify an
 # existing official release version (e.g., X.Y.Z) or specify a commit has
 # that has been publicly published (all commits from master).
-appVersion: "0.25.1-dev0"
+appVersion: "0.25.2-dev0"