diff --git a/docs/reference/experiment-config-reference.rst b/docs/reference/experiment-config-reference.rst
index e4b489d927d..5ba662831e6 100644
--- a/docs/reference/experiment-config-reference.rst
+++ b/docs/reference/experiment-config-reference.rst
@@ -308,10 +308,11 @@ Optional. Defines actions and labels in response to trial logs matching specifie
 language syntax). For more information about the syntax, you can visit this `RE2 reference page
 `__. Each log policy can have the following fields:
 
-- ``name``: Optional. A name for the log policy. If provided, this name will be displayed as a
-  label in the UI when the log policy matches.
+- ``name``: Required. The name of the log policy, displayed as a label in the WebUI when a log
+  policy match occurs.
 
-- ``pattern``: Required. The regex pattern to match in the logs.
+- ``pattern``: Optional. Defines a regex pattern to match log entries. If not specified, this
+  policy is disabled.
 
 - ``action``: Optional. The action to take when the pattern is matched. Actions include:
 
@@ -336,24 +337,36 @@ Example configuration:
 .. code:: yaml
 
    log_policies:
-     - name: "ECC Error"
-       pattern: ".*uncorrectable ECC error encountered.*"
-       action:
-         type: exclude_node
-     - name: "CUDA OOM"
-       pattern: ".*CUDA out of memory.*"
-       action:
-         type: cancel_retries
-
-When a log policy matches, its name (if provided) will be displayed as a label in the WebUI,
-allowing for easy identification of specific issues or events during a run. These labels will appear
-in both the run table and run detail views.
+     - name: ECC Error
+       pattern: ".*uncorrectable ECC error encountered.*"
+       action: exclude_node
+     - name: CUDA OOM
+       pattern: ".*CUDA out of memory.*"
+       action: cancel_retries
+
+When a log policy matches, its name appears as a label in the WebUI, making it easy to identify
+specific issues during a run. These labels are shown in both the run table and run detail views.
 
 These settings may also be specified at the cluster or resource pool level through task container
 defaults.
 
-To find out more about log management features like **Log Search** and **Log Signal**, visit
-:ref:`Log Management `.
+Default policies:
+
+.. code:: yaml
+
+   log_policies:
+     - name: CUDA OOM
+       pattern: ".*CUDA out of memory.*"
+     - name: ECC Error
+       pattern: ".*uncorrectable ECC error encountered.*"
+
+To disable showing labels from the default policies:
+
+.. code:: yaml
+
+   log_policies:
+     - name: CUDA OOM
+     - name: ECC Error
 
 .. _log-retention-days:
 
diff --git a/docs/release-notes/log-search-improvement.rst b/docs/release-notes/log-search-improvement.rst
index c24773c49ec..5ce8d46fff1 100644
--- a/docs/release-notes/log-search-improvement.rst
+++ b/docs/release-notes/log-search-improvement.rst
@@ -6,4 +6,4 @@
   search result will take users directly to the relevant position in the log, allowing them to
   easily view logs both before and after the matched entry. Additionally, add support for
   regex-based searches, providing more flexible log filtering. For more details, refer to
-  :ref:`log_policies `.
+  :ref:`WebUI `.
diff --git a/docs/release-notes/log-signal.rst b/docs/release-notes/log-signal.rst
new file mode 100644
index 00000000000..743b0c6c56b
--- /dev/null
+++ b/docs/release-notes/log-signal.rst
@@ -0,0 +1,10 @@
+:orphan:
+
+**New Features**
+
+- Experiments: Add a ``name`` field to ``log_policies``. When a log policy matches, its name shows
+  as a label in the WebUI, making it easy to spot specific issues during a run. Labels appear in
+  both the run table and run detail views.
+
+  In addition, there is a new format: ``name`` is required, and ``action`` is now a plain string.
+  For more details, refer to :ref:`log_policies `.
diff --git a/docs/tools/webui-if.rst b/docs/tools/webui-if.rst
index 733180a123e..5ab5187cc33 100644
--- a/docs/tools/webui-if.rst
+++ b/docs/tools/webui-if.rst
@@ -241,3 +241,17 @@ Clear the message with the following command:
 .. code:: bash
 
    det master cluster-message clear
+
+****************************
+ Viewing Log Search Results
+****************************
+
+To perform a log search:
+
+#. Navigate to your run in the WebUI.
+#. In the Logs tab, start typing in the search box to open the search pane.
+#. To use regex search, click the "Regex" checkbox in the search pane.
+#. Click on a search result to view it in context, with logs before and after visible.
+#. Scroll up and down to fetch new logs.
+
+Note: Search results are not auto-updating. You may need to refresh to see new logs.
diff --git a/docs/tutorials/_index.rst b/docs/tutorials/_index.rst
index 724736f35c6..ab695aecfce 100644
--- a/docs/tutorials/_index.rst
+++ b/docs/tutorials/_index.rst
@@ -46,7 +46,6 @@ Examples let you build off of an existing model that already runs on Determined.
    :hidden:
 
    Quickstart for Model Developers
-   Managing Logs and Log Policies
    Get Started with Detached Mode
    Viewing Epoch-Based Metrics in the WebUI
    Using Pachyderm to Create a Batch Inferencing Pipeline
diff --git a/docs/tutorials/log-management.rst b/docs/tutorials/log-management.rst
deleted file mode 100644
index 3eb95cfea69..00000000000
--- a/docs/tutorials/log-management.rst
+++ /dev/null
@@ -1,52 +0,0 @@
-.. _log-management:
-
-################
- Log Management
-################
-
-This guide covers two log management features: Log Search and Log Signal.
-
-************
- Log Search
-************
-
-To perform a log search:
-
-#. Navigate to your run in the WebUI.
-#. In the Logs tab, start typing in the search box to open the search pane.
-#. To use regex search, click the "Regex" checkbox in the search pane.
-#. Click on a search result to view it in context, with logs before and after visible.
-#. Scroll up and down to fetch new logs.
-
-Note: Search results are not auto-updating. You may need to refresh to see new logs.
-
-************
- Log Signal
-************
-
-Log Signal allows you to configure log policies in the master configuration to display labels in the
-UI when specific patterns are matched in the logs.
-
-To set up a log policy:
-
-#. In the master configuration file, under ``task_container_defaults > log_policies``, define your
-   log policies.
-#. Each policy can have a ``name``, ``pattern``, and ``action``.
-#. When a log matching the pattern is encountered, the ``name`` will be displayed as a label in the
-   run table and run detail views.
-
-Example configuration:
-
-.. code:: yaml
-
-   log_policies:
-     - name: "CUDA OOM"
-       pattern: ".*CUDA out of memory.*"
-       action:
-         type: cancel_retries
-
-This will display a "CUDA OOM" label in the UI when a CUDA out of memory error is encountered in the
-logs.
-
-For more detailed information on configuring log policies, refer to the :ref:`experiment
-configuration reference `.
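
Reviewer note: the new ``log_policies`` semantics in this diff (``name`` required, ``pattern`` optional, ``action`` a plain string, and a policy without a pattern being disabled) can be illustrated with a small sketch. This is not Determined's implementation. The ``match_policies`` helper and the ``LOG_POLICIES`` list are hypothetical names introduced here for illustration, and Python's ``re`` module stands in for the RE2 engine that the docs reference, which is equivalent only for simple patterns like these.

```python
import re

# Hypothetical policy list mirroring the new format described in the diff:
# "name" is required, "pattern" is optional, "action" is a plain string.
LOG_POLICIES = [
    {
        "name": "ECC Error",
        "pattern": ".*uncorrectable ECC error encountered.*",
        "action": "exclude_node",
    },
    {
        "name": "CUDA OOM",
        "pattern": ".*CUDA out of memory.*",
        "action": "cancel_retries",
    },
    # No pattern: per the updated docs, such a policy is disabled.
    {"name": "Disabled Example"},
]


def match_policies(log_line, policies):
    """Return (name, action) for every policy whose pattern matches log_line.

    Assumption: a policy without a pattern never matches, and a policy
    without an action only contributes its name as a label.
    """
    matches = []
    for policy in policies:
        pattern = policy.get("pattern")
        if not pattern:
            continue  # disabled policy
        if re.search(pattern, log_line):
            matches.append((policy["name"], policy.get("action")))
    return matches


if __name__ == "__main__":
    line = "RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB"
    for name, action in match_policies(line, LOG_POLICIES):
        print(f"label={name} action={action}")
```

In this sketch the returned name plays the role of the WebUI label and the action string is whatever the policy declares, so a pattern-less entry such as ``- name: CUDA OOM`` yields no label, which is the behavior the "disable showing labels" example in the diff relies on.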