[FEA]: Pipeline error alerting and progress monitoring #1336

Closed
2 tasks done
efajardo-nv opened this issue Nov 2, 2023 · 1 comment · Fixed by #1463
Labels
dfp [Workflow] Related to the Digital Fingerprinting (DFP) workflow feature request New feature or request

Comments

@efajardo-nv
Contributor

efajardo-nv commented Nov 2, 2023

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

We have a long-running Morpheus DFP pipeline that we run in an SSH session on our DGX host. We currently use screen so that we can disconnect from the host and later reattach to monitor progress and check for errors or failures. We are looking for a more practical, real-time way to be alerted to errors, along with multi-user access to progress monitoring.

Describe your ideal solution

An example showing how to implement this with existing functionality. If the functionality does not already exist, we request the necessary hooks/stages, plus an example or documentation showing how to set them up in our environment.

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@efajardo-nv efajardo-nv added the feature request New feature or request label Nov 2, 2023
@mdemoret-nv mdemoret-nv added the dfp [Workflow] Related to the Digital Fingerprinting (DFP) workflow label Dec 13, 2023
@mdemoret-nv
Contributor

As discussed with @efajardo-nv, the best option here will be to create a logging handler that ties into the existing Morpheus logger. For example, if you wanted to report errors and warnings to Grafana, you could use the Loki handler from this package: https://pypi.org/project/python-logging-loki/ (it pushes log records to Loki, which Grafana can then query).

import logging

import logging_loki

from morpheus.utils.logger import configure_logging

# Configure the morpheus logger first
configure_logging()

# Handler that pushes log records to a Loki instance (URL and credentials are placeholders)
handler = logging_loki.LokiHandler(
    url="https://my-loki-instance/loki/api/v1/push",
    tags={"application": "my-app"},
    auth=("username", "password"),
    version="1",
)

# Get the morpheus root logger
logger = logging.getLogger("morpheus")

# Add the loki handler to the morpheus logger
logger.addHandler(handler)

# At any point in the morpheus application, you can call something like the following,
# which will be sent to Loki and show up in Grafana
logger.error(
    "Something happened",
    extra={"tags": {"service": "my-service"}},
)
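
Since the request is specifically about errors and warnings, one option is to set a level on the alerting handler so that lower-severity records are never forwarded. The following is a minimal standard-library sketch of that idea, not part of the original comment: `ListHandler` is a hypothetical stand-in for `LokiHandler` so the example runs without a Loki instance, and the log messages are made up for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in for LokiHandler: keeps records in memory."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


logger = logging.getLogger("morpheus")
logger.setLevel(logging.DEBUG)  # the logger itself lets everything through

# Only WARNING and above reach the alerting handler
alert_handler = ListHandler(level=logging.WARNING)
logger.addHandler(alert_handler)

logger.info("batch 1 processed")   # filtered out by the handler level
logger.warning("batch 2 was slow")
logger.error("batch 3 failed")
```

The same `setLevel`-style filtering should apply to any `logging.Handler` subclass, so a real Loki handler could presumably be restricted the same way by calling `handler.setLevel(logging.WARNING)` before attaching it.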

@efajardo-nv efajardo-nv self-assigned this Dec 13, 2023
@efajardo-nv efajardo-nv moved this from Todo to In Progress in Morpheus Boards Dec 14, 2023
@jarmak-nv jarmak-nv moved this from In Progress to Review - Ready for Review in Morpheus Boards Jan 16, 2024
rapids-bot bot pushed a commit that referenced this issue Feb 15, 2024
+ Add Loki service to production DFP docker-compose
+ Add Loki data source to Grafana
+ Add `DFP Logs` dashboard to Grafana
+ Pipeline run script that uses Loki logging handler
+ Update README for Grafana DFP Example
+ Add instructions for setting up error alerting in Grafana
+ Update Morpheus logger to accept additional handlers so that we can add the Loki handler.

Closes #1336

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1463
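
The last item in the commit message, letting the logger setup accept additional handlers, could be sketched roughly as follows. This is an illustration using the standard library's `QueueHandler` and `QueueListener`, not the Morpheus implementation: `configure_logging_sketch`, `CollectingHandler`, and the `morpheus_sketch` logger name are all hypothetical.

```python
import logging
import logging.handlers
import queue


def configure_logging_sketch(*extra_handlers, log_level=logging.INFO):
    """Hypothetical helper (not the Morpheus API): send records through a
    queue to a console handler plus any caller-supplied handlers."""
    log_queue = queue.Queue()
    logger = logging.getLogger("morpheus_sketch")
    logger.setLevel(log_level)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(log_queue))

    console = logging.StreamHandler()
    listener = logging.handlers.QueueListener(
        log_queue, console, *extra_handlers, respect_handler_level=True
    )
    listener.start()
    return logger, listener


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for a remote handler such as a Loki handler."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record)


collector = CollectingHandler(level=logging.ERROR)
logger, listener = configure_logging_sketch(collector)

logger.info("pipeline started")  # console only
logger.error("stage failed")     # console and collector

listener.stop()  # drains the queue and joins the listener thread
```

Passing extra handlers into the setup function, rather than attaching them to the logger afterwards, keeps slow network handlers on the listener thread so that logging calls inside the pipeline do not block on them.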
@github-project-automation github-project-automation bot moved this from Review - Ready for Review to Done in Morpheus Boards Feb 15, 2024