The team has implemented tests that quantify the reliability and fault tolerance of the example Prefect workflow.
The team has simulated failures in the operation of the example Prefect workflow to demonstrate the usefulness of the tests.
Key Decision Points
How to measure reliability and fault tolerance?
What is the appropriate resolution of the measured quantities?
What are realistic simulations of failure?
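As a starting point for these decisions, here is a minimal sketch of one possible set of definitions. It is plain Python and does not use Prefect; the retry loop only stands in for whatever retry mechanism the real workflow uses. Everything in it is an assumption for illustration: the names `mock_task`, `run_with_retries`, and `measure`, the idea of reliability as the fraction of runs that finish successfully, and fault tolerance as the fraction of runs that hit at least one injected fault and still finish successfully.

```python
import random
from dataclasses import dataclass


class InjectedFault(Exception):
    """Raised by the mock task to simulate a transient failure."""


@dataclass
class RunResult:
    succeeded: bool
    faults_seen: int  # number of injected faults hit during this run


def mock_task(rng: random.Random, failure_rate: float) -> str:
    """Hypothetical stand-in for a workflow task; fails with the given probability."""
    if rng.random() < failure_rate:
        raise InjectedFault("simulated transient failure")
    return "ok"


def run_with_retries(rng: random.Random, failure_rate: float, max_retries: int) -> RunResult:
    """Run the mock task once, retrying up to max_retries times after a fault."""
    faults = 0
    for _ in range(max_retries + 1):
        try:
            mock_task(rng, failure_rate)
            return RunResult(succeeded=True, faults_seen=faults)
        except InjectedFault:
            faults += 1
    return RunResult(succeeded=False, faults_seen=faults)


def measure(n_runs: int, failure_rate: float, max_retries: int, seed: int = 0) -> dict:
    """Compute the two assumed metrics over n_runs simulated runs."""
    rng = random.Random(seed)  # seeded so that repeated measurements are reproducible
    results = [run_with_retries(rng, failure_rate, max_retries) for _ in range(n_runs)]

    # Assumed definition: reliability = fraction of all runs that succeeded.
    reliability = sum(r.succeeded for r in results) / n_runs

    # Assumed definition: fault tolerance = fraction of runs that saw at least
    # one fault and still succeeded. Reported as 1.0 if no run saw a fault.
    faulted = [r for r in results if r.faults_seen > 0]
    fault_tolerance = sum(r.succeeded for r in faulted) / len(faulted) if faulted else 1.0

    return {
        "reliability": reliability,
        "fault_tolerance": fault_tolerance,
        "faulted_runs": len(faulted),
    }
```

Calling `measure(1000, 0.3, 3)` and `measure(1000, 0.3, 0)` gives a rough feel for how much the retry budget contributes under this failure model. Whether independent random failures are a realistic simulation is exactly the open question above.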
Artifacts
Initial definitions of reliability and fault tolerance against which to implement tests for monitoring.
Test system and associated CI capabilities
Passing tests that function as the basis for a monitoring system of workflows based on Prefect.
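To illustrate how passing tests could double as a basis for monitoring, here is a hedged sketch of a threshold check that a CI job could run. The metric names, the threshold values in `THRESHOLDS`, and the helper `find_violations` are all hypothetical placeholders, not agreed definitions. The measured values in the test are hard-coded stand-ins for numbers that would come from actually running the example workflow.

```python
from typing import Dict, List

# Hypothetical minimum acceptable values; the real numbers are still to be decided.
THRESHOLDS: Dict[str, float] = {
    "reliability": 0.99,
    "fault_tolerance": 0.90,
}


def find_violations(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return one human-readable message per metric that is missing or below its minimum."""
    violations = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: no measurement recorded")
        elif value < minimum:
            violations.append(f"{name}: {value:.3f} is below the minimum {minimum:.3f}")
    return violations


def test_workflow_meets_thresholds():
    # Placeholder values; a real test would measure these from the example workflow.
    measured = {"reliability": 0.995, "fault_tolerance": 0.93}
    assert find_violations(measured, THRESHOLDS) == []
```

Returning messages instead of a bare pass or fail means the same function could later feed an alert or a dashboard, while a pytest-style test only needs to assert that the list is empty.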
Success Criteria
There are established definitions and initial measurements that quantify the reliability and fault tolerance of workflows based on Prefect.
Potential Challenges
Commonly used monitoring signals (latency, traffic, errors, saturation, time-to-recovery) might be difficult to quantify using workflows that only mock the behavior of domain applications, i.e., sleep functions instead of actual workloads.
The appropriate measurement resolution remains undefined until the details of integration with other services (such as user interfaces, resource pools, and user demand) are known.
Without a well-understood model of the real incidents that might occur in a future working system, simulated failures might impose unrealistic constraints on the development of example workflows.
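To make the first two challenges concrete, here is a rough sketch of how some of these signals could be derived from per-run records, even when the runs are only mocked. The `RunRecord` fields and the `summarize` function are assumptions for illustration. So is the definition of time-to-recovery used here: the time from the end of the first failed run in a streak of failures to the end of the next successful run.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class RunRecord:
    start: float  # seconds since an arbitrary reference time
    end: float    # seconds since the same reference time
    ok: bool      # whether the run completed successfully


def summarize(records: List[RunRecord]) -> dict:
    """Derive latency, error rate, and time-to-recovery from run records."""
    latencies = [r.end - r.start for r in records]
    error_rate = sum(not r.ok for r in records) / len(records)

    # Assumed definition of recovery time: from the end of the first failed run
    # in a streak of failures to the end of the next successful run.
    recoveries = []
    outage_started = None
    for r in sorted(records, key=lambda rec: rec.end):
        if not r.ok and outage_started is None:
            outage_started = r.end
        elif r.ok and outage_started is not None:
            recoveries.append(r.end - outage_started)
            outage_started = None

    return {
        "mean_latency_s": mean(latencies),
        "max_latency_s": max(latencies),
        "error_rate": error_rate,
        "mean_time_to_recovery_s": mean(recoveries) if recoveries else None,
    }
```

With sleep-based mocks, the latency figures mostly reflect the sleep durations chosen by whoever wrote the mock, which is one way of stating the first challenge. The error rate and recovery time depend only on the order and timing of outcomes, so they may carry over to real workloads more directly.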
Objective
Define, measure, and improve the reliability and fault tolerance of an example workflow based on Prefect.
Requirements
Prerequisite
Note: This is essentially a sub-task of #8
Definition of Done