How do I know if a tasklist is overloaded #4403
-
How can I tell when a tasklist has built up a large backlog of tasks because the workers are polling from it too slowly? Are there any metrics that can help with this?
Replies: 4 comments 15 replies
-
There are two scheduled-to-start latency metrics, one for decision tasks and one for activity tasks: https://github.com/uber-go/cadence-client/blob/b09692f6838f3c45fe02fde689787c87d925bdaf/internal/common/metrics/constants.go#L46 If you are sure that there are enough workers and that they are idle, then the only possible cause of high latency is the tasklist being overloaded.
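The rule of thumb above can be sketched as a small check. This is only an illustration in plain Go, not part of the Cadence client: `pollerStats`, `samplesFromMillis`, `percentile`, `diagnose`, the 90% busy cutoff, and the latency threshold are all hypothetical names and values chosen for the example. It assumes you can already read recent scheduled-to-start latency samples and worker utilization from your own metrics backend.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pollerStats is a hypothetical snapshot assembled from your own dashboards.
type pollerStats struct {
	ScheduledToStart []time.Duration // recent scheduled-to-start latency samples
	BusySlots        int             // task executions currently running on workers
	TotalSlots       int             // total concurrent execution capacity of workers
}

// samplesFromMillis converts millisecond readings into durations.
func samplesFromMillis(ms ...int64) []time.Duration {
	out := make([]time.Duration, 0, len(ms))
	for _, m := range ms {
		out = append(out, time.Duration(m)*time.Millisecond)
	}
	return out
}

// percentile returns the p-th percentile (1..100) using the nearest-rank method.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (len(sorted)*p + 99) / 100 // ceiling of len*p/100
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// diagnose applies the reasoning from the reply: high latency with idle
// workers points at the tasklist, high latency with busy workers points
// at worker capacity.
func diagnose(s pollerStats, threshold time.Duration) string {
	if percentile(s.ScheduledToStart, 95) <= threshold {
		return "healthy"
	}
	// 90% utilization is an arbitrary cutoff picked for this sketch.
	if s.TotalSlots > 0 && s.BusySlots*10 >= s.TotalSlots*9 {
		return "workers saturated"
	}
	return "tasklist possibly overloaded"
}

func main() {
	s := pollerStats{
		ScheduledToStart: samplesFromMillis(4000, 5000, 6000),
		BusySlots:        1,
		TotalSlots:       10,
	}
	fmt.Println(diagnose(s, time.Second))
}
```

With idle workers and multi-second latency the sketch reports a possible tasklist overload; the same latency with nearly all slots busy would instead suggest adding workers.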
-
In this piece of Cadence client code:

    func (atp *activityTaskPoller) pollWithMetrics(ctx context.Context,
        pollFunc func(ctx context.Context) (*s.PollForActivityTaskResponse, time.Time, error)) (interface{}, error) {
        response, startTime, err := pollFunc(ctx)
        if err != nil {
            return nil, err
        }
        if response == nil || len(response.TaskToken) == 0 {
            return &activityTask{}, nil
        }
        workflowType := response.WorkflowType.GetName()
        activityType := response.ActivityType.GetName()
        metricsScope := getMetricsScopeForActivity(atp.metricsScope, workflowType, activityType)
        metricsScope.Counter(metrics.ActivityPollSucceedCounter).Inc(1)
        metricsScope.Timer(metrics.ActivityPollLatency).Record(time.Now().Sub(startTime))
        scheduledToStartLatency := time.Duration(response.GetStartedTimestamp() - response.GetScheduledTimestampOfThisAttempt())
        metricsScope.Timer(metrics.ActivityScheduledToStartLatency).Record(scheduledToStartLatency)
        return &activityTask{task: response, pollStartTime: startTime}, nil
    }

the metrics scope is atp's (*activityTaskPoller) scope. Is this scope derived from the metrics scope set in worker.Options?

    workerOptions := worker.Options{
        MetricsScope: h.WorkerMetricScope,
        Logger: h.Logger,
        MaxConcurrentActivityTaskPollers: 10,
        MaxConcurrentDecisionTaskPollers: 10,
    }
    h.StartWorkers(domain, taskListName, workerOptions)
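For reference, the scheduled-to-start latency recorded in pollWithMetrics is just the difference between two timestamps. The standalone sketch below reproduces that subtraction with the standard library only, assuming both timestamps are Unix nanoseconds (which is what the direct conversion to time.Duration in the quoted code implies). `scheduledToStart` is a hypothetical helper name, not a Cadence client function.

```go
package main

import (
	"fmt"
	"time"
)

// scheduledToStart mirrors the subtraction in pollWithMetrics, assuming both
// arguments are Unix timestamps in nanoseconds.
func scheduledToStart(scheduledNanos, startedNanos int64) time.Duration {
	return time.Duration(startedNanos - scheduledNanos)
}

func main() {
	scheduled := time.Date(2021, 8, 1, 12, 0, 0, 0, time.UTC)
	started := scheduled.Add(1500 * time.Millisecond)

	// The task waited in the tasklist for 1.5 seconds before a worker picked it up.
	fmt.Println(scheduledToStart(scheduled.UnixNano(), started.UnixNano())) // prints "1.5s"
}
```

A value that keeps growing while pollers sit idle is the signal discussed earlier in this thread.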
-
I have captured the worker's metrics. The counters 'cadence-activity-poll-transient-failed' and 'cadence-decision-poll-transient-failed' are very high. Why do the polls keep failing?
-
Another scenario: I have this stack trace from a workflow that timed out with START_TO_CLOSE in my cadence-web UI.
The code at cadence-worker/logic/cronWorkflow.go:211 only executes an activity: err = workflow.ExecuteActivity(ctx, SendJsonToDDMQV1Activity, featureValue, params).Get(ctx, nil) Why is it pending in the Get function? I don't need the return value of this activity.