
🐛 [BUG] - use built-in metric to create Latency SLO On Dynatrace #342

Open · GeoffroyLatourDK opened this issue Jun 7, 2023 · 14 comments
Labels: bug (Something isn't working)

@GeoffroyLatourDK

SLO Generator Version

v2.3.4

Python Version

3.10.11

What happened?

In the documentation, the threshold method is shown with an ext: metric coming from a OneAgent or ActiveGate extension:

ext:app.request_latency

Is this mandatory, or can we also use built-in metrics like these?

builtin:service.response.client
builtin:service.keyRequest.response.time

It would also be great to add a list of usable metrics to the documentation :)

What did you expect?

I expected to get a valid result using these two built-in metrics:
builtin:service.response.client
builtin:service.keyRequest.response.time
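
For reference, here is a minimal sketch of the SLI block I have in mind, simply swapping the documented ext: metric for a built-in one. The service name and threshold are placeholders, and I am assuming the Dynatrace backend accepts built-in metric selectors:

service_level_indicator:
  query_valid:
    metric_selector: builtin:service.response.client
    entity_selector: type("service"),entityName("My service")
  threshold: 200000         # us, i.e. 200 ms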


Relevant log output

No response

GeoffroyLatourDK added the bug and triage labels on Jun 7, 2023
@lvaylet (Collaborator) commented Jun 9, 2023

Hi @GeoffroyLatourDK, thanks for reporting this behavior. Have you actually tried using these built-in metrics? If so, could you share the output and error message(s)? Ideally, I would like to reproduce the issue.

@GeoffroyLatourDK (Author)

Hello @lvaylet,
As you can see in the screenshot below, I do not get any errors in the output, but the number of good and bad events is miscalculated.
[screenshot]

In fact, it is really the number of events that does not seem to match. In the screenshot below,
[screenshot]

I am using the builtin:service.errors.client.successCount metric, which lets me count the number of successful calls. There is a big difference between the roughly 6,830 calls (6k83) reported on one side and the 60 calls on the other.

@lvaylet (Collaborator) commented Jun 14, 2023

Can you share your SLO definition, either as YAML or JSON?

@GeoffroyLatourDK (Author)

Hello @lvaylet, of course!
I just had to anonymize the names, but the structure and the metrics used remain identical.

apiVersion: sre.google.com/v2
kind: ServiceLevelObjective
metadata:
  name:  dummy name
  labels:
    service_name:  dummy name
    feature_name:  dummy name
    slo_name:  dummy name
spec:
  description: dummy name
  backend: dynatrace/prod
  method: threshold
  service_level_indicator:
    query_valid:
      metric_selector: builtin:service.response.time
      entity_selector: type("service"),entityName("Dummy service")
    threshold: 200000.000         # us
  goal: 0.95
  frequency: '*/5 * * * *'

@lvaylet (Collaborator) commented Jun 14, 2023

Thanks @GeoffroyLatourDK. Can you also enable debug mode and share the output? For example, by setting the DEBUG environment variable to 1 before calling slo-generator compute ...:

$ DEBUG=1 slo-generator compute -f <SLO_CONFIG_PATH> -c <SHARED_CONFIG_PATH>
[...]

In the meantime, I am trying my best to get my hands on a Dynatrace environment.

@lvaylet (Collaborator) commented Jun 14, 2023

Looking at your SLO definition, can you also share what the frequency: '*/5 * * * *' field on the last line is supposed to do?

@GeoffroyLatourDK (Author)

Hello @lvaylet,
The frequency: '*/5 * * * *' line was a copy-paste mistake on my side, sorry ^^"
In the attached debug file below you will find the output of the compute run in debug mode :)
debug.txt

Thanks for your time and help.

GeoffroyLatourDK changed the title from "🐛 [BUG] - use built-in metric to create SLO" to "🐛 [BUG] - use built-in metric to create Latency SLO On Dynatrace" on Jul 12, 2023
@GeoffroyLatourDK (Author)

Hello @lvaylet, have you had some time to investigate this issue?

@lvaylet (Collaborator) commented Aug 16, 2023

Hi @GeoffroyLatourDK. Apologies for the late reply. I was on vacation and off the grid.

I do not see anything suspicious in your SLO definition. That being said, I am surprised by the huge difference between the expected (6,830) and actual (60) values. That is two orders of magnitude! Are we really looking at the same metric? With the same filters (or absence of filters)? Over the same duration? Debug mode lets us check the actual requests sent to the Dynatrace API. For example, on lines 67, 68 and 69 of debug.txt:

slo_generator.backends.dynatrace - DEBUG - Running "get" request to https://gwn38670.live.dynatrace.com/api/v2/metrics/query?from=1687249175000&end=1687252775000&metricSelector=builtin:service.response.time&entitySelector=type("service"),entityName("Catalina/localhost (/order)")&aggregation=SUM&includeData=True&Api-Token=   ...
urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): gwn38670.live.dynatrace.com:443
urllib3.connectionpool - DEBUG - https://gwn38670.live.dynatrace.com:443 "GET /api/v2/metrics/query?from=1687249175000&end=1687252775000&metricSelector=builtin:service.response.time&entitySelector=type(%22service%22),entityName(%22Catalina/localhost%20(/order)%22)&aggregation=SUM&includeData=True&Api-Token=   HTTP/1.1" 200 1781

Have you tried running these queries in the Dynatrace UI to confirm you get the same values? Have you also tried setting the Aggregation parameter manually to Sum in the Data explorer UI (set to Auto in your screenshot)? Finally, have you tried activating the Advanced mode (with the toggle switch at the top right) to get the equivalent query?

lvaylet removed the triage label on Aug 16, 2023
@lvaylet (Collaborator) commented Aug 16, 2023

On an unrelated topic, I just noticed this performance warning at line 324 in the debug output:

             'warnings': ['The used `entityName` clause may severely degrade '
                          'the performance of your query. Please consider '
                          'using any of the following to improve query '
                          'performance: `entityName.in`, `entityName.equals`, '
                          '`entityName.startsWith`. If you need to check for '
                          'containment, please use `entityName.contains`.']}],

Most probably this has no impact on the output, but it is worth addressing anyway.
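
If you do want to address it, here is a minimal sketch of the same SLI block with the entity selector rewritten using one of the clauses suggested in the warning. I have not tested it against your environment, so treat it as an assumption:

    query_valid:
      metric_selector: builtin:service.response.time
      entity_selector: type("service"),entityName.equals("Dummy service")
    threshold: 200000.000         # us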

@lvaylet (Collaborator) commented Aug 17, 2023

Hi again, I also noticed that your SLO definition sets spec.service_level_indicator.query_valid.entity_selector to type("service"),entityName("Dummy service") while your screenshot shows these values at two different places: Split by and Filter by. I am not an expert at Dynatrace Query Language (DQL). Does that translate to the same query at the end of the day?

@GeoffroyLatourDK (Author)

Hello, as an update: I have checked a few more parameters via the UI and found the Fold transformation parameter. When I change it from auto to count, I get the same result as the SLO Generator output. For the moment, however, I cannot explain the huge difference between the SLO Generator SLI and the Dynatrace SLI. I will post another update soon :)

@lvaylet (Collaborator) commented Sep 15, 2023

Hi @GeoffroyLatourDK, any update to share?

@GeoffroyLatourDK (Author)

Hello @lvaylet, as far as my investigation goes, this is not a bug but a precision issue on the Dynatrace side. When you request data via the API over "large" periods of time, Dynatrace does not send all the data points, but only averages over a hundred or so time slots. For example, over a 28-day period, Dynatrace sends the average response time per 6-hour slot. In my case, however, a 6-hour slot may contain many peaks above my threshold, and they are not taken into account because the average stays below the threshold.

Finally, I do not know whether Dynatrace behaves this way for every kind of built-in metric, or differently for other types of metrics; we only work with built-in metrics.
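
As a rough sanity check of this explanation, and assuming the metrics API defaults to returning on the order of 100-120 data points per query (an assumption on my side): 28 days is about 40,320 minutes, and 40,320 / 120 ≈ 336 minutes, i.e. roughly 5.6 hours per data point, which is consistent with the ~6-hour averaging described above.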

So we can close the issue :)
