
Could we automatically detect/report performance regressions? #285

Open
maximecb opened this issue Jul 25, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@maximecb
Contributor

maximecb commented Jul 25, 2024

Recently we've been surprised by multiple major performance regressions, and we're finding that it's very challenging to identify the root cause(s) after the fact. It would be really useful if yjit-metrics could help us automatically flag potential regressions.

I know this is not an easy or trivial thing to implement, because there can often be false positives due to the inherent noise in measurements. I was thinking that since we have error bars on benchmarks, it might be possible to have a criterion such as:

If the average time for the current run of a benchmark is above the previous average for that benchmark, and the error bars for the two runs don't overlap, then flag a potential regression.

We could also have an adjustable threshold such that we require a certain gap between the error bars before we report a regression (multiples of the largest or smallest of the stddevs? e.g. only report a slowdown if the gap between the error bars is greater than 0.5 * min(stddev1, stddev2)). We could tune this criterion to reduce the probability of false positives.
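
To make that concrete, here is a minimal sketch of what the check could look like. The names (`BenchResult`, `potential_regression`), the time-in-ms assumption, and the 0.5 factor are purely illustrative, not anything that exists in yjit-metrics today:

```python
from dataclasses import dataclass

@dataclass
class BenchResult:
    mean: float    # mean iteration time in ms (lower is better)
    stddev: float  # standard deviation across runs

def potential_regression(prev: BenchResult, curr: BenchResult,
                         gap_factor: float = 0.5) -> bool:
    """Flag a regression when the current mean time is higher than the
    previous one and the error bars (mean +/- stddev) are separated by
    more than gap_factor * the smaller of the two stddevs."""
    if curr.mean <= prev.mean:
        return False  # current run is not slower, nothing to flag
    # Gap between the bottom of the current error bar and the top of the previous one.
    gap = (curr.mean - curr.stddev) - (prev.mean + prev.stddev)
    return gap > gap_factor * min(prev.stddev, curr.stddev)

# Example: 12.0ms +/- 0.2 previously, 13.1ms +/- 0.3 now -> flagged
# potential_regression(BenchResult(12.0, 0.2), BenchResult(13.1, 0.3))  # True
```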

This system wouldn't be foolproof, and it may not detect very small regressions, but I think it could still be helpful: if a microbenchmark suddenly slows down by 5-10%, it would automatically get flagged. Currently we rarely look at our microbenchmarks, so these things can go completely undetected... But we could have a microbenchmark for object allocation, for example, and get automatically notified if object allocation performance takes a big drop. @XrXr

If one or more regressions are detected, a message could be posted in the benchmark CI Slack channel, and the bot could do an @here so that people in the channel are notified, or tag specific members of the YJIT team.
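
As a rough sketch of the notification side: Slack's incoming-webhook API accepts a plain JSON payload, and `<!here>` is the markup that renders as an @here mention. The webhook URL and message format below are placeholders, not anything configured in this repo:

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder

def notify_slack(regressions: list[str]) -> None:
    """Post the list of flagged benchmarks to the benchmark CI channel."""
    text = "<!here> Potential performance regressions detected:\n"
    text += "\n".join(f"- {name}" for name in regressions)
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```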

@maximecb maximecb added the enhancement New feature or request label Jul 25, 2024
@maximecb
Contributor Author

In conjunction with this, I want to add an Object.new microbenchmark to yjit-bench: Shopify/yjit-bench#313

@XrXr
Contributor

XrXr commented Jul 25, 2024

Over in Rust, they have a bot that evaluates perf on every PR, for example: rust-lang/rust#125999 (comment). It's a very different environment, but it might still be worth taking some time to sniff around and see if there is anything they do that we could copy :) For example https://github.com/rust-lang/rustc-perf/blob/master/collector/README.md

@rwstauner
Contributor

There is some code in this repo for "tripwires" that I think was meant for something like this but has been disabled... it might be worth digging that up to see what state it was in.

@maximecb
Contributor Author

maximecb commented Jul 25, 2024

Good suggestion @XrXr

There is some code in this repo for "tripwires" that I think was meant for something like this but has been disabled... it might be worth digging that up to see what state it was in.

Yes, I think the reason it was disabled is that it was reporting too many false positives. However, I don't think it was doing anything fancy. If we make use of error bars and better heuristics, we can probably build a better detector. In general, our numbers are quite stable over time.

If anything, even if it only flagged very big and very obvious regressions, it might still be useful, because even a big 5% regression can easily go unseen if we're not religiously paying attention. I mean, a benchmark is showing a 1.76% speedup; was it 1.81% last week? I don't remember.

@maximecb
Contributor Author

As a side note, we could also play special tricks like excluding benchmarks that have very big error bars, such as ruby-lsp. We can tune our heuristics until we get something that works well.
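
One possible way to express that exclusion, reusing the hypothetical `BenchResult` from the sketch above (the 10% cutoff is just an example value to tune):

```python
def too_noisy(result: BenchResult, max_rel_stddev: float = 0.10) -> bool:
    """Skip benchmarks (e.g. ruby-lsp) whose relative stddev is so large
    that the error-bar overlap test would mostly produce noise."""
    return result.stddev > max_rel_stddev * result.mean
```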
