Could we automatically detect/report performance regressions? #285
Comments
In conjunction with this, I want to add an
Over in Rust, they have a bot that evaluates perf on every PR, for example: rust-lang/rust#125999 (comment). It's a very different environment, but it might still be worth taking some time to sniff around and see if there is anything they do that we could copy :) For example: https://github.com/rust-lang/rustc-perf/blob/master/collector/README.md
There is some code in this repo for "tripwires" that I think was meant for something like this but has been disabled... it might be worth digging that up to see what state it was in.
Good suggestion @XrXr
Yes, I think the reason it was disabled is that it was reporting too many false positives. However, I don't think it was doing anything fancy. I think if we make use of error bars and better heuristics, we can probably build a better detector. In general, our numbers are quite stable over time. If anything, even if it only flagged very big and very obvious regressions, it might still be useful, because even big 5% regressions can easily go unseen if we're not religiously paying attention. I mean: a benchmark is showing a 1.76% speedup. Was it 1.81% last week? Don't remember.
As a side note, we could also play special tricks like excluding benchmarks that have very big error bars, such as ruby-lsp. We can tune our heuristics until we get something that works well.
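A minimal sketch of what such an exclusion heuristic could look like (the 10% cutoff and the numbers below are made up for illustration, not taken from real yjit-metrics data):

```python
# Hypothetical sketch: skip benchmarks whose measurements are too noisy to
# compare reliably. The 10% relative-stddev cutoff is an arbitrary starting
# point and would need tuning against real data.
def stable_enough(mean, stddev, max_relative_stddev=0.10):
    """Return True if a benchmark's noise level is low enough to compare runs."""
    if mean == 0:
        return False
    return (stddev / mean) <= max_relative_stddev

# Illustrative (made-up) numbers: a noisy benchmark like ruby-lsp would be
# filtered out, while a stable one would be kept.
benchmarks = {
    "ruby-lsp":   {"mean": 1.95, "stddev": 0.40},
    "railsbench": {"mean": 1.76, "stddev": 0.02},
}
comparable = [name for name, b in benchmarks.items()
              if stable_enough(b["mean"], b["stddev"])]
```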
Recently we've been surprised by multiple major performance regressions, and we're finding that it's very challenging to identify the root cause(s) after the fact. It would be really useful if yjit-metrics could help us automatically flag potential regressions.
I know this is not an easy or trivial thing to implement because there can often be false positives due to the inherent noise in measurements. I was thinking that since we have error bars on benchmarks, it might be possible to have a criterion such as, for example:
- if the error bars of the old and new measurements don't overlap at all, flag a potential regression
We could also have some kind of adjustable threshold such that we allow a certain gap between the error bars before we report a regression (some multiple of the largest or smallest of the stddevs? e.g. only report a slowdown if the gap between the error bars is greater than `0.5 * min(stddev1, stddev2)`). We could tune this criterion to reduce the probability of false positives.

This system wouldn't be foolproof; it may not detect very small regressions. But I think it could still be helpful because, for example, if a microbenchmark suddenly slows down by 5-10%, it would automatically get flagged. Currently we rarely ever look at our microbenchmarks, so these things can go completely undetected... But we could have a microbenchmark for object allocation, for example, and get automatically notified if object allocation takes a big drop. @XrXr
If one or more regressions are detected, a message could be posted in the benchmark CI Slack channel, and the bot could do an `@here` so that people in the channel are notified, or tag specific members of the YJIT team.