$ cabal run bench:Prelude.Serial # run selected
$ cabal run bench:Prelude.Serial -- --help # help on arguments
$ cabal run bench:Prelude.Serial -- --stdev 100000 # specify arguments
$ cabal run bench:Prelude.Serial --flag fusion-plugin # with fusion-plugin
$ cabal build bench:Prelude.Serial # build selected
$ cabal build --enable-benchmarks streamly-benchmarks # build all
$ cabal build --enable-benchmarks all # build all, alternate method
$ cabal build --flag "-opt" ... # disable optimization, faster build
The executable `bench-runner` is the top level driver for running benchmarks. It runs the requested benchmarks and then creates a report from the results using the `bench-show` package.
IMPORTANT NOTE: The first time you run this executable it may take a long time because it has to build the `bench-report` executable, which has a lot of dependencies.
You can install it once in the root of the repository and use it multiple times. Use `cabal.project.report` to install `bench-runner` like so:
$ cabal install bench-runner --project-file=cabal.project.report --installdir=./ --overwrite-policy=always
$ ./bench-runner <bench-runner-args>
If you're using nix, you can install bench-runner like so:
$ cd benchmark/bench-runner
$ nix-shell --run 'cabal install bench-runner --installdir=../../ --overwrite-policy=always'
$ cd ../../
$ ./bench-runner <bench-runner-args>
You can also run `bench-runner` without installing it, like so:
$ cabal run bench-runner --project-file=cabal.project.report -- <bench-runner-args>
The examples below assume `./bench-runner` is the installed executable. If you have not installed it, you can replace `./bench-runner` with `cabal run bench-runner --project-file=cabal.project.report --`.
Note that you need to pass two mandatory arguments to `bench-runner`, `--package-version` and `--package-name`. For the streamly-benchmarks package, pass these as below:
$ ./bench-runner --package-version 0.0.0 --package-name streamly-benchmarks
Useful commands:
$ ./bench-runner --help
$ ./bench-runner --quick # run all the benchmark suites
$ ./bench-runner --targets help # Show available benchmark suites
$ ./bench-runner --targets serial_grp # Run all serial benchmark suites
$ ./bench-runner --targets "Prelude.Serial Data.Parser" # run selected suites
$ ./bench-runner --no-measure # don't run benchmarks just show previous results
# Run all O(1) space complexity benchmarks in `Prelude.Serial` suite
$ ./bench-runner --targets Prelude.Serial --prefix Prelude.Serial/o-1-space
# Run a specific benchmark in `Prelude.Serial` suite
$ ./bench-runner --targets Prelude.Serial --prefix Prelude.Serial/o-1-space.generation.unfoldr
Note: `bench-runner` enables fusion-plugin by default.
# Checkout baseline commit
$ ./bench-runner --quick
# Checkout commit with new changes
$ ./bench-runner --quick --append
# To add another result to comparisons just repeat the above command on
# desired commit
# To display the current results without running the benchmarks.
# See "Reporting without measuring" for more info.
$ ./bench-runner --no-measure
First see the available benchmark suites:
$ ./bench-runner --targets help
Some benchmark suite names end with `_cmp`; these are comparison groups. If you run a comparison group, a comparison of all the benchmark suites in that group is shown at the end. For example, to compare all array benchmark suites:
$ ./bench-runner --targets array_cmp
You can use the `--no-measure` option to report the already measured results in the benchmark results file. A results file may collect an arbitrary number of results by running with `--append` multiple times. Each benchmark suite has its own results file; for example, the `Prelude.Serial` suite's results file is at `charts/Prelude.Serial/results.csv`.
You can also manually edit the file to remove a set of results, or to append results from a previously saved file or from some other results file. After editing, run `bench-runner` with the `--no-measure` option to see the reports corresponding to the edited results.
You can specify the stream size (default is 100000) to be used for benchmarking:
$ cabal run bench:Prelude.Serial -- --stream-size 1000000
In the `FileSystem.Handle` benchmark suite you can specify the input file as an environment variable:
$ export Benchmark_FileSystem_Handle_InputFile=./gutenberg-500.txt
$ cabal run FileSystem.Handle -- FileSystem.Handle/o-1-space/reduce/read/S.splitOnSeq
The automatic tests do not exercise unicode input; this option is useful for specifying a unicode text file manually. To run the unicode benchmarks on valid utf8 input, you can do the following:
$ Benchmark_FileSystem_Handle_InputFile=<valid-unicode-filepath> ./bench-runner --targets Unicode.Stream --cabal-build-options "-f include-strict-utf8"
We run each benchmark in an isolated process to minimize interference between benchmarks and to be able to control the RTS memory restrictions per benchmark.
The benchmark driver forces a GC before and after the measurement. However, we have observed that the GC stats may sometimes be inaccurate when the number of iterations in the measurement is small (e.g. 1 iteration). In such cases, the number of GCs and the GC times would usually also be 0.
When comparing different compilers we need to make sure that we are using exactly the same versions of the libraries, for an apples-to-apples comparison. We have seen cases where a change in the "random" library caused allocation regressions on a new compiler version because the change altered how the benchmark code was generated.
When it is required to reproduce benchmark results precisely across different systems, it is recommended that you create and use a cabal freeze file so that the versions of all libraries are pinned.
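For example, standard cabal tooling can generate the freeze file, and subsequent builds pick it up automatically:
$ cabal freeze # writes pinned versions to cabal.project.freeze
$ cabal build --enable-benchmarks streamly-benchmarks # builds with the pinned versions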
There are two ways to find problematic code:
- Run performance benchmarks using `bench-runner`, and select the benchmarks that are taking more than the expected time.
- When making a new change, compare with the baseline and select the benchmarks with the most regression reported by `bench-runner`.
The number of allocations is the most stable measure and does not vary from run to run. `cpuTime` and `bytesCopied` may vary. When comparing two runs for regression, the first thing to look at is the difference in allocations. Also note that allocations may vary from run to run for concurrent benchmarks.
The next thing to look at is `cpuTime`. Note that `cpuTime` may fluctuate quite a bit; to confirm a regression, you may want to run the relevant benchmarks without the `--quick` mode, and make sure no other load is running on the system while measuring.
Usually the increase in `cpuTime` is proportional to the increase in allocations, but sometimes it may increase independently because more CPU instructions are being executed. TBD: we should count the instructions instead.
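Until instruction counting is built in, one possible workaround on Linux is `perf stat` (assuming `perf` is installed; the executable path below is illustrative, substitute the actual path of the benchmark executable under `dist-newstyle`):
$ perf stat -e instructions,cycles ./Prelude.Serial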
Before you proceed, make sure you have run the benchmarks with the `inspection` flag on. It may catch obvious issues or regressions.
$ cabal build --flag inspection --flag fusion-plugin --enable-benchmarks streamly-benchmarks
- Comment out all other benchmarks in the given benchmark suite, and keep only the one you are examining.
- Edit the file and add the following pragma at the top:
{-# OPTIONS_GHC
-ddump-simpl
-ddump-to-file
-dsuppress-all
-Wmissed-specialisations
-Wall-missed-specialisations
-fplugin-opt=Fusion.Plugin:verbose=2
-fplugin-opt=Fusion.Plugin:dump-core
#-}
- Build the benchmark suite with fusion-plugin enabled:
$ cabal build bench:Prelude.Serial --flag fusion-plugin
See the `.dump-simpl` file in the cabal build directory. You can find it like this:
$ find dist-newstyle/ -name "*.dump-simpl"
Make sure you are looking into the right build directory (`--builddir` may change `dist-newstyle` to something else), and check in the appropriate GHC version directory.
Sometimes you may want to create a separate program from the benchmark code, removing the benchmarking harness, to simplify and isolate the code for better reasoning and simpler core.
Add the following GHC options at the top of your file, say, example.hs:
{-# OPTIONS_GHC
-ddump-simpl
-ddump-to-file
-dsuppress-all
-Wmissed-specialisations
-Wall-missed-specialisations
-fplugin Fusion.Plugin
-fplugin-opt=Fusion.Plugin:verbose=2
-fplugin-opt=Fusion.Plugin:dump-core
#-}
Do not include the optimization options in the OPTIONS_GHC pragma; instead, specify them on the command line. This avoids optimization failures when you import another module that is not compiled with the same optimization options.
$ cabal build # build and write ghc environment file
$ ghc -O2 -fspec-constr-recursive=16 -fmax-worker-args=16 example.hs
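For instance, the rest of example.hs might be a minimal sketch like the one below (with the pragma above at the top of the file). The `Streamly.Data.Stream` and `Streamly.Data.Fold` modules are from the streamly-core package; substitute the actual code you are investigating:

-- example.hs: a minimal standalone loop, isolated from the benchmarking
-- harness so that the generated core is small enough to read.
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream

main :: IO ()
main = do
    let step n = if n > 100000 then Nothing else Just (n, n + 1)
    r <- Stream.fold Fold.sum (Stream.unfoldr step (1 :: Int))
    print r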
To pinpoint where the optimization is going wrong you can examine the plugin generated core files for each optimization pass. The files are numbered for each optimization pass. You can compare successive files using side-by-side diff and see what the compiler is doing between each pass.
Look for missed specialization messages. When you are comparing against a baseline, check if something that was specialized before is no longer specialized.
In the core you have to look for type class dictionaries, e.g.:
exc_r6DD = \ @s_a6ai -> try $fMonadCatchIO $fExceptionSomeException
Search for `$f` in the core.
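A missed specialization can often be fixed with a SPECIALIZE pragma in the defining module. A generic sketch (the function here is illustrative, not a specific streamly function):

-- A polymorphic function may drag a Monad dictionary into the loop if
-- GHC does not specialize it at the call site:
count :: Monad m => [a] -> m Int
count = pure . length

-- Pinning the common instantiation removes the dictionary passing:
{-# SPECIALIZE count :: [a] -> IO Int #-}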
Look for unfused function warnings emitted by fusion-plugin. You may want to take a look at the unfused constructors or functions that fusion-plugin is warning about. Beware that:
- fusion-plugin emits warnings for unfused constructors in intermediate functions as well; those should be ignored.
- the constructors may remain genuinely unfused unless the loop is closed. So you should look at the warnings in the file where the loop is closed and everything is supposed to be fused.
Also, look at the core for unfused constructors. At times you may need to look for the boxed primitive type constructors, e.g. `W8#` or `I#`; these may not be eliminated, usually due to strictness issues.
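As a generic illustration (not specific streamly code), a lazy field can keep `I#` values boxed in the loop, while a strict field usually lets GHC unbox them:

data AccLazy = AccLazy Int -- lazy field: core may retain boxed I# values
data AccStrict = AccStrict !Int -- strict field: GHC can unbox to Int#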
It is often useful to diff the core without the problem against the core with the problem, especially when the problem is due to a GHC version change or a small change in the code.
Note that some operations are inherently fusion-breaking and cannot fuse; they are usually annotated as such in their documentation.
Review the problematic code and see the optimization guide for common problems and how to solve them. If no obvious issues are found on review, then generate and examine the core.
You may want to add the `Fuse` annotation on some of those constructors to make the code fuse. Note that unnecessary `Fuse` annotations may cause unnecessary inlining. Also, make sure that the constructor you are annotating is not shared by other code where you may not want inlining/fusion.
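A `Fuse` annotation looks like the sketch below; the `Step` type here is illustrative, and `Fusion.Plugin.Types` comes from the fusion-plugin-types package:

import Fusion.Plugin.Types (Fuse (..))

-- Ask fusion-plugin to aggressively eliminate this type's constructors.
{-# ANN type Step Fuse #-}
data Step s a = Yield a s | Skip s | Stop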