- dynamic performance analysis in real-world deployment
- identify specific bottlenecks, because:
  - premature optimisation is evil (80/20 rule: ~80% of the time is spent in ~20% of the code)
  - we strive for readability and maintainability
- truthfully represent the behaviour of the program (no bias / interference)
- provide data with high accuracy
- show CPU cycles, wall time, memory access counters
- show the function call graph and call counts
- various outputs:
  - table with statistics
  - trace (instruction path)
  - live monitoring
- different collection methods:
  - event-based (interrupt)
  - statistical (sampling)
  - instrumented (injection at source level, in bytecode or in RAM)
  - simulation (virtual machine)
- high-level assessment (a quick example follows this list)
- automatic compiler / toolchain feedback
- easy to use
- reasonably fast
- portability should not be a major concern
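A first, coarse high-level assessment can be made with the shell's built-in timing, before reaching for a dedicated profiler (assuming the example binary src/main used throughout this section):
time src/main
The reported real (wall clock), user and sys times already give a rough baseline.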
Here is a brief selection of profiling tools and techniques as shown in the lecture.
In order to produce debugging information, we generally add the -g flag to the compiler options, which is equivalent to setting the build type to Debug in cmake/ccmake. To get a realistic picture, we also want to set the same full optimisation flags we use in the release version.
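With a CMake-based build, one way to get both is the RelWithDebInfo build type, which typically corresponds to -O2 -g (a sketch, run from the build directory):
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
When invoking the compiler directly, the flags can simply be combined, e.g. g++ -g -O3.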
The GNU profiler gprof is a simple yet useful GNU/Linux-based compiler-level instrumentation profiler which requires -pg (specific support for gprof) to be added to the compiler flags. After compilation, executing the binary (e.g. src/main) will produce a statistical call graph profile named gmon.out in the current directory. We can then see the relations between the symbol table of the binary and the execution profile by running
gprof src/main > info.txt
and viewing info.txt (which contains three documented sections) in any text editor.
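As a self-contained sketch of the whole workflow, assuming a hypothetical single-file program main.cpp instead of the cmake build:
g++ -pg -O2 -g main.cpp -o main
./main
gprof ./main gmon.out > info.txt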
Further instructions, e.g. how to generate call graphs, can be found in the gprof documentation.
perf is a versatile Linux kernel performance counter tool for either specific binaries or the entire system.
We can obtain a brief execution summary by running the statistics subcommand:
perf stat src/main
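Specific hardware and software events can also be requested explicitly with -e (a sketch; which events are available depends on the CPU and can be listed with perf list):
perf stat -e cycles,instructions,cache-misses src/main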
Detailed data can be collected by recording the execution into a perf.data file and reporting on it (type ? while in the report tool for more info):
perf record src/main
perf report
To generate a call graph visualisation, we record the stack chain / backtrace information (-g):
perf record -g -- src/main
Then we run the perf script command, piping through c++filt (demangles C++ function names) and gprof2dot (a dot grapher for, among others, perf; available via pip; the final dot command is part of Graphviz):
perf script | c++filt | gprof2dot -f perf | dot -Tpng -o out.png
This creates a nice visual representation of function calls and work effort percentages in out.png.
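Two further subcommands are worth knowing: a live, top-like profile of the running system, and per-instruction source annotation of a previously recorded run (a sketch; perf top usually requires elevated privileges, and perf annotate expects a prior perf record run):
perf top
perf annotate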
These features, along with various benchmarks for schedulers or memory access and much more, are covered in the perf tutorial.
Valgrind is a cross-platform (GNU/Linux, OSX, Solaris) virtual machine used for debugging in a variety of areas (most popularly via the memcheck tool).
Among profiling options, we can obtain a short summary of the cache performance:
valgrind --tool=cachegrind src/main
which also generates a detailed cachegrind.out.* report that can be assessed with:
cg_annotate cachegrind.out.NUM > info.txt
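Cachegrind can additionally simulate the branch predictor alongside the caches (a sketch):
valgrind --tool=cachegrind --branch-sim=yes src/main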
We can also investigate the amount of dynamic memory used by our program with Valgrind's heap profiler:
valgrind --tool=massif src/main
ms_print massif.out.NUM > info.txt
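By default, massif measures heap allocations only; stack memory can be included as well (a sketch):
valgrind --tool=massif --stacks=yes src/main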
Another interesting use of Valgrind is viewing the current stack frame backtrace live.
In one terminal, we can observe any program under callgrind's control (abort with Ctrl+c):
`which watch` -pn 0.1 callgrind_control -b -e
While in another terminal, we execute the program:
valgrind --tool=callgrind src/main
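Afterwards, the resulting callgrind.out.* file can be summarised on the console, analogously to cachegrind (a sketch; KCachegrind offers a graphical view of the same data):
callgrind_annotate callgrind.out.NUM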
A detailed explanation of the commands mentioned and a wealth of other applications (e.g. generating call graphs) can be found in the Valgrind manual.
gcov is a cross-platform (Unix-like) code coverage and statement-level profiling tool from the GCC suite, used to generate line-by-line profiles and annotated source listings, and to perform basic-block counting. It requires the g++ compiler flags -fprofile-arcs (injects branch/call counters, which produce .gcda files at run time) and -ftest-coverage (produces .gcno files documenting code coverage). The flag --coverage is a shortcut for these two settings.
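As a sketch with a hypothetical single-file build (coverage counts are most faithful without optimisation):
g++ --coverage -O0 main.cpp -o main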
After compilation (which already produces the .gcno files), we can run the program (e.g. src/main), which will generate the .gcda data next to each compiled object file in the build directory.
We can find those with:
find -iname "*.gc*"
These allow us to generate annotated versions of all source files that went into creating a specific object file:
gcov ./src/CMakeFiles/main.dir/main.cpp.gcda
Now we can look at main.cpp.gcov or sheep.hpp.gcov in the current directory to see how many times each line in those files was executed.
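For orientation, each line of a .gcov file is prefixed with its execution count; ##### marks executable lines that never ran, and - marks non-executable lines. An illustrative, entirely made-up excerpt:
        5:   12:    for (int i = 0; i < n; ++i)
    #####:   13:        never_called();
        -:   14:    // a comment, not executable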
Further information can be found in the gcov documentation.
lcov is an extension to gcov, enabling the generation of pretty HTML output.
Having run the binary compiled with the --coverage flag, we can create a trace file and use it to build HTML documentation:
lcov --capture --directory src/CMakeFiles/main.dir/ --output-file cov.info
genhtml cov.info --output-dir lcov_html
firefox lcov_html/index.html
This gives us a convenient way to check how often (or whether at all) each part of the source code was executed in a particular run.
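Since system and third-party headers often dominate such reports, the trace file can be filtered before generating the HTML (a sketch with a hypothetical output name):
lcov --remove cov.info '/usr/*' --output-file cov_filtered.info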
There are many other profiling tools. A very popular choice is Intel VTune, which can be obtained via academic licensing.
On Linux, OProfile is widely used.
For simple yet highly accurate and verbose cycle usage measurement, we recommend the Microbenchmark tool which was presented in the lecture.