Replies: 2 comments
-
So it seems that
-
This could be a great addition to the developer documentation.
This kind of stuff is good motivation for making improvements to the symbol names, and perhaps the design, of the generated code.
-
I think I finally figured out a way to get good profiling information about numba-compiled graphs. Usually I use `perf` to profile most things, but that doesn't work well at all with numba most of the time, because the JIT compilation doesn't save the necessary information in a way that `perf` can access. It turns out, however, that if we use the ahead-of-time compilation in numba and ask it to export debugging symbols, we get everything we need.
Before we import numba, we set the environment variable `NUMBA_DEBUGINFO=1`:
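For instance, setting it from within the script itself, before the first numba import, would look something like this:

```python
import os

# NUMBA_DEBUGINFO=1 tells numba to emit DWARF debug info for the code it
# compiles; it has to be set before numba is imported for the first time.
os.environ["NUMBA_DEBUGINFO"] = "1"

import numba  # noqa: E402  (deliberately imported after setting the variable)
```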
Let's say, for instance, that we want to profile a function like this:
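As a stand-in I'll use a small PyTensor graph compiled with the numba backend; the graph itself, the variable names, and the input size are assumptions, chosen so that the `square` and `mul` elemwise operations from the profile below show up:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

# Hypothetical stand-in graph: square each element, multiply by two, sum.
x = pt.dvector("x")
y = (2.0 * x ** 2).sum()

# Compile the graph with the numba backend.
fn = pytensor.function([x], y, mode="NUMBA")
fn(np.random.normal(size=100_000))  # call once to trigger the jit compilation
```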
Now we can extract the numba function and compile it into a shared library with debugging symbols:
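I'm not sure which hook was used to pull the numba function out of the compiled PyTensor function (something along the lines of `numba_funcify` in `pytensor.link.numba.dispatch` seems like the natural route), so to keep this sketch self-contained I compile a hand-written equivalent of the graph ahead of time with `numba.pycc`; the module name `profiled_graph`, the function name `run_graph`, and the signature are made up:

```python
from numba.pycc import CC  # numba's (now deprecated) ahead-of-time compiler

cc = CC("profiled_graph")  # name of the shared-library module that gets built
cc.verbose = True


# The exported signature has to match how we want to call the function later:
# one 1-d float64 array in, one float64 scalar out.
@cc.export("run_graph", "float64(float64[:])")
def run_graph(x):
    # Same elemwise work as the graph above: square, multiply by two, sum.
    return (2.0 * x ** 2).sum()


# Writes profiled_graph.*.so into the working directory; since
# NUMBA_DEBUGINFO=1 was set before numba was imported, the library should
# contain the debug symbols that perf needs.
cc.compile()
```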
Now we just have to import the module (using `importlib`, because we defined the module name as a string) and keep the function running:
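Something like this keeps the compiled function busy so that there is a process for `perf` to attach to (the printed pid is just for convenience; `profiled_graph` and `run_graph` are the made-up names from above):

```python
import importlib
import os

import numpy as np

# The module name only exists as a string, hence importlib instead of a
# regular import statement.
mod = importlib.import_module("profiled_graph")

print("pid:", os.getpid())

x = np.random.normal(size=100_000)
while True:
    mod.run_graph(x)
```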
While this is running we can now run any `perf` command, for instance `perf record -p {thepid}`, hit CTRL-C after a while, and then run `perf report` to see the results. (Pressing `a` on a line shows the assembly with the inlined Python code, and an indication of how much time we spend on each instruction.) Using `perf stat -d` we can also get information about cache misses, branch mispredictions, etc.

In this particular case we can see, for instance, that we spend most of our time in `numba_funcified_graph`, because the elemwise operations `square` and `mul` did not get inlined, for some reason I don't understand yet.