Get profile of numba function evaluations #1086

aseyboldt · 2022-07-29T23:37:53Z

aseyboldt
Jul 29, 2022

I think I finally figured out a way to get good profiling information about numba-compiled graphs. Usually, I use perf to profile most things, but this doesn't work well at all with numba most of the time, because the jit-compilation doesn't save the necessary information in a way that perf can access it.

Turns out however, that if we use the ahead-of-time compilation in numba, and ask it to export debugging symbols, we get everything we need.

Before we import numba, we set an environment variable NUMBA_DEBUGINFO=1:

%env NUMBA_DEBUGINFO=1

import aesara
import aesara.tensor as at
import numpy as np
import numba

Let's for instance say we want to profile a function like this:

z = at.dvector("z")
idx = np.random.randint(100, size=10_000)
vals = at.dvector("vals")
out = (z[idx] ** 2).sum()
out_grad = at.grad(out, z)
func = aesara.function([z], [out, out_grad], mode="NUMBA")
aesara.dprint(func)

Sum{acc_dtype=float64} [id A] 6
 |Elemwise{Sqr}[(0, 0)] [id B] 4
   |AdvancedSubtensor1 [id C] 0
     |z [id D]
     |TensorConstant{[32 25 51 .. 38 84 87]} [id E]
AdvancedIncSubtensor1{inplace,inc} [id F] 5
 |Alloc [id G] 3
 | |TensorConstant{(1,) of 0.0} [id H]
 | |Shape_i{0} [id I] 1
 |   |z [id D]
 |Elemwise{mul,no_inplace} [id J] 2
 | |TensorConstant{(1,) of 2.0} [id K]
 | |AdvancedSubtensor1 [id C] 0
 |TensorConstant{[32 25 51 .. 38 84 87]} [id E]

Now we can extract the numba function and compile it into a shared library with debugging symbols:

from numba.pycc import CC

# The name of the shared library. Note that we need to change the name if we reexecute this in the same kernel
name = "numba_test_module4"
cc = CC(name)

# Get the numba function
func_numba = func.vm.jit_fn

# Run it once because I'm too lazy to write down the signature myself
z = np.random.randn(10_000)
func_numba(z)

# Collect it into the shared lib
cc.export("logp_grad", func_numba.signatures[0])(func_numba)

# Compile the shared lib
cc.compile()

Now we just have to import the module (using importlib because we defined the name as str)

import importlib
module = importlib.import_module(name)

import os

# Get the pid which we need for perf
print(os.getpid())

# Start executing
while True:
    module.logp_grad(x)

While this is running we can now run any perf command, for instance perf -p{thepid} record and CTRL-C after a while and perf report to see the results. (pressing a on a line shows the assembly with inlined python code, and indications about how much time we spend with this instruction). Using perf stat -d we can also get information about cache misses, branch mispredictions etc.

In this particular case we can see for instance, that we spend most of our time in numba_funcified_graph, because the elemwise operations square and mul did not get inlined for some reason I don't understand yet.

aseyboldt · 2022-07-30T00:09:09Z

aseyboldt
Jul 30, 2022
Author

So it seems that NUMBA_DEBUGINFO prevents some optimizations, it does inline mul and square and we still get at least some debuginfo if we set

%env NUMBA_ENABLE_PROFILING=1
%env NUMBA_DEBUGINFO=0

0 replies

brandonwillard · 2022-07-31T19:35:33Z

brandonwillard
Jul 31, 2022
Maintainer

This could be a great addition to the developer documentation.

In this particular case we can see for instance, that we spend most of our time in numba_funcified_graph

This kind of stuff is good motivation for making improvements to the symbol names—and perhaps design—of the generated code, because numba_funcified_graph is the generic name used for the JIT-compiled function of any/every converted FunctionGraph, which introduces some ambiguity. There's always at least one such function (i.e. the FunctionGraph for the compiled graph itself), but there can be more.

Assuming only one numba_funcified_graph, that line could be indicative of a number of things, but, foremost, it would be the point of entrance for the Numba compiled function, and that should mostly consist of calls to other Numba-compiled functions.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get profile of numba function evaluations #1086

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments

{{title}}

{{title}}

Select a reply

Get profile of numba function evaluations #1086

aseyboldt Jul 29, 2022

Replies: 2 comments

aseyboldt Jul 30, 2022 Author

brandonwillard Jul 31, 2022 Maintainer

aseyboldt
Jul 29, 2022

aseyboldt
Jul 30, 2022
Author

brandonwillard
Jul 31, 2022
Maintainer