CPU and mem Profiling
Python lists are known for not being very efficient with numbers. However, we know from experience with numpy record arrays that more advanced data structures sometimes have heavy headers which can offset the benefits of the compact data they hold.
To clear up the myths, here are some measurements.
Let's consider arrays of 20 DOUBLE elements. Yes, doubles: Python internally reuses small int objects, which makes it much harder to get reliable measurements with ints.
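As a quick illustration of that int reuse (a CPython implementation detail):

# CPython caches small ints (-5 .. 256), so apparently distinct ints can be
# the same object, which would skew a per-object memory measurement
print(int("100") is int("100"))        # True  -> both point at the cached object
print(int("100000") is int("100000"))  # False -> two separate objects are allocated

To avoid intermediate lists, we use a generator that yields N random numbers, and keep the results in a list of 200k such arrays.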
Experiment:
import random

def randoms(x):
    for _ in range(x):
        yield random.random()
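Note that the %memit magic used below is not built into IPython; it comes from the memory_profiler package. A minimal setup sketch (assuming memory_profiler is installed, e.g. via pip):

# enable the %memit line magic (provided by the memory_profiler package)
%load_ext memory_profiler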
In [21]: from array import array
In [22]: %timeit %memit a = [list(randoms(20)) for _ in range(200000)]
peak memory: 783.03 MiB, increment: 169.73 MiB
peak memory: 953.39 MiB, increment: 170.36 MiB
peak memory: 956.25 MiB, increment: 169.12 MiB
peak memory: 942.64 MiB, increment: 159.00 MiB
peak memory: 945.84 MiB, increment: 158.46 MiB
peak memory: 954.67 MiB, increment: 170.78 MiB
peak memory: 956.51 MiB, increment: 168.87 MiB
peak memory: 956.14 MiB, increment: 172.00 MiB
1.27 s ± 46.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [23]: %timeit %memit a = [array("d", randoms(20)) for _ in range(200000)]
peak memory: 662.98 MiB, increment: 48.59 MiB
peak memory: 712.14 MiB, increment: 49.16 MiB
peak memory: 711.84 MiB, increment: 43.85 MiB
peak memory: 712.43 MiB, increment: 48.93 MiB
peak memory: 711.86 MiB, increment: 43.63 MiB
peak memory: 712.19 MiB, increment: 48.45 MiB
peak memory: 711.88 MiB, increment: 43.39 MiB
peak memory: 712.18 MiB, increment: 48.19 MiB
1.21 s ± 9.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [25]: import numpy
In [26]: %timeit %memit a = [numpy.fromiter(randoms(20), dtype="f8") for _ in range(200000)]
peak memory: 625.45 MiB, increment: 11.29 MiB
peak memory: 639.73 MiB, increment: 14.27 MiB
peak memory: 640.07 MiB, increment: 10.24 MiB
peak memory: 640.19 MiB, increment: 14.11 MiB
peak memory: 640.09 MiB, increment: 10.01 MiB
peak memory: 639.48 MiB, increment: 13.14 MiB
peak memory: 640.11 MiB, increment: 9.78 MiB
peak memory: 640.25 MiB, increment: 13.66 MiB
1.24 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
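The per-container overhead can also be inspected directly with sys.getsizeof. A rough sketch (exact byte counts vary with the Python and numpy versions):

import sys
import random
from array import array
import numpy

values = [random.random() for _ in range(20)]

# a list stores pointers; the 20 float objects live as separate ~24-byte objects
lst = list(values)
print(sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst))

# array.array and numpy pack the 20 doubles (160 bytes) next to a single header
arr = array("d", values)
print(sys.getsizeof(arr))

np_arr = numpy.array(values, dtype="f8")
print(sys.getsizeof(np_arr), np_arr.nbytes)

getsizeof only reports per-object sizes; the %memit increments above also include allocator behaviour, so the two views do not match exactly.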
Observations:
Even at such a small array length, these compact structures are still much more memory-efficient.
Going from a plain list to array.array, the memory increment drops by roughly 4x on average, and with numpy by 10-15x.
Conclusions:
The story might be different for structured arrays, but for a plain 1D array there is no doubt: array.array gives us quite significant memory savings while keeping essentially the same functionality, e.g. resizing (see the sketch below).
If you can give up the API-compatibility goodies, go numpy: it is clearly optimized for low overhead.
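For reference, a small sketch of the list-like API that array.array keeps while packing the doubles:

from array import array

a = array("d", (0.1, 0.2, 0.3))
a.append(0.4)                    # grows in place, like a list
a.extend((0.5, 0.6))
a[0] = 1.0                       # item assignment and slicing work as usual
print(len(a), a.itemsize)        # 6 elements, 8 bytes each, stored contiguously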
Numpy structured arrays are really cool. One level up we have record arrays, which provide a nicer API enabling the .field syntax.
In Neurodamus we use them to cache edge info (synapse properties) so that a single record can be passed to hoc and, thanks to .field access, it just works.
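A minimal sketch of that pattern (the dtype below is made up for illustration; only u_hill_coefficient is one of the real fields):

import numpy as np

# hypothetical synapse-property layout
syn_dtype = np.dtype([("weight", "f8"), ("delay", "f8"), ("u_hill_coefficient", "f8")])
synapses_params = np.zeros(5000, dtype=syn_dtype).view(np.recarray)

entry_1 = synapses_params[0]
entry_1["u_hill_coefficient"]    # plain structured-array item access
entry_1.u_hill_coefficient       # record-array attribute access (extra __getattr__ work)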
However, recent benchmarks were worrying:
In [28]: %timeit entry_1["u_hill_coefficient"]
1.15 µs ± 7.93 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [29]: %timeit entry_1.u_hill_coefficient
2.91 µs ± 6.28 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Indeed, attribute access is about 2.5x slower!
But is it negligible in the context of a full Neurodamus run?
We took a run with 100 neocortex cells and 5000 synapses per cell and, inside the core loop "for i, syn_params in enumerate(synapses_params)", switched 4 struct accesses from .field to ["field"] notation. The result:
Function: connect_all at line 528
File: /gpfs/bbp.cscs.ch/home/leite/dev/neurodamus/neurodamus-py/neurodamus/connection_manager.py
Before:
Total time: 83.6179 s
After:
Total time: 76.1938 s
That is nearly a 10% saving in the overall connectivity setup! Please don't use the .prop notation anymore in performance-critical code.
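For illustration, a sketch of the kind of change applied in the hot loop (the loop body is simplified; only u_hill_coefficient is one of the real fields):

# before: attribute access per synapse (goes through the record __getattr__ machinery)
for i, syn_params in enumerate(synapses_params):
    u = syn_params.u_hill_coefficient
    # ... 3 more .field accesses

# after: item access, ~2.5x faster per lookup
for i, syn_params in enumerate(synapses_params):
    u = syn_params["u_hill_coefficient"]
    # ... same fields via ["field"]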