Implementations of neuroimaging computation tasks for benchmarking performance and code demonstration across programming languages.
Currently implementing:

- `niimean` -- mean of a `nii.gz` dataset
- `voxcor` -- correlation within each ROI between two 3D images
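For reference, a minimal sketch of both tasks with `nibabel` + numpy. This is not the benchmarked `scripts/` code; in particular, the integer-labeled atlas argument to `voxcor` is an assumption based on the task description above.

```python
"""Sketch of both tasks using nibabel + numpy (not the scripts/ versions)."""
import nibabel as nib
import numpy as np


def niimean(path):
    """Mean over all voxels of a (possibly gzipped) NIfTI image."""
    # get_fdata() promotes to float64, which sidesteps int16 sum overflow
    return nib.load(path).get_fdata().mean()


def voxcor(path_a, path_b, atlas_path):
    """Pearson correlation of voxel values between two images, per ROI."""
    a = nib.load(path_a).get_fdata()
    b = nib.load(path_b).get_fdata()
    rois = nib.load(atlas_path).get_fdata().astype(int)  # assumed label volume
    return {r: np.corrcoef(a[rois == r], b[rois == r])[0, 1]
            for r in np.unique(rois) if r != 0}  # 0 = background
```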
Use:

- `make` to run the shootout
- `make check` to confirm the implementations are correct
- `make depends` to run `setup.bash`; more in [#setup](#setup)
Also see
- https://benchmarksgame-team.pages.debian.net/benchmarksgame/ -- run time benchmarks across many languages and tasks
- https://julialang.org/benchmarks/ -- run times for science specific tasks and environments
- https://rosettacode.org/ -- programming chrestomathy: same task in many programming languages
## Notes

- Rust, Julia, Fortran, and Perl have Int16 overruns that were not obvious. The issue appears when reading an int16 `nii.gz` and summing over the large image (to calculate the mean). Scaling and un-scaling is a fast workaround in Julia but does not work in Perl. Using Int16 is faster than double but inaccurate (unless rewritten to calculate a running mean); see the numpy demonstration after this list.
- Runtime startup is especially slow for R and Julia (and MATLAB). See `within-env/`.
- Java/JVM is on par with Go and slower than python/numpy. How to implement SIMD optimization is not immediately obvious, and libraries/packaging are a pain without an IDE. See `jvm.md` for notes.
- I couldn't find a Fortran NIfTI library. For `niimean.f90`, I'm using an already-uncompressed NIfTI with fixed parameters (e.g. it assumes `datatype=real`) and no shape (a 1D array a la JavaScript). Keeping the array's `real` type, the mean is calculated in ~70ms but the result is off (Int16 overrun). Recasting to `real(16)` is much slower (~400ms) but accurate. `stdlib_stats`'s `mean` was slightly slower (80ms) and grew the binary from 20K to 2.8M, but did not handle the overrun any better. Using a loop to accumulate the sum was slower than type recasting (500 vs 400ms), but both provide an accurate mean.
- Perl library dependencies (namely `PDL`) are easy on Debian but difficult on Arch Linux (the AUR packages fail to compile).
- Missing implementations:
  - I couldn't find NIfTI libraries for Elixir/Erlang (BEAM), PHP, or Ruby. Foreign Function Interfaces scare me and I haven't been able to figure out using FFI.
  - Common Lisp is also missing a ready library, but an implementation with `lisp-binary` looks feasible. And using `april` for APL-style math is an interesting prospect.
- `patch/` adds gzip support to `PDL::IO::Nifti`.
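The Int16 overrun from the first note above is easy to reproduce in numpy by forcing the accumulator to stay int16, the way the naive readers effectively do. A sketch with synthetic data (not the actual image):

```python
import numpy as np

# synthetic stand-in for an int16 NIfTI volume
vox = np.full(100_000, 3000, dtype=np.int16)

# keeping the running sum in int16 wraps around, like the naive
# Rust/Julia/Fortran/Perl sums over the int16 image
bad = vox.sum(dtype=np.int16) / vox.size

# recasting before reducing is accurate (analogous to the real(16) fix)
good = vox.astype(np.float64).mean()

# a running mean keeps the accumulator bounded without widening storage,
# the rewrite suggested in the note above
run = 0.0
for i, v in enumerate(vox, start=1):
    run += (float(v) - run) / i

print(bad, good, run)  # bad is garbage; good and run are 3000.0
```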
## Results

See the hyperfine output in `out/*stats.csv`, collected by the `Makefile` across the files in `scripts/`.
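Assuming those files are hyperfine `--export-csv` output (whose columns include `command` and `mean`), a small sketch to regenerate tables like the ones below; the `out/*stats.csv` glob follows the naming above:

```python
import csv
from glob import glob

# collect every hyperfine --export-csv results file
rows = []
for path in glob("out/*stats.csv"):
    with open(path, newline="") as f:
        rows.extend(csv.DictReader(f))

# print fastest-first, like the tables below
for r in sorted(rows, key=lambda r: float(r["mean"])):
    print(f"{r['command']:<20} {float(r['mean']):.3f}")
```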
### simple mean of a 3D image (`wf-mp2rage-7t_2017087.nii.gz`)
| command | mean (seconds) |
| --- | ---: |
| fslstats | 0.353 |
| niimean.rs | 0.461 |
| 3dBrickStat | 0.519 |
| MeasureMinMaxMean | 0.522 |
| niimean.js | 0.932 |
| niimean.pl | 1.045 |
| niimean.jl | 1.631 |
| niimean.py | 1.719 |
| mris_calc | 2.096 |
| niimean.m | 2.104 |
| niimean.R | 2.621 |
| niimean.go | 2.829 |
### `voxcor` (vs manual loop)

| command | mean (seconds) |
| --- | ---: |
| voxcor.rs | 0.100 |
| voxcor.go | 0.260 |
| voxcor.py | 1.385 |
| voxcor.m | 1.690 |
| voxcor.R | 2.058 |
| voxcor.jl | 2.781 |
- For a simple mean calc, JavaScript is fast and Go is slow!
- For Julia and R especially, the interpreter/VM's startup time dominates. The results also show how much effort the community/library authors have put into optimizing hot paths (likely with compiled C code and SIMD optimizations): spm12 in Octave, numpy in Python, PDL in Perl.
- The Rust implementation should be built with `--release`; the debug build's performance is 10x worse!
- Julia's interpreter (1.9.3) startup time is reasonable! Its overall time is on par with Python (numpy, not native Python).
- SIMD "vectorized" operations in Python (via numpy) are fast! (demonstrated in the sketch after this list)
- R is slow to start.
- JavaScript is painful to write. Both it and the Go version organize the NIfTI matrix data as a 1D vector.
- The processor makes a difference in the shootout.
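The numpy point is easy to see directly. A sketch comparing an interpreted loop to the vectorized call, with a synthetic array standing in for the image:

```python
import timeit

import numpy as np

img = np.random.rand(96, 96, 60)  # synthetic stand-in for a 3D volume


def loop_mean(a):
    """Pure-python accumulation: every element crosses the interpreter."""
    total = 0.0
    for v in a.ravel():
        total += v
    return total / a.size


vec = timeit.timeit(lambda: img.mean(), number=10)     # SIMD/compiled path
loop = timeit.timeit(lambda: loop_mean(img), number=10)
print(f"vectorized {vec:.3f}s vs loop {loop:.3f}s")    # loop is ~100x slower
```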
## within-env

Cold-startup measures are useful for utilities but are not representative of interactive, long-running work within an interpreter, where the one-time startup cost is irrelevant. `within-env/` benchmarks the same tasks with the environment already loaded.
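The idea in python, as a sketch of the approach rather than a quote of `within-env/`'s contents: pay import/startup once, then time only the task.

```python
import timeit

import nibabel as nib  # import cost paid once, outside the timed region

IMG = "wf-mp2rage-7t_2017087.nii.gz"


def niimean():
    return nib.load(IMG).get_fdata().mean()


# hyperfine's cold-start number includes interpreter launch + imports;
# this measures only the task inside the already-warm environment
per_call = timeit.timeit(niimean, number=20) / 20
print(f"{per_call:.4f} s/call (startup excluded)")
```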
## Run

Run `make` in a shell on a terminal. Optionally set `NRUN` (default: 100 runs for each):

```
make NRUN=10
```

Benchmarking uses `hyperfine`.

See `./setup.bash` for some library setup automation/hints (also `make depends`).
## Setup

- c
- c++
  - `mris_calc` from FreeSurfer
  - `MeasureMinMaxMean` from ANTs
- rust
  - `rustup update` ([rustup](https://rustup.rs/))
- octave
  - download spm12 and extract to `~/Downloads/spm12` (`scripts/niimean.m` hardcodes the addpath)
  - compile: `cd src/ && make PLATFORM=octave install`
- julia
  - `NIfTI` and `Statistics`; install packages in the repl like `] add NIfTI`
- R
  - `install.packages('oro.nifti')`
- deno
  - `cargo install deno`
  - first run will pull in the npm package `nifti-reader-js`
- perl
  - `PDL` and `patch/Niftigz.pm`
go (1.19 vs 1.21) and octave (7 vs 8.3) are out of date on Debian stable. To use newer versions on rhea, update the path to include the compiled-from-recent-source bin dir:

```bash
export PATH="/opt/ni_tools/utils/go/bin:$PATH"
# source downloaded and extracted to /opt/ni_tools/octave-8.3.0-src
# ./configure --prefix=/opt/ni_tools/octave-8.3/ && make install
export PATH="/opt/ni_tools/octave-8.3/bin:$PATH"
```

NB: better to use a Debian backport: https://wiki.debian.org/SimpleBackportCreation
## TODO

- get expert Go and Rust advice/implementations (should be faster?)
- implement various styles (and more complex calculations)
  - loop vs vector; expect a python loop to be especially slow
  - parallel processing
- containerize benchmarks
- other implementations
  - does GraalVM native image improve the Java benchmarks (faster startup)?
  - clojure/babashka
  - julia's APL implementation
  - common lisp or guile/scheme version (FFI w/ niftilib)
  - compile julia