v0.4.0
Note: OpenCL support is not possible on static Linux builds.
Release Highlights
- Overhaul GF16 processing backend
  - Remove GF-Complete components, rewrite framework and improve general region handling
  - Fully separate ISA compilation units to support dynamic dispatch (also enables static Linux builds)
  - Add dot-product, region interleaving, chunk packing and prefetching optimisations
  - Add new calculation kernels: CLMul for NEON, Affine AVX variant for x86 (for Alder Lake and later CPUs) and experimental Shuffle2x/Affine2x variants
  - Add ARM SVE and SVE2 support
  - More optimisations during initialisation for various kernels, coefficient computation, and tweaked loop-tiling parameters
  - Improve transposition performance for Xor-Jit kernel, plus add single-use JIT optimisations
- Rework multi-threading and remove OpenMP dependency; threading now manually managed via libuv
- Add experimental OpenCL backend for GPGPU acceleration
  - Disabled by default; must be manually enabled
  - It has been observed to generate incorrect output, particularly on non-Windows hosts, so use with caution!
- Add internal checksumming support to help detect memory errors during GF16 computation
- Improve concurrency when transferring to/from GF backend and hashing
- Redo MD5/CRC32 implementation for better optimisation
  - Input hashing now uses a stitched 2xMD5+CRC32 implementation
  - Add ASM MD5 implementation for x64/ARMv6/AArch64 (unsupported in MSVC)
  - Add ARM NEON and SVE2 MD5 implementations
  - Full SIMD width multi-buffer implementations
- Remove node-yencode dependency
- Add support for concurrently processing multiple files to work around bottlenecks with single threaded input hashing
- Support concurrent I/O requests with chunked reading
- Support for compiling under MSVC/Clang-CL for Windows ARM/64 targets
- Separate GUI frontend available
- Improve progress display accuracy
- Various bug fixes
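
For readers unfamiliar with what the GF16 backend computes, here is a minimal scalar sketch of multiplication and multiply-accumulate in GF(2^16), using the PAR2 field polynomial 0x1100B. This is not the project's code: the function names (`gf16_mul`, `gf16_muladd`, `gf16_dot`) are hypothetical, and the real kernels (Shuffle, Affine, CLMul, Xor-Jit) are SIMD or JIT implementations of the same arithmetic. The `gf16_dot` function only illustrates one plausible reading of the dot-product idea, namely accumulating across all inputs before writing each output word once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// PAR2 field: GF(2^16) reduced by x^16 + x^12 + x^3 + x + 1 (0x1100B).
constexpr uint32_t GF16_POLY = 0x1100B;

// Scalar shift-and-add multiply in GF(2^16) (hypothetical reference helper).
uint16_t gf16_mul(uint16_t a, uint16_t b) {
    uint32_t acc = 0;
    uint32_t x = a;
    while (b) {
        if (b & 1) acc ^= x;              // addition in GF(2^n) is XOR
        x <<= 1;                          // multiply the running term by x
        if (x & 0x10000) x ^= GF16_POLY;  // reduce modulo the field polynomial
        b >>= 1;
    }
    return static_cast<uint16_t>(acc);
}

// Multiply-accumulate over a region: dst[i] ^= coeff * src[i].
void gf16_muladd(std::vector<uint16_t>& dst,
                 const std::vector<uint16_t>& src, uint16_t coeff) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] ^= gf16_mul(src[i], coeff);
}

// Dot-product form: dst[i] = XOR over k of coeffs[k] * srcs[k][i].
// Each output word is written once instead of once per input region.
void gf16_dot(std::vector<uint16_t>& dst,
              const std::vector<std::vector<uint16_t>>& srcs,
              const std::vector<uint16_t>& coeffs) {
    assert(srcs.size() == coeffs.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        uint16_t sum = 0;
        for (std::size_t k = 0; k < srcs.size(); ++k) {
            assert(srcs[k].size() == dst.size());
            sum ^= gf16_mul(srcs[k][i], coeffs[k]);
        }
        dst[i] = sum;
    }
}
```

Both region functions produce the same result for the same inputs; the difference is only in memory access pattern, which is where the optimised kernels gain their speed.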