Release v0.4.0 · animetosho/ParPar

Note: OpenCL support is not possible on static Linux builds.

Release Highlights

Overhaul GF16 processing backend
- Remove GF-Complete components, rewrite framework and improve general region handling
- Fully separate ISA compilation units to support dynamic dispatch (also enables static Linux builds)
- Add dot-product, region interleaving, chunk packing and prefetching optimisations
- Add new calculation kernels: CLMul for NEON, Affine AVX variant for x86 (for Alder Lake and later CPUs) and experimental Shuffle2x/Affine2x variants
- Add ARM SVE and SVE2 support
- Optimise initialisation for various kernels and for coefficient computation; tweak loop-tiling parameters
- Improve transposition performance for the Xor-Jit kernel and add single-use JIT optimisations
- Rework multi-threading and remove the OpenMP dependency; threading is now managed manually via libuv
- Add experimental OpenCL backend for GPGPU acceleration
  - Disabled by default; must be enabled manually
  - Has been observed to generate incorrect output, particularly on non-Windows hosts, so use with caution!
Add internal checksumming support to help detect memory errors during GF16 computation
Improve concurrency when transferring to/from GF backend and hashing
Rewrite MD5/CRC32 implementations for better performance
- Input hashing now uses a stitched 2xMD5+CRC32 implementation
- Add ASM MD5 implementation for x64/ARMv6/AArch64 (unsupported in MSVC)
- Add ARM NEON and SVE2 MD5 implementations
- Add full SIMD-width multi-buffer implementations
- Remove node-yencode dependency
Add support for concurrently processing multiple files to work around bottlenecks with single-threaded input hashing
Support concurrent I/O requests with chunked reading
Support for compiling under MSVC/Clang-CL for Windows ARM/64 targets
Separate GUI frontend available
Improve progress display accuracy
Various bug fixes
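For context on what the overhauled GF16 backend computes: PAR2 recovery data is built from multiply-accumulate operations over GF(2^16), using the generator polynomial 0x1100B defined by the PAR2 specification. The following is a minimal scalar sketch of that arithmetic, including a dot-product over several input regions. The names `gf16_mul` and `region_dot` are hypothetical; this is not ParPar's API and does not reflect how any of its kernels are implemented.

```python
# PAR2's GF(2^16) generator polynomial: x^16 + x^12 + x^3 + x + 1
PAR2_POLY = 0x1100B


def gf16_mul(a: int, b: int) -> int:
    """Multiply two 16-bit field elements (shift-and-add, reduce mod PAR2_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^16) is XOR
        b >>= 1
        a <<= 1
        if a & 0x10000:          # degree reached 16: reduce by the polynomial
            a ^= PAR2_POLY
    return result


def region_dot(srcs: list[list[int]], coeffs: list[int]) -> list[int]:
    """Hypothetical helper: out[i] = XOR over k of coeffs[k] * srcs[k][i]."""
    if len(srcs) != len(coeffs):
        raise ValueError("need one coefficient per source region")
    length = len(srcs[0]) if srcs else 0
    out = [0] * length
    for src, coeff in zip(srcs, coeffs):
        for i in range(length):
            out[i] ^= gf16_mul(src[i], coeff)   # multiply-accumulate per 16-bit word
    return out
```

This word-at-a-time loop is far slower than the SIMD kernels named in these notes (Shuffle, CLMul, Affine, Xor-Jit), which compute the same products over whole vectors of words at once.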
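The "stitched 2xMD5+CRC32" input hashing means each byte of input is read once while three checksums are updated together. A minimal sketch of what such a pass produces, assuming the two MD5s are the whole-file hash and the per-slice hash that PAR2 stores alongside each slice's CRC32. The function name `hash_input` is hypothetical, and this sketch simply calls `hashlib`/`zlib` in sequence rather than interleaving the computations in SIMD or assembly as ParPar does.

```python
import hashlib
import zlib


def hash_input(data: bytes, slice_size: int):
    """Single pass over the input: whole-file MD5 plus per-slice MD5 and CRC32."""
    file_md5 = hashlib.md5()
    slices = []
    for offset in range(0, len(data), slice_size):
        chunk = data[offset:offset + slice_size]
        file_md5.update(chunk)                        # MD5 #1: whole file, unpadded
        padded = chunk.ljust(slice_size, b"\0")       # PAR2 zero-pads a short final slice
        slices.append((hashlib.md5(padded).digest(),  # MD5 #2: this slice
                       zlib.crc32(padded)))           # CRC32: this slice
    return file_md5.digest(), slices
```

Because all three hashes consume the same bytes, computing them in one pass avoids re-reading the input; the stitched implementation goes further by interleaving the instruction streams so the CPU can overlap their work.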
