v0.4.0

animetosho released this 24 May 11:39
· 169 commits to master since this release

Note: OpenCL support is not possible on static Linux builds.

Release Highlights

  • Overhaul GF16 processing backend
    • Remove GF-Complete components, rewrite framework and improve general region handling
    • Fully separate ISA compilation units to support dynamic dispatch (also enables static Linux builds)
    • Add dot-product, region interleaving, chunk packing and prefetching optimisations
    • Add new calculation kernels: CLMul for NEON, Affine AVX variant for x86 (for Alder Lake and later CPUs) and experimental Shuffle2x/Affine2x variants
    • Add ARM SVE and SVE2 support
    • More optimisations during initialisation for various kernels, coefficient computation, and tweaked loop-tiling parameters
    • Improve transposition performance for Xor-Jit kernel, plus add single-use JIT optimisations
    • Rework multi-threading and remove OpenMP dependency; threading is now managed manually via libuv
    • Add experimental OpenCL backend for GPGPU acceleration
      • Disabled by default - must be manually enabled
      • It has been observed to generate incorrect output, particularly on non-Windows hosts; use with caution!
  • Add internal checksumming support to help detect memory errors during GF16 computation
  • Improve concurrency when transferring to/from GF backend and hashing
  • Redo MD5/CRC32 implementation for better optimisation
    • Input hashing now uses a stitched 2xMD5+CRC32 implementation
    • Add ASM MD5 implementation for x64/ARMv6/AArch64 (unsupported in MSVC)
    • Add ARM NEON and SVE2 MD5 implementations
    • Full SIMD width multi-buffer implementations
    • Remove node-yencode dependency
  • Add support for concurrently processing multiple files to work around bottlenecks with single-threaded input hashing
  • Support concurrent I/O requests with chunked reading
  • Support for compiling under MSVC/Clang-CL for Windows ARM/64 targets
  • Separate GUI frontend available
  • Improve progress display accuracy
  • Various bug fixes
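
For readers unfamiliar with what the GF16 backend computes, the following is a minimal reference sketch of GF(2^16) multiplication and a region multiply-accumulate, the basic operation that the optimised kernels listed above accelerate. It is not the project's code: the function names `gf16_mul` and `gf16_muladd_region` are hypothetical, and the plain shift-and-add method is assumed here only for clarity. The field polynomial 0x1100B is the one defined by the PAR2 specification.

```python
# Illustrative only: a slow, bit-by-bit reference for GF(2^16) arithmetic.
# Names are hypothetical and do not correspond to the project's internals.

GF16_POLY = 0x1100B  # x^16 + x^12 + x^3 + x + 1 (PAR2 field polynomial)


def gf16_mul(a: int, b: int) -> int:
    """Multiply two 16-bit field elements using shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1
        if a & 0x10000:          # degree reached 16: reduce by the polynomial
            a ^= GF16_POLY
    return result


def gf16_muladd_region(dst: list[int], src: list[int], coeff: int) -> None:
    """In place: dst[i] ^= src[i] * coeff for each 16-bit word in the region."""
    for i, word in enumerate(src):
        dst[i] ^= gf16_mul(word, coeff)


if __name__ == "__main__":
    recovery = [0, 0]
    gf16_muladd_region(recovery, [1, 2], 3)
    print(recovery)
```

Each recovery block is an accumulation of such region products over all input blocks, which is why kernel choice and memory access patterns dominate throughput.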