Update finalization
ogxd committed Oct 5, 2023
1 parent 8d043d7 commit fbdba7b
Showing 4 changed files with 56 additions and 72 deletions.
64 changes: 25 additions & 39 deletions article/article.tex
@@ -373,39 +373,13 @@ \subsection{Compression}

\subsection{Finalization}

The finalization process in the GxHash-0 algorithm is crucial to ensure the transformation of its internal state into a fixed-size, uniformly distributed hash output. This process is delineated into two primary steps: mixing the bits and folding (reducing) to the desired hash size.
The finalization process in the GxHash-0 algorithm is crucial to ensure the transformation of its internal state into a fixed-size, uniformly distributed hash output. This process is delineated into two primary steps: mixing the bits and reducing to the desired hash size.

\subsubsection{Mixing}
This mixing step is responsible for ensuring the even distribution of bits in the state, thereby reducing patterns or biases that might arise from the input data or the compression process. Given the inherent simplicity of the GxHash-0 compression, it is worthwhile for the finalization to incorporate slightly more intricate bit-mixing operations, especially since it runs only once per hashed message, as opposed to the compression, which runs once for each block.\\
Leveraging SIMD capabilities can help with regard to performance and efficiency, which remain a primary consideration for us. Fortunately, both x86 and ARM architectures provide AES (Advanced Encryption Standard) intrinsics that serve as efficient tools for bit mixing. Using three AES block cipher rounds ensures a robust diffusion of bits across the state at a cheap computational cost.\\
The AES keys can be changed, providing a way to obtain unique hashes per application and even per process, protecting against potential precomputation or replay attacks.
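
For illustration, a minimal sketch of this three-round construction, using the 128-bit x86 intrinsics and the round keys from the accompanying implementation (the function name \texttt{mix3} is ours, not part of the reference code):

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use core::arch::x86_64::*;

#[allow(overflowing_literals)]
pub unsafe fn mix3(hash: __m128i) -> __m128i {
    // Hardcoded AES round keys; replaceable per application
    let salt1 = _mm_set_epi32(0x713B01D0, 0x8F2F35DB, 0xAF163956, 0x85459F85);
    let salt2 = _mm_set_epi32(0x1DE09647, 0x92CFA39C, 0x3DD99ACA, 0xB89C054F);
    let salt3 = _mm_set_epi32(0xC78B122B, 0x5544B1B7, 0x689D2B7D, 0xD0012E32);

    // Three AES rounds diffuse every input bit across the state
    let hash = _mm_aesenc_si128(hash, salt1);
    let hash = _mm_aesenc_si128(hash, salt2);
    _mm_aesenclast_si128(hash, salt3)
}
\end{lstlisting}
\caption{Three-round AES mixing sketch in Rust}
\label{fig:mixing3_rust_example}
\end{figure}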

This step is responsible for ensuring the even distribution of bits in the state, thereby reducing patterns or biases that might arise from the input data or the compression process. Given the inherent simplicity of the GxHash-0 compression, it is worthwhile for the finalization to incorporate slightly more intricate bit-mixing operations, especially since it runs only once per hashed message, as opposed to the compression, which runs once for each block.\\
Leveraging SIMD capabilities can help with regard to performance and efficiency, which remain a primary consideration for us. Fortunately, both x86 and ARM architectures provide AES (Advanced Encryption Standard) intrinsics that serve as efficient tools for bit mixing. Using AES block cipher intrinsics ensures a robust diffusion of bits across the state at a cheap computational cost.\\
On top of that, a salt is added while forming the AES block cipher keys. This not only improves the distribution but also provides a way to use random salt values per process, protecting against potential precomputation or replay attacks.

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use core::arch::x86_64::*;

pub unsafe fn mix(hash: state) -> state {
    // Salt (Knuth primes recommended)
    let salt = _mm256_set_epi64x(
        -4860325414534694371,
        8120763769363581797,
        -4860325414534694371,
        8120763769363581797);

    // Derive keys from the salt and state, then run one AES round
    let keys = _mm256_mul_epu32(salt, hash);
    _mm256_aesenc_epi128(hash, keys)
}
\end{lstlisting}
\caption{GxHash-0 Mixing in Rust}
\label{fig:mixing_rust_example}
\end{figure}

\subsubsection{Folding}

Once the state's bits have been thoroughly mixed, the next step is to condense or ``fold'' this state into a smaller, fixed-size hash output, typically 32 or 64 bits. A straightforward and effective approach taken by GxHash-0 to achieve this reduction is summing the constituent \( X \)-bit integer parts of the mixed state. This summation serves as a reduction function, ensuring that the output hash remains within the desired size bounds while still retaining the essence of the mixed state.

While it would have been interesting to leverage SIMD once again, it turned out that, in practice, for both 128-bit and 256-bit states and on both x86 and ARM, summing the integer parts one by one is as performant as, if not more performant than, the various SIMD workarounds I tested.
Once the state's bits have been thoroughly mixed, the next step is to reduce this state into a smaller, fixed-size hash output, typically 32 or 64 bits. There are several approaches to this, one being combining the \( X \)-bit integer parts of the mixed state together (by XORing them, for instance). GxHash-0 takes a simpler path by reinterpreting the state as a smaller \( X \)-bit value, assuming a uniform distribution after the mixing stage thanks to the three rounds of AES. This allows the GxHash-0 algorithm to generate hashes of any size up to \( s_b \) bits at virtually no additional computational cost.
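
A minimal sketch of this reinterpretation for a 128-bit mixed state (the function name \texttt{reduce} is ours, for illustration only):

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use core::arch::x86_64::*;

// Read the first 32-bit lane of the mixed state. Wider outputs
// (u64, u128) can be read the same way, up to the full state width.
pub unsafe fn reduce(mixed: __m128i) -> u32 {
    let p = &mixed as *const __m128i as *const u32;
    *p
}
\end{lstlisting}
\caption{GxHash-0 reduction sketch in Rust}
\label{fig:reduce_rust_example}
\end{figure}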

\subsection{Implementation Details}

@@ -492,15 +466,27 @@ \subsubsection{Benchmark Quality Criteria}
\item \textbf{Performance:} The performance of a non-cryptographic hash function is usually reflected in the performance of the application using it. For instance, a fast non-cryptographic hash function generally implies a fast hash table. This specific criterion will be tackled in the next section, which is dedicated to it.
\end{itemize}

While we can compute quality metrics, the results will vary greatly depending on the actual inputs used for our hash function. Pragmatically, we choose a few patterns for our input data:
\begin{itemize}
\item Randomly generated inputs to observe how the hash function behaves with truly unpredictable data
\item Sequential inputs to observe how the function handles closely related values. Typically, close values would highlight weaknesses in distribution.
\item English words inputs to observe how the function behaves in a "real world scenario"
\end{itemize}

\subsubsection{Quality Criteria Results}
For reference, we include a few well-known algorithms on top of GxHash-0.

While we can compute quality metrics, the results will vary greatly depending on the actual inputs used for our hash function. Let's see how the GxHash-0 algorithm fares against a few well-known non-cryptographic algorithms in a few scenarios.

\paragraph{Random Blobs}

Randomly generated inputs show how the hash function behaves with truly unpredictable data.

\paragraph{Sequential Number}

Sequential inputs show how the function handles closely related values. Typically, such close values would highlight weaknesses in distribution.

\paragraph{English Words}
Inputs drawn from English words show how the function behaves in a ``real world'' scenario.
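
As a rough sketch of how such inputs could be produced (the generator functions and the use of the \texttt{rand} crate are illustrative assumptions, not part of the benchmark harness):

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use rand::RngCore;

// Illustrative input generators for the three scenarios
fn random_blobs(count: usize, len: usize) -> Vec<Vec<u8>> {
    let mut rng = rand::thread_rng();
    (0..count).map(|_| {
        let mut buf = vec![0u8; len];
        rng.fill_bytes(&mut buf);
        buf
    }).collect()
}

fn sequential_numbers(count: u64) -> Vec<Vec<u8>> {
    (0..count).map(|i| i.to_le_bytes().to_vec()).collect()
}

fn english_words() -> Vec<Vec<u8>> {
    // Stand-in for a real dictionary file
    ["the", "quick", "brown", "fox"]
        .iter().map(|w| w.as_bytes().to_vec()).collect()
}
\end{lstlisting}
\caption{Illustrative benchmark input generators}
\label{fig:input_generators_example}
\end{figure}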

\begin{figure}[h]
\centering
\includegraphics[width=1\textwidth]{quality-sequential.png}
\caption{Quality criteria results for sequential inputs}
\label{fig:quality-sequential}
\end{figure}

\subsection{Performance}

Binary file added article/quality-random.png
Binary file added article/quality-sequential.png
64 changes: 31 additions & 33 deletions src/gxhash.rs
@@ -47,23 +47,16 @@ mod platform_defs {
    }

    #[inline]
    pub unsafe fn mix(hash: state) -> state {
    pub unsafe fn finalize(hash: state) -> u32 {
        let salt = vcombine_s64(vcreate_s64(4860325414534694371), vcreate_s64(8120763769363581797));
        let keys = vmulq_s32(
            ReinterpretUnion { int64: salt }.int32,
            ReinterpretUnion { int8: hash }.int32);
        let a = vaeseq_u8(ReinterpretUnion { int8: hash }.uint8, vdupq_n_u8(0));
        let b = vaesmcq_u8(a);
        let c = veorq_u8(b, ReinterpretUnion{ int32: keys }.uint8);
        ReinterpretUnion{ uint8: c }.int8
    }

    #[inline]
    pub unsafe fn fold(hash: state) -> u32 {
        // Bit-cast the int8x16_t to uint32x4_t
        let vec_u32: uint32x4_t = ReinterpretUnion { int8: hash }.uint32;
        // Get the first u32 value from the vector
        vgetq_lane_u32(vec_u32, 3)
        let p = &ReinterpretUnion{ uint8: c }.int8 as *const state as *const u32;
        *p
    }
}

@@ -93,8 +86,10 @@ mod platform_defs {
    #[inline]
    pub unsafe fn get_partial(p: *const state, len: isize) -> state {
        const MASK: [u8; size_of::<state>() * 2] = [
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ];
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ];

        // Safety check
        if check_same_page(p) { // false {//
@@ -131,22 +126,26 @@ mod platform_defs {
    }

    #[inline]
    pub unsafe fn mix(hash: state) -> state {
        let salt = _mm256_set_epi64x(-4860325414534694371, 8120763769363581797, -4860325414534694371, 8120763769363581797);
        //let keys = _mm256_mul_epu32(salt, hash);
        _mm256_aesenc_epi128(hash, salt)
    }

    #[inline]
    pub unsafe fn fold(hash: state) -> u32 {
        let p = &hash as *const state as *const u32;
        *p ^ *p.offset(1)
           ^ *p.offset(2)
           ^ *p.offset(3)
           ^ *p.offset(4)
           ^ *p.offset(5)
           ^ *p.offset(6)
           ^ *p.offset(7)
    #[allow(overflowing_literals)]
    pub unsafe fn finalize(hash: state) -> u32 {
        // Xor 256 state into 128 bit state for AES
        let lower = _mm256_castsi256_si128(hash);
        let upper = _mm256_extracti128_si256(hash, 1);
        let mut hash = _mm_xor_si128(lower, upper);

        // Hardcoded AES keys
        let salt1 = _mm_set_epi32(0x713B01D0, 0x8F2F35DB, 0xAF163956, 0x85459F85);
        let salt2 = _mm_set_epi32(0x1DE09647, 0x92CFA39C, 0x3DD99ACA, 0xB89C054F);
        let salt3 = _mm_set_epi32(0xC78B122B, 0x5544B1B7, 0x689D2B7D, 0xD0012E32);

        // 3 rounds of AES
        hash = _mm_aesenc_si128(hash, salt1);
        hash = _mm_aesenc_si128(hash, salt2);
        hash = _mm_aesenclast_si128(hash, salt3);

        // Truncate to output hash size
        let p = &hash as *const __m128i as *const u32;
        *p
    }
}

@@ -157,8 +156,7 @@ pub use platform_defs::*;
#[cfg(test)]
pub static mut COUNTERS : Vec<usize> = vec![];

#[inline]
//#[inline(never)]
#[inline] // To be disabled when profiling
pub fn gxhash(input: &[u8]) -> u32 {
    unsafe {
        const VECTOR_SIZE: isize = std::mem::size_of::<state>() as isize;
@@ -241,7 +239,7 @@ pub fn gxhash(input: &[u8]) -> u32 {
            hash_vector = compress(hash_vector, partial_vector);
        }

        fold(mix(hash_vector))
        finalize(hash_vector)
}
}

@@ -271,10 +269,10 @@ mod tests {
    }

    #[test]
    fn hash_of_zero_is_zero() {
    fn hash_of_zero_is_not_zero() {
        let zero_bytes = [0u8; 1200];

        let hash = gxhash(&zero_bytes);
        assert_eq!(0, hash);
        assert_ne!(0, hash);
    }
}
