Update finalization
ogxd committed Oct 5, 2023
1 parent 8d043d7 commit fbdba7b
Showing 4 changed files with 56 additions and 72 deletions.
64 changes: 25 additions & 39 deletions article/article.tex
@@ -373,39 +373,13 @@ \subsection{Compression}

\subsection{Finalization}

The finalization process in the GxHash-0 algorithm is crucial to ensure the transformation of its internal state into a fixed-size, uniformly distributed hash output. This process is delineated into two primary steps: mixing the bits and folding (reducing) to the desired hash size.
The finalization process in the GxHash-0 algorithm is crucial to ensure the transformation of its internal state into a fixed-size, uniformly distributed hash output. This process is delineated into two primary steps: mixing the bits and reducing to the desired hash size.

\subsubsection{Mixing}
This mixing step is responsible for ensuring the even distribution of bits in the state, thereby reducing patterns or biases that might arise from the input data or the compression process. Given the inherent simplicity of the GxHash-0 compression, it is worthwhile for the finalization to incorporate slightly more intricate bit-mixing operations, especially since it runs only once per hashed message, as opposed to the compression, which runs once for each block.\\
Leveraging SIMD capabilities can help with regard to performance and efficiency, which remain a primary consideration for us. Fortunately, both x86 and ARM architectures provide AES (Advanced Encryption Standard) intrinsics that serve as efficient tools for bit mixing. Using three AES block cipher rounds ensures a robust diffusion of bits across the state at a cheap computational cost.\\
The AES keys can be changed, providing a way to obtain unique hashes per application and even per process, protecting against potential precomputation or replay attacks.
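
For illustration, a minimal sketch of this three-round construction, using the 128-bit x86 intrinsics and the round keys from the accompanying implementation (the function name \texttt{mix3} is ours, not part of the reference code):

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use core::arch::x86_64::*;

#[allow(overflowing_literals)]
pub unsafe fn mix3(hash: __m128i) -> __m128i {
    // Hardcoded AES round keys; replaceable per application
    let salt1 = _mm_set_epi32(0x713B01D0, 0x8F2F35DB, 0xAF163956, 0x85459F85);
    let salt2 = _mm_set_epi32(0x1DE09647, 0x92CFA39C, 0x3DD99ACA, 0xB89C054F);
    let salt3 = _mm_set_epi32(0xC78B122B, 0x5544B1B7, 0x689D2B7D, 0xD0012E32);

    // Three AES rounds diffuse every input bit across the state
    let hash = _mm_aesenc_si128(hash, salt1);
    let hash = _mm_aesenc_si128(hash, salt2);
    _mm_aesenclast_si128(hash, salt3)
}
\end{lstlisting}
\caption{Three-round AES mixing sketch in Rust}
\label{fig:mixing3_rust_example}
\end{figure}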

This step is responsible for ensuring the even distribution of bits in the state, thereby reducing patterns or biases that might arise from the input data or the compression process. Given the inherent simplicity of the GxHash-0 compression, it is worthwhile for the finalization to incorporate slightly more intricate bit-mixing operations, especially since it runs only once per hashed message, as opposed to the compression, which runs once for each block.\\
Leveraging SIMD capabilities can help with regard to performance and efficiency, which remain a primary consideration for us. Fortunately, both x86 and ARM architectures provide AES (Advanced Encryption Standard) intrinsics that serve as efficient tools for bit mixing. Using AES block cipher intrinsics ensures a robust diffusion of bits across the state at a cheap computational cost.\\
On top of that, a salt is added while forming the AES block cipher keys. This not only improves the distribution but also provides a way to use random salt values per process, protecting against potential precomputation or replay attacks.

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use core::arch::x86_64::*;

pub unsafe fn mix(hash: state) -> state {
    // Salt (Knuth primes recommended)
    let salt = _mm256_set_epi64x(
        -4860325414534694371,
        8120763769363581797,
        -4860325414534694371,
        8120763769363581797);

    // Derive keys from the salt and state, then run one AES round
    let keys = _mm256_mul_epu32(salt, hash);
    _mm256_aesenc_epi128(hash, keys)
}
\end{lstlisting}
\caption{GxHash-0 Mixing in Rust}
\label{fig:mixing_rust_example}
\end{figure}

\subsubsection{Folding}

Once the state's bits have been thoroughly mixed, the next step is to condense or ``fold'' this state into a smaller, fixed-size hash output, typically 32 or 64 bits. A straightforward and effective approach taken by GxHash-0 to achieve this reduction is summing the constituent \( X \)-bit integer parts of the mixed state. This summation serves as a reduction function, ensuring that the output hash remains within the desired size bounds while still retaining the essence of the mixed state.

While it would have been interesting to leverage SIMD once again, it turned out that, in practice, for both 128-bit and 256-bit states and on both x86 and ARM, summing the integer parts one by one is as performant as, if not more performant than, the various SIMD workarounds I tested.
Once the state's bits have been thoroughly mixed, the next step is to reduce this state into a smaller, fixed-size hash output, typically 32 or 64 bits. There are several approaches to this, one being combining the \( X \)-bit integer parts of the mixed state together (by XORing them, for instance). GxHash-0 takes a simpler path by reinterpreting the state as a smaller \( X \)-bit value, assuming a uniform distribution after the mixing stage thanks to the three rounds of AES. This allows the GxHash-0 algorithm to generate hashes of any size up to \( s_b \) bits at virtually no additional computational cost.
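
A minimal sketch of this reinterpretation for a 128-bit mixed state (the function name \texttt{reduce} is ours, for illustration only):

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use core::arch::x86_64::*;

// Read the first 32-bit lane of the mixed state. Wider outputs
// (u64, u128) can be read the same way, up to the full state width.
pub unsafe fn reduce(mixed: __m128i) -> u32 {
    let p = &mixed as *const __m128i as *const u32;
    *p
}
\end{lstlisting}
\caption{GxHash-0 reduction sketch in Rust}
\label{fig:reduce_rust_example}
\end{figure}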

\subsection{Implementation Details}

@@ -492,15 +466,27 @@ \subsubsection{Benchmark Quality Criteria}
\item \textbf{Performance:} The performance of a non-cryptographic hash function is usually reflected in the performance of the application using it. For instance, a fast non-cryptographic hash function generally implies a fast hash table. This specific criterion will be tackled in the next section, which is dedicated to it.
\end{itemize}

While we can compute quality metrics, the results will vary greatly depending on the actual inputs used for our hash function. Pragmatically, we choose a few patterns for our input data:
\begin{itemize}
\item Randomly generated inputs to observe how the hash function behaves with truly unpredictable data
\item Sequential inputs to observe how the function handles closely related values. Typically, close values would highlight weaknesses in distribution.
\item English words inputs to observe how the function behaves in a "real world scenario"
\end{itemize}

\subsubsection{Quality Criteria Results}
For reference, we include a few well-known algorithms on top of GxHash-0.

While we can compute quality metrics, the results will vary greatly depending on the actual inputs used for our hash function. Let's see how the GxHash-0 algorithm fares against a few well-known non-cryptographic algorithms in a few scenarios.

\paragraph{Random Blobs}

Randomly generated inputs show how the hash function behaves with truly unpredictable data.

\paragraph{Sequential Number}

Sequential inputs show how the function handles closely related values. Typically, such close values would highlight weaknesses in distribution.

\paragraph{English Words}
Inputs drawn from English words show how the function behaves in a ``real world'' scenario.
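
As a rough sketch of how such inputs could be produced (the generator functions and the use of the \texttt{rand} crate are illustrative assumptions, not part of the benchmark harness):

\begin{figure}[ht]
\begin{lstlisting}[language=Rust, style=boxed]
use rand::RngCore;

// Illustrative input generators for the three scenarios
fn random_blobs(count: usize, len: usize) -> Vec<Vec<u8>> {
    let mut rng = rand::thread_rng();
    (0..count).map(|_| {
        let mut buf = vec![0u8; len];
        rng.fill_bytes(&mut buf);
        buf
    }).collect()
}

fn sequential_numbers(count: u64) -> Vec<Vec<u8>> {
    (0..count).map(|i| i.to_le_bytes().to_vec()).collect()
}

fn english_words() -> Vec<Vec<u8>> {
    // Stand-in for a real dictionary file
    ["the", "quick", "brown", "fox"]
        .iter().map(|w| w.as_bytes().to_vec()).collect()
}
\end{lstlisting}
\caption{Illustrative benchmark input generators}
\label{fig:input_generators_example}
\end{figure}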

\begin{figure}[h]
\centering
\includegraphics[width=1\textwidth]{quality-sequential.png}
\caption{Quality criteria results for sequential inputs}
\label{fig:quality-sequential}
\end{figure}

\subsection{Performance}

Binary file added article/quality-random.png
Binary file added article/quality-sequential.png
64 changes: 31 additions & 33 deletions src/gxhash.rs
@@ -47,23 +47,16 @@ mod platform_defs {
    }

    #[inline]
    pub unsafe fn mix(hash: state) -> state {
    pub unsafe fn finalize(hash: state) -> u32 {
        let salt = vcombine_s64(vcreate_s64(4860325414534694371), vcreate_s64(8120763769363581797));
        let keys = vmulq_s32(
            ReinterpretUnion { int64: salt }.int32,
            ReinterpretUnion { int8: hash }.int32);
        let a = vaeseq_u8(ReinterpretUnion { int8: hash }.uint8, vdupq_n_u8(0));
        let b = vaesmcq_u8(a);
        let c = veorq_u8(b, ReinterpretUnion{ int32: keys }.uint8);
        ReinterpretUnion{ uint8: c }.int8
    }

    #[inline]
    pub unsafe fn fold(hash: state) -> u32 {
        // Bit-cast the int8x16_t to uint32x4_t
        let vec_u32: uint32x4_t = ReinterpretUnion { int8: hash }.uint32;
        // Get the first u32 value from the vector
        vgetq_lane_u32(vec_u32, 3)
        let p = &ReinterpretUnion{ uint8: c }.int8 as *const state as *const u32;
        *p
    }
}

@@ -93,8 +86,10 @@ mod platform_defs {
    #[inline]
    pub unsafe fn get_partial(p: *const state, len: isize) -> state {
        const MASK: [u8; size_of::<state>() * 2] = [
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ];
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ];

        // Safety check
        if check_same_page(p) { // false {//
@@ -131,22 +126,26 @@ mod platform_defs {
    }

    #[inline]
    pub unsafe fn mix(hash: state) -> state {
        let salt = _mm256_set_epi64x(-4860325414534694371, 8120763769363581797, -4860325414534694371, 8120763769363581797);
        //let keys = _mm256_mul_epu32(salt, hash);
        _mm256_aesenc_epi128(hash, salt)
    }

    #[inline]
    pub unsafe fn fold(hash: state) -> u32 {
        let p = &hash as *const state as *const u32;
        *p ^ *p.offset(1)
           ^ *p.offset(2)
           ^ *p.offset(3)
           ^ *p.offset(4)
           ^ *p.offset(5)
           ^ *p.offset(6)
           ^ *p.offset(7)
    #[allow(overflowing_literals)]
    pub unsafe fn finalize(hash: state) -> u32 {
        // Xor 256 state into 128 bit state for AES
        let lower = _mm256_castsi256_si128(hash);
        let upper = _mm256_extracti128_si256(hash, 1);
        let mut hash = _mm_xor_si128(lower, upper);

        // Hardcoded AES keys
        let salt1 = _mm_set_epi32(0x713B01D0, 0x8F2F35DB, 0xAF163956, 0x85459F85);
        let salt2 = _mm_set_epi32(0x1DE09647, 0x92CFA39C, 0x3DD99ACA, 0xB89C054F);
        let salt3 = _mm_set_epi32(0xC78B122B, 0x5544B1B7, 0x689D2B7D, 0xD0012E32);

        // 3 rounds of AES
        hash = _mm_aesenc_si128(hash, salt1);
        hash = _mm_aesenc_si128(hash, salt2);
        hash = _mm_aesenclast_si128(hash, salt3);

        // Truncate to output hash size
        let p = &hash as *const __m128i as *const u32;
        *p
    }
}

@@ -157,8 +156,7 @@ pub use platform_defs::*;
#[cfg(test)]
pub static mut COUNTERS : Vec<usize> = vec![];

#[inline]
//#[inline(never)]
#[inline] // To be disabled when profiling
pub fn gxhash(input: &[u8]) -> u32 {
    unsafe {
        const VECTOR_SIZE: isize = std::mem::size_of::<state>() as isize;
@@ -241,7 +239,7 @@ pub fn gxhash(input: &[u8]) -> u32 {
            hash_vector = compress(hash_vector, partial_vector);
        }

        fold(mix(hash_vector))
        finalize(hash_vector)
}
}

@@ -271,10 +269,10 @@ mod tests {
    }

    #[test]
    fn hash_of_zero_is_zero() {
    fn hash_of_zero_is_not_zero() {
        let zero_bytes = [0u8; 1200];

        let hash = gxhash(&zero_bytes);
        assert_eq!(0, hash);
        assert_ne!(0, hash);
    }
}
