Number of hash functions (k) should be floored, not ceiled #45

Open: wants to merge 3 commits into base: master
82 changes: 82 additions & 0 deletions benches/bench.rs
@@ -0,0 +1,82 @@
#![feature(test)]

extern crate test;

use bloomfilter::Bloom;

/* Set benchmarks */

fn inner_insert_bench(b: &mut test::Bencher, bitmap_size: usize, items_count: usize) {
    let mut bf: Bloom<usize> = Bloom::new(bitmap_size / 8, items_count);
    let mut index = items_count;
    b.iter(|| {
        index += 1;
        test::black_box(bf.set(&index));
    });
}

#[bench]
#[inline(always)]
fn bench_insert_100(b: &mut test::Bencher) {
    inner_insert_bench(b, 1000, 100);
}

#[bench]
#[inline(always)]
fn bench_insert_1000(b: &mut test::Bencher) {
    inner_insert_bench(b, 10000, 1000);
}

#[bench]
#[inline(always)]
fn bench_insert_m_1(b: &mut test::Bencher) {
    inner_insert_bench(b, 10_000_000, 1_000_000);
}

#[bench]
#[inline(always)]
fn bench_insert_m_10(b: &mut test::Bencher) {
    inner_insert_bench(b, 100_000_000, 10_000_000);
}

/* Get benchmarks */

fn inner_get_bench(b: &mut test::Bencher, bitmap_size: usize, items_count: usize) {
    let mut bf: Bloom<usize> = Bloom::new(bitmap_size / 8, items_count);
    for index in 0..items_count {
        bf.set(&index);
    }
    let mut index = items_count;
    b.iter(|| {
        index += 1;
        test::black_box(bf.check(&index));
    });
}

#[bench]
#[inline(always)]
fn bench_get_100(b: &mut test::Bencher) {
    inner_get_bench(b, 1000, 100);
}

#[bench]
#[inline(always)]
fn bench_get_1000(b: &mut test::Bencher) {
    inner_get_bench(b, 10000, 1000);
}

#[bench]
#[inline(always)]
fn bench_get_m_1(b: &mut test::Bencher) {
    inner_get_bench(b, 10_000_000, 1_000_000);
}

#[bench]
#[inline(always)]
fn bench_get_m_10(b: &mut test::Bencher) {
    inner_get_bench(b, 100_000_000, 10_000_000);
}
3 changes: 2 additions & 1 deletion src/lib.rs
@@ -14,6 +14,7 @@
use std::f64;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;
use std::cmp::max;

use bit_vec::BitVec;
#[cfg(feature = "random")]
@@ -220,7 +221,7 @@
    fn optimal_k_num(bitmap_bits: u64, items_count: usize) -> u32 {
        let m = bitmap_bits as f64;
        let n = items_count as f64;
-       let k_num = (m / n * f64::ln(2.0f64)).ceil() as u32;
+       let k_num = max((m / n * f64::ln(2.0f64)).floor() as u32, 1);
        cmp::max(k_num, 1)
    }
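
For reviewers weighing the two roundings, here is a minimal standalone sketch of how `floor` and `ceil` diverge for the real-valued optimum k = (m / n) * ln 2. The helper names `ideal_k`, `k_floor`, and `k_ceil` are hypothetical and not part of the crate; only the formula and the clamp to 1 come from the diff above.

```rust
/// Real-valued optimum k = (m / n) * ln 2 for m bits and n items.
/// Hypothetical helper, not part of the crate.
fn ideal_k(bitmap_bits: u64, items_count: usize) -> f64 {
    (bitmap_bits as f64 / items_count as f64) * std::f64::consts::LN_2
}

/// Rounds down, then clamps to at least one hash function
/// (the behavior this PR proposes).
fn k_floor(bitmap_bits: u64, items_count: usize) -> u32 {
    (ideal_k(bitmap_bits, items_count).floor() as u32).max(1)
}

/// Rounds up, then clamps to at least one hash function
/// (the behavior before this PR).
fn k_ceil(bitmap_bits: u64, items_count: usize) -> u32 {
    (ideal_k(bitmap_bits, items_count).ceil() as u32).max(1)
}

fn main() {
    // (bits, items): 10 bits per item, then 1 bit per item.
    for &(m, n) in &[(1_000u64, 100usize), (100, 100)] {
        println!(
            "m={} n={} ideal={:.3} floor={} ceil={}",
            m,
            n,
            ideal_k(m, n),
            k_floor(m, n),
            k_ceil(m, n)
        );
    }
}
```

At 10 bits per item the ideal value is about 6.93, so the two roundings give 6 and 7. At 1 bit per item the ideal value drops below 1, and `floor` alone would yield 0, which is why the clamp to 1 matters more after this change than it did with `ceil`.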

@@ -244,7 +245,7 @@
    pub fn clear(&mut self) {
        self.bit_vec.clear()
    }

Check warning on line 248 in src/lib.rs: unnecessary qualification (GitHub Actions jobs: test (beta), Check, test (stable), test (windows), test (macos))
    /// Set all of the bits in the filter, making it appear like every key is in the set
    pub fn fill(&mut self) {
        self.bit_vec.set_all()
21 changes: 21 additions & 0 deletions tests/bloom.rs
@@ -50,3 +50,24 @@ fn bloom_test_load() {
    );
    assert!(cloned.check(&k));
}

/// Test the false positive rate of the bloom filter
/// to ensure that using floor doesn't affect false positive rate
/// in a significant way
#[test]
fn test_false_positive_rate() {
    let capacities = [100, 1000, 10000, 100000, 1000000];
    for capacity in capacities.iter() {
        let mut bf: Bloom<usize> = Bloom::new(*capacity * 10 / 8, *capacity);
        for index in 0..*capacity {
            bf.set(&index);
        }
        let mut false_positives_count = 0.0;
        for index in *capacity..11 * *capacity {
            if bf.check(&index) {
                false_positives_count += 1.0;
            }
        }
        println!(
            "False positive rate for capacity {}: {}",
            *capacity,
            false_positives_count / (10.0 * *capacity as f64)
        );
    }
}
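
The test above prints the measured rate but does not compare it against anything. As a possible baseline, the sketch below computes the standard approximation (1 - e^(-kn/m))^k for the test's configuration, assuming the first argument to `Bloom::new` is a size in bytes (as the `/ 8` in the benchmarks suggests), so each item gets 10 bits. The names `expected_fp_rate` and `rate_is_acceptable` and the `slack` factor are hypothetical; the formula assumes ideal, independent hash functions, so a real filter may deviate from it.

```rust
/// Approximate false positive rate (1 - e^(-k n / m))^k for a filter
/// with m bits, n inserted items, and k hash functions.
/// Hypothetical helper; assumes ideal, independent hash functions.
fn expected_fp_rate(bitmap_bits: u64, items_count: usize, k: u32) -> f64 {
    let m = bitmap_bits as f64;
    let n = items_count as f64;
    let k = k as f64;
    (1.0 - (-k * n / m).exp()).powf(k)
}

/// Returns true when the measured rate is at most `slack` times the
/// expected rate. The slack factor is an arbitrary tolerance chosen by
/// the caller, not a value taken from the PR.
fn rate_is_acceptable(false_positives: u64, probes: u64, expected: f64, slack: f64) -> bool {
    (false_positives as f64 / probes as f64) <= expected * slack
}

fn main() {
    // Same shape as the test above: 10 bits per item.
    let (bits, items) = (10_000u64, 1_000usize);
    for k in [6u32, 7u32] {
        println!("k={} expected rate={:.5}", k, expected_fp_rate(bits, items, k));
    }
}
```

With 10 bits per item, the approximation gives roughly 0.0084 for k = 6 and 0.0082 for k = 7, so under this model the floored value costs very little accuracy while saving one hash evaluation per operation. A check like `rate_is_acceptable` could replace the `println!` if the test is meant to fail on a regression.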