-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(profiling): datadog-alloc based StringTable #404
Changes from 7 commits
35983dc
728167f
4563e52
f6dfe39
12b9764
52a2b57
84355ee
b48fbbc
26e1246
81c3263
a5d8b29
2b792af
0d4bfba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
// Copyright 2024-Present Datadog, Inc. https://www.datadoghq.com/ | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
use criterion::*; | ||
use datadog_profiling::collections::string_table::wordpress_test_data::WORDPRESS_STRINGS; | ||
|
||
/// This version is the one we used before having datadog-alloc. | ||
#[allow(unused)] | ||
mod old_version { | ||
use datadog_profiling::collections::identifiable::{FxIndexSet, Id, StringId}; | ||
pub struct StringTable { | ||
strings: FxIndexSet<Box<str>>, | ||
} | ||
|
||
impl StringTable { | ||
pub fn new() -> Self { | ||
let mut strings = FxIndexSet::<Box<str>>::default(); | ||
strings.insert("".into()); | ||
Self { strings } | ||
} | ||
|
||
pub fn intern(&mut self, item: &str) -> StringId { | ||
// For performance, delay converting the [&str] to a [String] until | ||
// after it has been determined to not exist in the set. This avoids | ||
// temporary allocations. | ||
let index = match self.strings.get_index_of(item) { | ||
Some(index) => index, | ||
None => { | ||
let (index, _inserted) = self.strings.insert_full(item.into()); | ||
debug_assert!(_inserted); | ||
index | ||
} | ||
}; | ||
StringId::from_offset(index) | ||
} | ||
|
||
#[inline] | ||
#[allow(clippy::len_without_is_empty)] | ||
pub fn len(&self) -> usize { | ||
self.strings.len() | ||
} | ||
} | ||
} | ||
|
||
// To benchmark a different implementation, import a different one. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make this a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When you are benchmarking you generally make tweaks to things and run it again. It's not just a static A/B thing we're going to run in CI so you want to automate swapping it out.. So it seems weird to have a cfg for this reason. I think to use the cfg without making code changes, that would mean making a feature for it. Having a feature just for a benchmarking thing seems odd. Plus it breaks the idea of feature unification (features should be additive). Do these things make sense? Thoughts? |
||
use datadog_profiling::collections::string_table::StringTable; | ||
// use old_version::StringTable; | ||
|
||
pub fn small_wordpress_profile(c: &mut Criterion) { | ||
c.bench_function("benching string interning on wordpress profile", |b| { | ||
b.iter(|| { | ||
let mut table = StringTable::new(); | ||
let n_strings = WORDPRESS_STRINGS.len(); | ||
for string in WORDPRESS_STRINGS { | ||
black_box(table.intern(string)); | ||
} | ||
assert_eq!(n_strings, table.len()); | ||
|
||
// re-insert, should nothing should be inserted. | ||
for string in WORDPRESS_STRINGS { | ||
black_box(table.intern(string)); | ||
} | ||
assert_eq!(n_strings, table.len()) | ||
}) | ||
}); | ||
} | ||
Comment on lines
+49
to
+66
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm curious: did you get a chance to compare this with the previous implementation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I haven't yet, but intend to! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright, some numbers! Changing from the old implementation to the new one resulted in:
And then changing it back resulted in:
This is unsurprising, because we spend less time in allocators, but I'm glad to have some data! |
||
|
||
criterion_group!(benches, small_wordpress_profile); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
// Copyright 2024-Present Datadog, Inc. https://www.datadoghq.com/ | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
use criterion::criterion_main; | ||
|
||
mod interning_strings; | ||
|
||
criterion_main!(interning_strings::benches); |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,3 +2,4 @@ | |
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
pub mod identifiable; | ||
pub mod string_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: We could maybe turn this comment into an assertion, e.g.
assert!(remaining_capacity < 406);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing that would make it less resilient to implementation changes. What if the
Global
allocator starts over-sizing allocations too?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we learn about it through our tests and update them accordingly? ;)