Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(profiling): datadog-alloc based StringTable #404

Merged
merged 13 commits into from
May 20, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

51 changes: 38 additions & 13 deletions alloc/src/chain.rs
Original file line number Diff line number Diff line change
Expand Up @@ -104,13 +104,15 @@ impl<A: Allocator + Clone> ChainAllocator<A> {

#[cold]
#[inline(never)]
fn grow(&self) -> Result<(), AllocError> {
fn grow(&self, min_size: usize) -> Result<(), AllocError> {
let top = self.top.get();
let chain_layout = Layout::new::<ChainNode<A>>();

let node_size = min_size.max(self.node_size);
let linear = {
let layout = Layout::from_size_align(self.node_size, chain_layout.align())
.map_err(|_| AllocError)?;
let layout = Layout::from_size_align(node_size, chain_layout.align())
.map_err(|_| AllocError)?
.pad_to_align();
LinearAllocator::new_in(layout, self.allocator.clone())?
};

Expand Down Expand Up @@ -214,16 +216,14 @@ unsafe impl<A: Allocator + Clone> Allocator for ChainAllocator<A> {
fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
let layout = layout.pad_to_align();

// Too large for ChainAllocator to deal with.
let header_overhead = size_of::<ChainNode<A>>();
let maximum_capacity = self.node_size - header_overhead;
if layout.size() > maximum_capacity {
return Err(AllocError);
}

let remaining_capacity = self.remaining_capacity();
if layout.size() > remaining_capacity {
self.grow()?;
let header_overhead = size_of::<ChainNode<A>>();
let min_size = layout
.size()
.checked_add(header_overhead)
.ok_or(AllocError)?;
self.grow(min_size)?;
}

// At this point:
Expand Down Expand Up @@ -298,6 +298,32 @@ mod tests {
unsafe { allocator.deallocate(ptr.cast(), layout) };
}

#[test]
fn test_large_allocations() {
let allocator = ChainAllocator::new_in(4096, Global);

// Force an allocation, so it makes a chunk of the minimum size.
{
let ptr = allocator.allocate(Layout::new::<u8>()).unwrap();
unsafe { allocator.deallocate(ptr.cast(), Layout::new::<u8>()) };
}
// Should be a bit less than 4096, but use this over hard-coding a
// number, to make it more resilient to implementation changes.
let remaining_capacity = allocator.remaining_capacity();
Comment on lines +310 to +312
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: We could maybe turn this comment into an assertion, e.g. assert!(remaining_capacity < 406);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing that would make it less resilient to implementation changes. What if the Global allocator starts over-sizing allocations too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we learn about it through our tests and update them accordingly? ;)


// Now make something bigger than the chunk.
let size = 4 * (remaining_capacity + 1);
let layout = Layout::from_size_align(size, 1).unwrap();
let ptr = allocator.allocate(layout).unwrap();
let actual_size = ptr.len();
assert!(
actual_size >= size,
"failed to allocate large allocation, expected at least {size} bytes, saw {actual_size}"
);
// Doesn't return memory, just ensuring we don't panic.
unsafe { allocator.deallocate(ptr.cast(), layout) };
}

#[track_caller]
fn fill_to_capacity<A: Allocator + Clone>(allocator: &ChainAllocator<A>) {
let remaining_capacity = allocator.remaining_capacity();
Expand All @@ -306,9 +332,8 @@ mod tests {
let ptr = allocator.allocate(layout).unwrap();
// Doesn't return memory, just ensuring we don't panic.
unsafe { allocator.deallocate(ptr.cast(), layout) };
assert_eq!(0, allocator.remaining_capacity());
}
let remaining_capacity = allocator.remaining_capacity();
assert_eq!(0, remaining_capacity);
}

#[test]
Expand Down
9 changes: 9 additions & 0 deletions profiling/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,16 +11,22 @@ license.workspace = true
[lib]
crate-type = ["lib"]

[[bench]]
name = "main"
harness = false

[dependencies]
anyhow = "1.0"
bitmaps = "3.2.0"
bytes = "1.1"
chrono = {version = "0.4", default-features = false, features = ["std", "clock"]}
datadog-alloc = {path = "../alloc"}
ddcommon = {path = "../ddcommon"}
derivative = "2.2.0"
futures = { version = "0.3", default-features = false }
futures-core = {version = "0.3.0", default-features = false}
futures-util = {version = "0.3.0", default-features = false}
hashbrown = { version = "0.14", default-features = false, features = ["allocator-api2"] }
http = "0.2"
http-body = "0.4"
hyper = {version = "0.14", features = ["client"], default-features = false}
Expand All @@ -38,3 +44,6 @@ serde_json = {version = "1.0"}
tokio = {version = "1.23", features = ["rt", "macros"]}
tokio-util = "0.7.1"
byteorder = { version = "1.5", features = ["std"] }

[dev-dependencies]
criterion = "0.5.1"
68 changes: 68 additions & 0 deletions profiling/benches/interning_strings.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
// Copyright 2024-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

use criterion::*;
use datadog_profiling::collections::string_table::wordpress_test_data::WORDPRESS_STRINGS;

/// This version is the one we used before having datadog-alloc.
#[allow(unused)]
mod old_version {
use datadog_profiling::collections::identifiable::{FxIndexSet, Id, StringId};
pub struct StringTable {
strings: FxIndexSet<Box<str>>,
}

impl StringTable {
pub fn new() -> Self {
let mut strings = FxIndexSet::<Box<str>>::default();
strings.insert("".into());
Self { strings }
}

pub fn intern(&mut self, item: &str) -> StringId {
// For performance, delay converting the [&str] to a [String] until
// after it has been determined to not exist in the set. This avoids
// temporary allocations.
let index = match self.strings.get_index_of(item) {
Some(index) => index,
None => {
let (index, _inserted) = self.strings.insert_full(item.into());
debug_assert!(_inserted);
index
}
};
StringId::from_offset(index)
}

#[inline]
#[allow(clippy::len_without_is_empty)]
pub fn len(&self) -> usize {
self.strings.len()
}
}
}

// To benchmark a different implementation, import a different one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a cfg rather than commenting out a line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you are benchmarking you generally make tweaks to things and run it again. It's not just a static A/B thing we're going to run in CI so you want to automate swapping it out.. So it seems weird to have a cfg for this reason.

I think to use the cfg without making code changes, that would mean making a feature for it. Having a feature just for a benchmarking thing seems odd. Plus it breaks the idea of feature unification (features should be additive).

Do these things make sense? Thoughts?

use datadog_profiling::collections::string_table::StringTable;
// use old_version::StringTable;

pub fn small_wordpress_profile(c: &mut Criterion) {
c.bench_function("benching string interning on wordpress profile", |b| {
b.iter(|| {
let mut table = StringTable::new();
let n_strings = WORDPRESS_STRINGS.len();
for string in WORDPRESS_STRINGS {
black_box(table.intern(string));
}
assert_eq!(n_strings, table.len());

// re-insert, should nothing should be inserted.
for string in WORDPRESS_STRINGS {
black_box(table.intern(string));
}
assert_eq!(n_strings, table.len())
})
});
}
Comment on lines +49 to +66
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: did you get a chance to compare this with the previous implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't yet, but intend to!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, some numbers! Changing from the old implementation to the new one resulted in:

benching string interning on wordpress profile
                        time:   [53.695 µs 53.977 µs 54.279 µs]
                        change: [-28.080% -27.464% -26.850%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

And then changing it back resulted in:

benching string interning on wordpress profile
                        time:   [75.224 µs 75.501 µs 75.820 µs]
                        change: [+40.269% +41.386% +42.503%] (p = 0.00 < 0.05)
                        Performance has regressed.

This is unsurprising, because we spend less time in allocators, but I'm glad to have some data!


criterion_group!(benches, small_wordpress_profile);
8 changes: 8 additions & 0 deletions profiling/benches/main.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
// Copyright 2024-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

use criterion::criterion_main;

mod interning_strings;

criterion_main!(interning_strings::benches);
1 change: 1 addition & 0 deletions profiling/src/collections/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
// SPDX-License-Identifier: Apache-2.0

pub mod identifiable;
pub mod string_table;
Loading
Loading