feat: Support memory freeing #236

Draft
hpeebles wants to merge 22 commits into base: EXC-1404-memory-manager-support-freeing-a-memory
Conversation

hpeebles (Author):

WIP

@hpeebles hpeebles changed the base branch from main to EXC-1404-memory-manager-support-freeing-a-memory November 11, 2024 15:29
@hpeebles hpeebles changed the base branch from EXC-1404-memory-manager-support-freeing-a-memory to main November 11, 2024 15:34
@dragoljub-duric dragoljub-duric changed the base branch from main to EXC-1404-memory-manager-support-freeing-a-memory November 11, 2024 15:44
@dragoljub-duric dragoljub-duric changed the base branch from EXC-1404-memory-manager-support-freeing-a-memory to main November 11, 2024 15:45
@hpeebles hpeebles changed the base branch from main to EXC-1404-memory-manager-support-freeing-a-memory November 11, 2024 21:47
@@ -294,6 +291,8 @@ impl<M: Memory> Memory for VirtualMemory<M> {
struct MemoryManagerInner<M: Memory> {
    memory: M,

    version: u8,
Contributor:

Do we need this?

Contributor:

The reason: regardless of whether storage holds an old or a new version, we always load it as V2, so when saving we can be sure we are writing the new version.

hpeebles (Author):

It is only used in the upgrade_from_v1_to_v2 test to ensure the correct version of grow is called.
I'll wrap it in a cfg(test).
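
For reference, a minimal sketch of what gating the field behind cfg(test) could look like; Inner and its fields are hypothetical stand-ins rather than the PR's MemoryManagerInner:

```rust
// Sketch only: a field that exists solely in test builds.
struct Inner {
    data: Vec<u8>,
    // Only compiled when the crate is built for tests; production code never sees it.
    #[cfg(test)]
    version: u8,
}

impl Inner {
    fn new() -> Self {
        Self {
            data: Vec::new(),
            #[cfg(test)]
            version: 2,
        }
    }
}

fn main() {
    let inner = Inner::new();
    println!("data len: {}", inner.data.len());
}

#[cfg(test)]
mod tests {
    use super::Inner;

    #[test]
    fn version_is_visible_in_tests() {
        // The field is accessible here because tests compile with cfg(test).
        assert_eq!(Inner::new().version, 2);
    }
}
```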

Comment on lines 824 to 943
        BucketId(self.inner.first_bucket_per_memory[memory_id.0 as usize])
    }

    // Sets the first bucket assigned to a memory
    fn set_first(&mut self, memory_id: MemoryId, value: BucketId) {
        self.inner.first_bucket_per_memory[memory_id.0 as usize] = value.0;
        self.dirty_first_buckets.insert(memory_id);
    }

    // Gets the next bucket in the linked list of buckets
    fn get_next(&self, bucket: BucketId) -> BucketId {
        let start_bit_index = bucket.0 as usize * 15;
        let mut next_bits: BitArray<[u8; 2]> = BitArray::new([0u8; 2]);

        for i in 0..15 {
            next_bits.set(
                i + 1,
                self.inner
                    .bucket_links
                    .get(start_bit_index + i)
                    .unwrap()
                    .as_bool(),
            );
        }

        BucketId(u16::from_be_bytes(next_bits.data))
    }

    // Sets the next bucket in the linked list of buckets
    fn set_next(&mut self, bucket: BucketId, next: BucketId) {
        let start_bit_index = bucket.0 as usize * 15;
        let next_bits: BitArray<[u8; 2]> = BitArray::from(next.0.to_be_bytes());

        for (index, bit) in next_bits.iter().skip(1).enumerate() {
            self.inner
                .bucket_links
                .set(start_bit_index + index, bit.as_bool());
        }

        let start_byte_index = start_bit_index / 8;
        let end_byte_index = (start_bit_index + 14) / 8;

        self.dirty_bucket_link_bytes
            .extend((start_byte_index..=end_byte_index).map(|i| i as u16))
    }

    // Calculates the buckets for a given memory by iterating over its linked list
    fn buckets_for_memory(&self, memory_id: MemoryId, count: u16) -> Vec<BucketId> {
        if count == 0 {
            return Vec::new();
        }

        let mut bucket = self.get_first(memory_id);
        let mut buckets = vec![bucket];
        for _ in 1..count {
            bucket = self.get_next(bucket);
            buckets.push(bucket);
        }
        buckets
    }

    // Flushes only the dirty bytes to memory
    fn flush_dirty_bytes<M: Memory>(&mut self, memory: &M, start_offset: u64) {
        const FIRST_BUCKET_PER_MEMORY_LEN: usize = 2 * MAX_NUM_MEMORIES as usize;

        if !self.dirty_first_buckets.is_empty() {
            // SAFETY: This is safe because we simply cast from [u16] to [u8] and double the length.
            let bytes: [u8; FIRST_BUCKET_PER_MEMORY_LEN] =
                unsafe { transmute(self.inner.first_bucket_per_memory) };

            // Multiply by 2 since we've converted from [u16] to [u8].
            let min = 2 * self.dirty_first_buckets.first().unwrap().0 as usize;
            let max = 2 * self.dirty_first_buckets.last().unwrap().0 as usize + 1;

            let slice = &bytes[min..=max];
            write(memory, start_offset + min as u64, slice);
            self.dirty_first_buckets.clear();
        }

        if !self.dirty_bucket_link_bytes.is_empty() {
            let min = *self.dirty_bucket_link_bytes.first().unwrap() as usize;
            let max = *self.dirty_bucket_link_bytes.last().unwrap() as usize;

            let slice = &self.inner.bucket_links.data[min..=max];
            write(
                memory,
                start_offset + (FIRST_BUCKET_PER_MEMORY_LEN + min) as u64,
                slice,
            );
            self.dirty_bucket_link_bytes.clear();
        }
    }

    // Flushes all bytes to memory
    fn flush_all<M: Memory>(&mut self, memory: &M, start_offset: u64) {
        write_struct(&self.inner, Address::from(start_offset), memory);
        self.dirty_first_buckets.clear();
        self.dirty_bucket_link_bytes.clear();
    }
}
Contributor:

I would strongly prefer to have only one place where we read the header, and after that to work with u16 values (instead of the 15-bit representation used in storage). I believe that will improve performance, but more importantly, it will be easier to maintain and understand. We can still use the information from dirty_first_buckets and dirty_bucket_link_bytes to apply changes to the 15-bit representation in storage when saving.
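
A rough sketch of that idea (my own illustration, not code from this PR): decode the packed links into plain u16 values once when the header is read, and re-encode individual entries only when flushing. The exact bit layout used here (15 bits per link, most significant bit first) is an assumption for illustration.

```rust
// Sketch only: decode all 15-bit links to u16 at load time, re-encode on save.
// Assumed layout: link `i` occupies bits [i * 15, i * 15 + 15), MSB first.

fn unpack_links(packed: &[u8], count: usize) -> Vec<u16> {
    (0..count)
        .map(|i| {
            let mut value = 0u16;
            for b in 0..15 {
                let bit_index = i * 15 + b;
                let bit = (packed[bit_index / 8] >> (7 - bit_index % 8)) & 1;
                value = (value << 1) | bit as u16;
            }
            value
        })
        .collect()
}

fn pack_link(packed: &mut [u8], i: usize, value: u16) {
    for b in 0..15 {
        let bit_index = i * 15 + b;
        let mask = 1u8 << (7 - bit_index % 8);
        if (value >> (14 - b)) & 1 == 1 {
            packed[bit_index / 8] |= mask;
        } else {
            packed[bit_index / 8] &= !mask;
        }
    }
}

fn main() {
    // Round-trip a few links through the packed representation.
    let mut packed = vec![0u8; (4 * 15 + 7) / 8];
    for (i, link) in [5u16, 0x7FFF, 0, 123].iter().enumerate() {
        pack_link(&mut packed, i, *link);
    }
    assert_eq!(unpack_links(&packed, 4), vec![5, 0x7FFF, 0, 123]);
    println!("round-trip ok");
}
```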

hpeebles (Author):

Yeah, that makes sense.
This means we'll have to convert from 15-bit to 16-bit after reading, which adds some overhead, but at least it's only done once. I'll have a go at it now.

hpeebles (Author):

Done. Will likely need a bit of cleaning up. Need to run off now but will give it another look tomorrow.

ielashi (Contributor) commented Nov 14, 2024:

@dragoljub-duric @hpeebles @dsarlis I'm not sure we'd want to merge this solution to support memory freeing.

With the expansion of stable memory, we now have another issue to solve: we have ~32k buckets, and with a default size of 8 MiB per bucket, the memory manager can only manage 256 GiB of memory. I bet most people don't know this and assume they have access to the full 400 GiB (or is it 500 now?). This is a footgun that we should resolve, and resolving it would require increasing the number of buckets.

In order to both increase the number of buckets and support memory freeing in the future, we'll have to update the memory manager in such a way that its header can span, say, two Wasm pages instead of one. And if we have two Wasm pages, then we won't need to represent buckets using 15 bits as we're doing in this PR, which adds more complication than it's worth imho.
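
(Back-of-envelope check of the capacity figure above, assuming 32,768 buckets and the default 8 MiB bucket size:)

```rust
fn main() {
    // Assumed figures from the comment above: ~32k buckets, 8 MiB default bucket size.
    const NUM_BUCKETS: u64 = 32_768;
    const BUCKET_SIZE_BYTES: u64 = 8 * 1024 * 1024;

    let max_managed_bytes = NUM_BUCKETS * BUCKET_SIZE_BYTES;
    assert_eq!(max_managed_bytes, 256 * 1024 * 1024 * 1024); // 256 GiB

    println!("max manageable memory: {} GiB", max_managed_bytes >> 30);
}
```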

hpeebles (Author) commented Nov 14, 2024:

We could keep the header as a single page, store each bucket in the linked list as 3 bytes (which could support up to 16,777,216 buckets), and, if additional buckets are needed, use the unused space in the header to point to the page where the subsequent buckets are defined.

Then each page of buckets holds a pointer to the next page.

This way we can support far more buckets than anyone would ever need, the header remains small in most use cases, and we wouldn't need to move the existing data that sits in the first memory page after the header.

hpeebles (Author) commented Nov 14, 2024:

Having thought about this a bit more, I think my idea to store bucket IDs as 3 bytes is wrong.
If we support spreading buckets onto more pages, then storing them as 3 bytes is an unnecessary storage optimisation at the expense of code complexity. I think storing buckets as u32s is the way to go.
The first page could then store 8192 buckets in its last 32768 bytes (using the format I introduced in this PR, where the entry at each bucket's index points to the next bucket in the linked list).
Follow-on pages used to store buckets could then hold 16384 buckets per page.
The first page then just needs to somehow record where the additional pages are.
If my calculations are correct, then using the format above we'd have 30178 bytes available within the first page to encode the additional bucket pages and any metadata we may need on each one.
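
A sketch of that arithmetic (the individual header field sizes below are assumptions inferred from the discussion, i.e. the existing V1 header plus the per-memory first-bucket table added in this PR, not values taken from the code):

```rust
fn main() {
    const WASM_PAGE_SIZE: usize = 65_536;
    const LINK_SIZE: usize = 4; // each bucket link stored as a u32

    // Last 32 KiB of the first page holds 8192 bucket links.
    let links_in_first_page = 32_768 / LINK_SIZE;
    assert_eq!(links_in_first_page, 8_192);

    // A dedicated follow-on page holds 16384 links.
    let links_per_extra_page = WASM_PAGE_SIZE / LINK_SIZE;
    assert_eq!(links_per_extra_page, 16_384);

    // Space left in the first half of the first page after the header.
    // Assumed header fields: magic + version + bucket counts + reserved bytes,
    // 255 u64 memory sizes, and 255 u16 first-bucket entries (this PR).
    let header_bytes = 3 + 1 + 2 + 2 + 32 + 255 * 8 + 255 * 2;
    let free_in_first_page = 32_768 - header_bytes;
    assert_eq!(free_in_first_page, 30_178); // matches the figure above

    println!("bytes free for extra bucket-page pointers: {free_in_first_page}");
}
```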
