add writeup for 2.1
Signed-off-by: Alex Chi <[email protected]>
skyzh committed Jan 22, 2024
1 parent f0c0da8 commit d694f8f
Showing 2 changed files with 69 additions and 4 deletions.
71 changes: 68 additions & 3 deletions mini-lsm-book/src/week2-01-compaction.md
@@ -10,11 +10,76 @@ In this chapter, you will:

## Task 1: Compaction Implementation

## Task 2: Update the LSM State
In this task, you will implement the core logic of compaction: merge-sorting a set of SST files into a sorted run. You will need to modify:

## Task 3: Concat Iterator
```
src/compact.rs
```

## Task 4: Integrate with the Read Path
Specifically, the `force_full_compaction` and `compact` functions. `force_full_compaction` is the compaction trigger that decides which files to compact and updates the LSM state. `compact` does the actual compaction job: it merges some SST files and returns a set of new SST files.

Your compaction implementation should take all SSTs in the storage engine, merge them with `MergeIterator`, and then use the SST builder to write the result into new files. You will need to split the output into multiple SST files if a file grows too large. After compaction completes, you can update the LSM state to add the new sorted run to the first level of the LSM tree, and you will need to remove the unused files from the LSM tree. In your implementation, your SSTs should only be stored in two places: the L0 SSTs and the first level SSTs. That is to say, the `levels` structure in the LSM state should only have one vector.
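The splitting step can be sketched as follows. This is a minimal illustration, not the book's SST builder: `SstEntries`, `split_into_ssts`, and `target_size` are hypothetical names, an output file is modeled as an in-memory vector, and the size estimate (key bytes plus value bytes) is an assumption.

```rust
/// Hypothetical stand-in for one output SST: a sorted list of key-value pairs.
type SstEntries = Vec<(Vec<u8>, Vec<u8>)>;

/// Split an already-sorted stream of entries into chunks whose estimated
/// size (key bytes + value bytes) reaches roughly `target_size` bytes.
fn split_into_ssts(
    sorted: impl Iterator<Item = (Vec<u8>, Vec<u8>)>,
    target_size: usize,
) -> Vec<SstEntries> {
    let mut outputs = Vec::new();
    let mut current: SstEntries = Vec::new();
    let mut current_size = 0usize;
    for (key, value) in sorted {
        current_size += key.len() + value.len();
        current.push((key, value));
        // Seal the current chunk once it reaches the target size.
        if current_size >= target_size {
            outputs.push(std::mem::take(&mut current));
            current_size = 0;
        }
    }
    // Do not lose the trailing, partially filled chunk.
    if !current.is_empty() {
        outputs.push(current);
    }
    outputs
}
```

Because the input is sorted and each chunk is a contiguous slice of it, the resulting chunks are ordered by first key and do not overlap, which is the property a sorted run needs.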

Compaction should not block L0 flush, and therefore you should not take the state lock when merging the files. You should only take the state lock at the end of the compaction process when you update the LSM state.

You can assume that the user will ensure there is only one compaction going on at a time: `force_full_compaction` will be called from only one thread at any time. The SSTs placed in level 1 should be sorted by their first key and should not have overlapping key ranges.

<details>

<summary>Spoilers: Compaction Pseudo Code</summary>

```rust,no_run
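fn force_full_compaction(&self) -> Result<()> {
    // Snapshot the SSTs to compact; the read lock is released at the end of this block.
    let ssts_to_compact = {
        let state = self.state.read();
        state.l0_sstables + state.levels[0]
    };
    // Merge without holding any lock so that L0 flushes are not blocked.
    let new_ssts = self.compact(FullCompactionTask(ssts_to_compact))?;
    {
        // Take the state lock only while swapping in the new SSTs.
        let state_lock = self.state_lock.lock();
        let state = self.state.write();
        state.l0_sstables.remove(/* the ones being compacted */);
        state.levels[0] = new_ssts;
    };
    // Pseudo code: delete each compacted file (e.g. with `std::fs::remove_file`).
    std::fs::remove(ssts_to_compact)?;
    Ok(())
}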
```

</details>

In your compaction implementation, you only need to handle `FullCompaction` for now, where the task information contains the SSTs that you will need to compact. You will also need to ensure the order of the SSTs is correct so that the latest version of a key is put into the new SST.

Because we always compact all SSTs, if we find multiple versions of a key, we can simply retain the latest one. If the latest version is a delete marker, we do not need to keep it in the produced SST files. This does not apply to the compaction strategies in the next few chapters.
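A minimal sketch of these two rules follows, under stated assumptions: `full_compact` is a hypothetical helper, each input run is an in-memory sorted vector, runs are passed newest first, and an empty value stands for a delete marker. It uses a `BTreeMap` for brevity rather than a streaming `MergeIterator`, so it only illustrates the semantics, not the memory behavior you want in the real implementation.

```rust
use std::collections::BTreeMap;

/// Merge several runs (ordered newest first), keep only the latest version
/// of each key, and drop keys whose latest version is a delete marker.
/// Assumption: an empty value represents a delete marker.
fn full_compact(runs_newest_first: &[Vec<(Vec<u8>, Vec<u8>)>]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut merged: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::new();
    // Apply the oldest run first so that newer runs overwrite older versions.
    for run in runs_newest_first.iter().rev() {
        for (key, value) in run {
            merged.insert(key.clone(), value.clone());
        }
    }
    // Safe only for a full compaction: no older data remains below this
    // output, so a delete marker has nothing left to hide.
    merged
        .into_iter()
        .filter(|(_, value)| !value.is_empty())
        .collect()
}
```

Dropping delete markers is only correct here because every SST takes part in the compaction; with partial compaction, an older version of the key could still exist in a lower level and would reappear.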

There are some corner cases that you might need to think about. For example,

* How does your implementation handle L0 flush in parallel with compaction? (Do not take the state lock while doing the compaction, and also consider new L0 files produced while compaction is going on.)
* If your implementation removes the original SST files immediately after the compaction completes, will it cause problems in your system? (Generally no on macOS/Linux because the OS will not actually remove the file until no file handle is being held.)
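For the first question, one possible approach is to remove only the compacted SSTs from L0 instead of clearing the whole list. The sketch below assumes a simplified, hypothetical `State` that tracks SSTs by numeric ID; `apply_compaction` is not part of the starter code.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified LSM state that tracks SSTs by numeric ID.
struct State {
    l0_sstables: Vec<usize>,  // newest first
    levels: Vec<Vec<usize>>,  // assumed to hold exactly one level here
}

/// Install the compaction result while keeping any L0 SSTs that were
/// flushed after the compaction took its snapshot.
fn apply_compaction(state: &mut State, compacted: &[usize], new_ssts: Vec<usize>) {
    let compacted: HashSet<usize> = compacted.iter().copied().collect();
    // Remove only the SSTs that were inputs of this compaction.
    state.l0_sstables.retain(|id| !compacted.contains(id));
    // The old level-1 SSTs were all inputs, so the level is replaced as a whole.
    state.levels[0] = new_ssts;
}
```

For example, if SST 4 is flushed to L0 while SSTs 3, 2, and 1 are being compacted, SST 4 stays in `l0_sstables` after the update.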

## Task 2: Concat Iterator

In this task, you will need to modify:

```
src/iterators/concat.rs
```

Now that you have created sorted runs in your system, it is possible to do a simple optimization over the read path. You do not always need to create merge iterators for your SSTs. If SSTs belong to one sorted run, you can create a concat iterator that simply iterates the keys in each SST in order, because SSTs in one sorted run do not contain overlapping key ranges and they are sorted by their first key.
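The idea can be sketched with in-memory data. This is an illustration only: `ConcatIter` and `Entry` are hypothetical names, each SST is modeled as a sorted vector, and the real iterator in `src/iterators/concat.rs` works on SST files and the book's own iterator trait rather than `std::iter::Iterator`.

```rust
/// Hypothetical in-memory entry type: (key, value).
type Entry = (Vec<u8>, Vec<u8>);

/// Iterates SSTs of one sorted run back to back. This is correct only if
/// the SSTs are ordered by first key and their key ranges do not overlap.
struct ConcatIter<'a> {
    ssts: &'a [Vec<Entry>],
    sst_idx: usize,   // which SST we are currently reading
    entry_idx: usize, // position inside the current SST
}

impl<'a> ConcatIter<'a> {
    fn new(ssts: &'a [Vec<Entry>]) -> Self {
        Self { ssts, sst_idx: 0, entry_idx: 0 }
    }
}

impl<'a> Iterator for ConcatIter<'a> {
    type Item = (&'a [u8], &'a [u8]);

    fn next(&mut self) -> Option<Self::Item> {
        let ssts: &'a [Vec<Entry>] = self.ssts;
        while let Some(sst) = ssts.get(self.sst_idx) {
            if let Some((key, value)) = sst.get(self.entry_idx) {
                self.entry_idx += 1;
                return Some((key.as_slice(), value.as_slice()));
            }
            // Current SST is exhausted (or empty): move on to the next one.
            self.sst_idx += 1;
            self.entry_idx = 0;
        }
        None
    }
}
```

Only one SST is being read at any moment, so no key comparison across SSTs is needed, unlike a merge iterator that has to keep every input open.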

## Task 3: Integrate with the Read Path

In this task, you will need to modify:

```
src/lsm_iterator.rs
src/lsm_storage.rs
```

Now that you have a two-level structure for your LSM tree, you can change your read path to use the new concat iterator.

You will need to change the inner iterator type of the `LsmStorageIterator`. After that, you can construct a two-merge iterator that merges memtables and L0 SSTs, and another two-merge iterator that merges that iterator with the L1 concat iterator.
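The layering relies on the upper iterator winning when both sides contain the same key. A minimal sketch of that rule, using `std::iter::Peekable` over in-memory entries, is shown below; `TwoMerge` is a hypothetical stand-in for illustration and does not follow the book's iterator trait.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;

/// Merges two sorted streams; on equal keys, the entry from `a` wins.
struct TwoMerge<A: Iterator, B: Iterator> {
    a: Peekable<A>, // newer data, e.g. memtables + L0 SSTs
    b: Peekable<B>, // older data, e.g. the L1 concat iterator
}

impl<A: Iterator, B: Iterator> TwoMerge<A, B> {
    fn new(a: A, b: B) -> Self {
        Self { a: a.peekable(), b: b.peekable() }
    }
}

impl<A, B> Iterator for TwoMerge<A, B>
where
    A: Iterator<Item = (Vec<u8>, Vec<u8>)>,
    B: Iterator<Item = (Vec<u8>, Vec<u8>)>,
{
    type Item = (Vec<u8>, Vec<u8>);

    fn next(&mut self) -> Option<Self::Item> {
        let order = match (self.a.peek(), self.b.peek()) {
            (Some((ka, _)), Some((kb, _))) => ka.cmp(kb),
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => return None,
        };
        match order {
            Ordering::Less => self.a.next(),
            Ordering::Greater => self.b.next(),
            Ordering::Equal => {
                // Same key on both sides: skip the older entry from `b`.
                self.b.next();
                self.a.next()
            }
        }
    }
}
```

Nesting two such merges, with the memtable and L0 stream as `a` of the outer merge and the L1 concat iterator as `b`, gives the precedence order memtables, then L0, then L1.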

You will need to implement `num_active_iterators` for the concat iterator so that the test cases can check whether your implementation uses concat iterators. It should always return 1.

## Test Your Understanding

2 changes: 1 addition & 1 deletion mini-lsm/src/compact.rs
@@ -196,7 +196,7 @@ impl LsmStorageInner {
state.clone()
};
let mut original_sstables = snapshot.l0_sstables.clone();
original_sstables.reverse();
original_sstables.reverse(); // is this correct?
let sstables = self.compact(&CompactionTask::ForceFullCompaction(
original_sstables.clone(),
))?;