Append is not cancel safe #52

fulmicoton · 2024-02-29T01:17:41Z

If `append_record` is not polled to completion, for instance because it is wrapped in a tokio timeout or because the task running it is cancelled,
then we can end up in a corrupted state.

The code does the following:

- get next record position by looking at the last record in RAM.
- write on disk
- (C) write on RAM

If the task is stopped in the middle of (C), we end up in a state where what is on disk does not match what is in RAM.
In particular, on the next add, we will use a record position that might actually already be on disk.
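
A minimal sketch of this failure mode, using only the standard library. It is not the mrecordlog code: `ToyLog`, `YieldOnce`, `poll_once_then_drop` and `run_to_completion` are hypothetical names, and the placement of the await point between the disk write and the RAM update is an assumption made to reproduce the sequence described above. Dropping a future after a single poll stands in for a tokio timeout firing.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` on its first poll, then `Ready`.
/// It stands in for any await point reached during an append.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough to poll futures by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical model: record positions persisted on disk and mirrored in RAM.
#[derive(Default)]
struct ToyLog {
    disk: Vec<u64>,
    ram: Vec<u64>,
}

impl ToyLog {
    async fn append(&mut self) {
        // Get the next record position by looking at the last record in RAM.
        let position = self.ram.last().map_or(0, |p| p + 1);
        // Write on disk.
        self.disk.push(position);
        // Assumed await point: if the future is dropped here, RAM is never updated.
        YieldOnce(false).await;
        // (C) Write on RAM.
        self.ram.push(position);
    }
}

/// Polls the future a single time, then drops it, as a timeout would.
fn poll_once_then_drop<F: Future>(fut: F) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx).is_ready()
}

/// Polls the future in a loop until it completes.
fn run_to_completion<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Runs one complete append, one cancelled append, then one more complete append.
fn demo() -> (Vec<u64>, Vec<u64>) {
    let mut log = ToyLog::default();
    run_to_completion(log.append());
    poll_once_then_drop(log.append());
    run_to_completion(log.append());
    (log.disk, log.ram)
}
```

In this toy model, `demo()` leaves position 1 on disk twice while RAM holds it once, which is the kind of duplicate position that shows up as a corruption on reload.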

When we later reload the mrecordlog from disk, this is identified as a corruption.
This has been observed in prod.

A second case, also observed, is a straight panic.
Here the cause is assumed to be a preemption that happened after we had appended the record meta
and before we had populated the `concatenated_records` rolling buffer.

        self.record_metas.push(record_meta);
        self.concatenated_records.extend(payload);

The panic reported is

2024-02-28T23:41:54Z app[7816406b969758] iad [info]thread 'tokio-runtime-worker' panicked at /usr/local/cargo/git/checkouts/mrecordlog-34aad39ce3e0e659/bc6a998/src/mem/queue.rs:87:46:
2024-02-28T23:41:54Z app[7816406b969758] iad [info]slice index starts at 928 but ends at 0
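
This message is what Rust emits when a slice is indexed with a range whose start is greater than its end. The sketch below is a toy illustration, not the code at `queue.rs:87`: `ToyQueue`, `record_starts`, `payload` and `demo_mismatch` are hypothetical, and the way the end offset is derived is an assumption. It only shows how a record meta that is ahead of the buffer can yield a range like `928..0`, and that `slice::get` returns `None` where direct indexing would panic.

```rust
/// Hypothetical stand-in for the in-memory queue: one start offset per record,
/// plus a single buffer holding all payloads back to back.
struct ToyQueue {
    record_starts: Vec<usize>,
    concatenated_records: Vec<u8>,
}

impl ToyQueue {
    /// Returns the payload of record `idx`, or `None` when the metas and the
    /// buffer disagree.
    fn payload(&self, idx: usize) -> Option<&[u8]> {
        let start = *self.record_starts.get(idx)?;
        // Assumption: a record ends where the next one starts,
        // or at the end of the buffer for the last record.
        let end = self
            .record_starts
            .get(idx + 1)
            .copied()
            .unwrap_or(self.concatenated_records.len());
        // `get` returns `None` when start > end or end is out of bounds,
        // where `&buf[start..end]` would panic.
        self.concatenated_records.get(start..end)
    }
}

/// Builds a queue whose meta was pushed but whose payload never reached the buffer.
fn demo_mismatch() -> bool {
    let queue = ToyQueue {
        record_starts: vec![928],
        concatenated_records: Vec::new(),
    };
    queue.payload(0).is_none()
}
```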


fulmicoton changed the title ~~mrecordlog can end up in a corrupted state if its tasks is not polled~~ Append is not cancel safe Feb 29, 2024
