If append_record is not polled to completion, for instance because it is wrapped in a tokio timeout or because the task running it is cancelled,
then the mrecordlog can end up in a corrupted state.
The code proceeds as follows:
- (A) get the next record position by looking at the last record in RAM.
- (B) write the record on disk.
- (C) write the record in RAM.
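To make the failure concrete, here is a minimal, hypothetical model of the three steps above. The Queue type, its disk and ram fields, and the YieldOnce future are illustrative assumptions, not mrecordlog's actual code; YieldOnce stands in for whatever .await the real append path performs (for example waiting on file I/O). Polling the future once and then dropping it mimics what happens when a tokio timeout fires and the timed-out future is dropped, or when the task running it is aborted.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to drive a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for the I/O `.await` of the real append path:
/// returns `Pending` on the first poll and `Ready` on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical queue: `disk` holds persisted (position, payload) pairs,
/// `ram` holds the positions the in-memory queue knows about.
#[derive(Default)]
struct Queue {
    disk: Vec<(u64, Vec<u8>)>,
    ram: Vec<u64>,
}

impl Queue {
    async fn append_record(&mut self, payload: &[u8]) -> u64 {
        // (A) next position, derived from the last record in RAM.
        let position = self.ram.last().map_or(0, |last| last + 1);
        // (B) write on disk, then wait for the I/O (a cancellation point).
        self.disk.push((position, payload.to_vec()));
        YieldOnce(false).await;
        // (C) write in RAM: skipped if the future is dropped at the await.
        self.ram.push(position);
        position
    }
}

/// Polls a future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Polls a future once and then drops it, like a timeout that fires
/// while the future is suspended.
fn poll_once_then_drop<F: Future>(fut: F) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let _ = fut.as_mut().poll(&mut cx);
}

fn main() {
    let mut queue = Queue::default();

    // The first append is cancelled after (B) but before (C).
    poll_once_then_drop(queue.append_record(b"first"));
    assert_eq!(queue.disk.len(), 1); // the record is on disk...
    assert!(queue.ram.is_empty()); // ...but RAM never saw it.

    // The next append derives its position from RAM and reuses 0.
    let position = block_on(queue.append_record(b"second"));
    assert_eq!(position, 0);

    let on_disk: Vec<u64> = queue.disk.iter().map(|(p, _)| *p).collect();
    println!("positions on disk: {on_disk:?}"); // prints "positions on disk: [0, 0]"
}
```

In this model the disk and RAM views diverge exactly at the await between (B) and (C), and the next append hands out a position that is already on disk.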
If the task is stopped in the middle of (C), we end up in a state where what is on disk does not match what is in RAM.
In particular, on the next append, we will use a record position that may already be on disk.
When we reload the mrecordlog from disk, this is detected as corruption.
This has been observed in production.
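The issue does not say which check rejects the log on reload. Assuming, for this sketch only, that replay requires the positions of a queue to be strictly increasing, the disk contents left behind by the scenario above (the same position written twice) fail that check. The names check_replayed_positions and Corruption are hypothetical.

```rust
/// Hypothetical error describing a position that does not increase.
#[derive(Debug, PartialEq)]
struct Corruption {
    previous: u64,
    found: u64,
}

/// Hypothetical reload-time invariant: positions replayed for one queue
/// must be strictly increasing.
fn check_replayed_positions(positions: &[u64]) -> Result<(), Corruption> {
    for pair in positions.windows(2) {
        if pair[1] <= pair[0] {
            return Err(Corruption { previous: pair[0], found: pair[1] });
        }
    }
    Ok(())
}

fn main() {
    // A healthy log replays fine.
    assert_eq!(check_replayed_positions(&[0, 1, 2]), Ok(()));
    // Position 0 written twice, as after a cancelled append.
    assert_eq!(
        check_replayed_positions(&[0, 0]),
        Err(Corruption { previous: 0, found: 0 })
    );
}
```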
A second failure mode, also observed, is an outright panic.
Here the preemption is assumed to have happened after we appended the record metas
and before we populated the concatenated_records rolling buffer. The panic reported is:
2024-02-28T23:41:54Z app[7816406b969758] iad [info]thread 'tokio-runtime-worker' panicked at /usr/local/cargo/git/checkouts/mrecordlog-34aad39ce3e0e659/bc6a998/src/mem/queue.rs:87:46:
2024-02-28T23:41:54Z app[7816406b969758] iad [info]slice index starts at 928 but ends at 0
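The panic can be illustrated with another hypothetical model, in which the in-memory queue keeps one end offset per record (end_offsets, standing in for the record metas) next to the concatenated_records buffer. This layout is an assumption and is not meant to mirror src/mem/queue.rs; the .await between the two steps exists only so that the future can be dropped there. Once an append is cut short after the metas are updated, the metas and the buffer disagree, and a later lookup slices with a start greater than its end. This is the same class of panic as the one reported, with different numbers.

```rust
use std::future::Future;
use std::panic;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Pending once, then Ready: a stand-in cancellation point.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Polls once, then drops the future mid-way through its body.
fn poll_once_then_drop<F: Future>(fut: F) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let _ = fut.as_mut().poll(&mut cx);
}

/// Hypothetical in-memory queue. Record `i` lives in
/// `concatenated_records[start..end_offsets[i]]`, where `start` is
/// `end_offsets[i - 1]` (or 0 for the first record).
#[derive(Default)]
struct MemQueue {
    end_offsets: Vec<usize>,
    concatenated_records: Vec<u8>,
}

impl MemQueue {
    async fn add_record(&mut self, payload: &[u8]) {
        // Step 1: append the record meta.
        self.end_offsets.push(self.concatenated_records.len() + payload.len());
        // Cancellation point between the two steps.
        YieldOnce(false).await;
        // Step 2: populate the rolling buffer.
        self.concatenated_records.extend_from_slice(payload);
    }

    fn get_record(&self, idx: usize) -> &[u8] {
        let start = if idx == 0 { 0 } else { self.end_offsets[idx - 1] };
        // Panics if the metas and the buffer disagree.
        &self.concatenated_records[start..self.end_offsets[idx]]
    }
}

fn main() {
    let mut queue = MemQueue::default();
    block_on(queue.add_record(b"aaaa"));
    // Cancelled after step 1: a meta ending at 12 exists, but the
    // buffer still holds only 4 bytes.
    poll_once_then_drop(queue.add_record(b"bbbbbbbb"));
    block_on(queue.add_record(b"cc"));
    assert_eq!(queue.end_offsets, vec![4, 12, 6]);
    assert_eq!(queue.concatenated_records, b"aaaacc".to_vec());

    // Record 2 spans 12..6: "slice index starts at 12 but ends at 6".
    let result = panic::catch_unwind(|| queue.get_record(2).len());
    assert!(result.is_err());
}
```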
fulmicoton changed the title from "mrecordlog can end up in a corrupted state if its tasks is not polled" to "Append is not cancel safe" on Feb 29, 2024.