When the server commits transactions involving the binlog, MyRocks, and possibly InnoDB, it uses a group 2PC protocol. During the prepare step, the server asks the storage engines not to flush/sync individual transactions, which MyRocks complies with.
After this, however, MyRocks fails to re-enable sync for the commit step of the same transaction, effectively committing without a durability guarantee. Part of the commit payload (as opposed to the prepare payload) is the binlog position. Since this is 2PC across multiple engines, the exact effect depends on what the transaction was:
MyRocks DDL (involves binlog, MyRocks, and InnoDB):
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=ROCKSDB;
--source include/kill_and_restart_mysqld.inc
DROP TABLE t1;
This appears to be fine: in recovery, all storage engines agree on the binlog position, and XA recovery does nothing.
MyRocks DML (involves binlog, MyRocks, may or may not involve InnoDB):
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=ROCKSDB;
BEGIN;
INSERT INTO t1 VALUES (1);
COMMIT;
--source include/kill_and_restart_mysqld.inc
DROP TABLE t1;
In recovery, all storage engines agree on the binlog position, but XA recovery commits one transaction.
Multi-engine DML (involves binlog, MyRocks, and InnoDB):
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=ROCKSDB;
CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=InnoDB;
BEGIN;
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (2);
COMMIT;
--source include/kill_and_restart_mysqld.inc
SELECT * FROM t1;
SELECT * FROM t2;
DROP TABLE t1, t2;
In recovery, the binlog position in MyRocks is outdated, and XA recovery commits one transaction.
In all of these cases, binlog-based XA recovery is able to clean up. However, if the server data dictionary (DD) is moved to MyRocks, the bug becomes a much more serious, data-corrupting one when InnoDB DDL is involved. During startup, MyRocks recovers individually from its WAL, the DD is initialized without the last committed transaction, InnoDB then attempts to recover using the outdated DD, and only then does binlog XA recovery run, which is too late.
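The ordering problem can be illustrated with a toy model. This is a minimal sketch, not server code: the `startup` function and its field names are hypothetical, and the only thing taken from the description above is the order of the steps (engine WAL recovery, DD initialization, InnoDB recovery, then binlog XA recovery).

```python
# Toy model of the startup order described above. All names are hypothetical;
# this is not the server's actual recovery code.

def startup(dd_commit_was_synced: bool) -> dict:
    """Walk through the startup steps and report what each one observed."""
    # Step 1: the DD storage engine recovers from its own WAL. The last DD
    # transaction's commit is visible only if that commit was synced.
    dd_sees_last_trx = dd_commit_was_synced

    # Step 2: the DD is initialized from whatever step 1 produced.
    dd_at_init = "current" if dd_sees_last_trx else "outdated"

    # Step 3: InnoDB recovers using the DD as initialized in step 2.
    innodb_recovered_with = dd_at_init

    # Step 4: binlog XA recovery commits the pending transaction, so the DD
    # catches up, but InnoDB has already recovered against the older view.
    dd_after_xa_recovery = "current"

    return {
        "innodb_recovered_with": innodb_recovered_with,
        "dd_after_xa_recovery": dd_after_xa_recovery,
    }


lost = startup(dd_commit_was_synced=False)
kept = startup(dd_commit_was_synced=True)
print(lost)
print(kept)
```

In the unsynced case the DD itself ends up current after step 4, yet InnoDB recovery has already run against the outdated view, which is the window the description calls "too late".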
I'll make a PR.
It looks like this behavior is an undocumented, deliberate design decision. When the binlog is enabled, the server layer requests that commits be non-durable too, in addition to prepares. InnoDB does the same thing: https://bugs.mysql.com/bug.php?id=75519
Since InnoDB is the data dictionary storage engine (DDSE), it also unconditionally flushes the redo log on any DD operation. This addresses the scenario above, in which it recovers individually from the redo log, initializes the data dictionary, and only then does binlog crash recovery run. It looks like a MyRocks DDSE will have to flush on every DD operation commit too.
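A toy model of "flush unconditionally on DD operations" might look like the following. It is only a sketch under stated assumptions: `ToyWal`, `commit`, and the `is_dd_operation` flag are hypothetical names invented for illustration and do not correspond to InnoDB or MyRocks APIs.

```python
# Toy write-ahead log: records are buffered until a sync makes them durable.
# Hypothetical names throughout; not InnoDB or MyRocks code.

class ToyWal:
    def __init__(self):
        self.synced = []    # records that survive a crash
        self.buffered = []  # records lost on a crash

    def append(self, record, sync: bool):
        self.buffered.append(record)
        if sync:
            # A sync makes everything buffered so far durable.
            self.synced.extend(self.buffered)
            self.buffered.clear()

    def crash(self):
        """Drop unsynced records and return what recovery would see."""
        self.buffered.clear()
        return list(self.synced)


def commit(wal: ToyWal, trx_name: str, server_wants_sync: bool,
           is_dd_operation: bool):
    # DD operations are synced regardless of what the server requested.
    wal.append(("commit", trx_name), sync=server_wants_sync or is_dd_operation)


dml_wal = ToyWal()
commit(dml_wal, "dml", server_wants_sync=False, is_dd_operation=False)
dml_after_crash = dml_wal.crash()

ddl_wal = ToyWal()
commit(ddl_wal, "ddl", server_wants_sync=False, is_dd_operation=True)
ddl_after_crash = ddl_wal.crash()

print(dml_after_crash, ddl_after_crash)
```

With the server requesting a non-durable commit in both cases, only the DD operation's commit record is still present after the simulated crash.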
I will still make a PR, but only to check the server's durability request individually at commit time too, instead of assuming that prepare and commit requested the same thing.
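The difference between assuming and checking can be sketched as below. This is an illustration only, not the planned patch: `AssumingTrx`, `CheckingTrx`, and `run` are hypothetical names, and the `server_wants_sync` argument stands in for whatever the server's durability request actually looks like.

```python
# Toy model; class and function names are hypothetical, not MyRocks code.

class AssumingTrx:
    """Remembers the prepare-time durability request and reuses it at commit."""

    def __init__(self):
        self.sync_requested = True

    def prepare(self, server_wants_sync: bool) -> bool:
        self.sync_requested = server_wants_sync
        return self.sync_requested

    def commit(self, server_wants_sync: bool) -> bool:
        # The commit-time request is ignored; the prepare-time value is reused.
        return self.sync_requested


class CheckingTrx:
    """Reads the server's durability request separately at each step."""

    def prepare(self, server_wants_sync: bool) -> bool:
        return server_wants_sync

    def commit(self, server_wants_sync: bool) -> bool:
        return server_wants_sync


def run(trx, prepare_sync: bool, commit_sync: bool):
    """Return (synced at prepare, synced at commit) for one transaction."""
    return (trx.prepare(prepare_sync), trx.commit(commit_sync))


# When both steps request the same thing, the two variants behave identically.
same = run(AssumingTrx(), False, False) == run(CheckingTrx(), False, False)

# When the requests differ, only the checking variant honors the commit request.
assuming = run(AssumingTrx(), False, True)
checking = run(CheckingTrx(), False, True)
print(same, assuming, checking)
```

The two variants differ only when the prepare-time and commit-time requests disagree, which is the case the assumption silently gets wrong.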