sql: use WAL mode and set pragmas correctly #98

moio · 2024-08-30T15:04:54Z

This

sets PRAGMAs correctly, before they were ignored as they used a wrong syntax
removes the shared=cache option as sql: drop the attachdriver mechanism #97 removed the need for it
removes the mechanism retrying on SQLITE_BUSY, which is then unnecessary
adds code to issue BEGIN on read-only transactions and BEGIN IMMEDIATE for transactions expected to write

The last point is necessary in order to avoid SQLITE_BUSY (5) errors in presence of multiple concurrent writing transactions. Without it busy_timeout would not be sufficient, see discussion below.

best reviewed commit-by-commit

Benchmark notes

Results from a quick and rough benchmark on my dev environment look good - performance is on par or superior compared to the situation without this PR.

This test lists the first page of ConfigMaps with a varying number of ConfigMaps: 0 up to 6000. Also, a number of ConfigMaps are changed (added and deleted) every second: 0 up to 400.

Results are the p(95) of the whole request time:

(Grey is without this PR, green is with this PR. Lower is better)

Note 1: the test at 6000 ConfigMaps failed with a SQLite error after 100 changes per second: this PR also fixes that bug 😇

Note 2: results are also robustly better than running with Vai off (no SQLite pagination):

(Grey is without this PR, green is with this PR, red is with Vai off. Lower is better)

To replicate use:
https://gist.github.com/moio/20a43fa406432cec5fc8aee0394d3e1f

With these scripts from the dartboard project:

https://github.com/rancher/dartboard/blob/main/k6/api_benchmark.js
https://github.com/rancher/dartboard/blob/main/k6/create_k8s_resources.js
https://github.com/rancher/dartboard/blob/main/k6/change_config_maps.js

tomleb

LGTM.

A long comment that is potentially in the wrong issue follows..

busy_timeout doesn't always work as we expect with deferred transactions (the default).

Here's a few examples that all start with the same setup. The prefix X: is the DB connection X:

setup

A: create table T(t text);
A: create table R(r text);
A: insert into T values ('a'), ('b');
A: insert into R values ('a'), ('b');
A: begin;
A: update T set t='x' where t='a';
B: PRAGMA busy_timeout=5000;

select same table

Succeeds (as expected because WAL allows concurrent read+writes).

B: begin;
B: select * from T;
a
b
B: commit;

update different table

Blocks 5 seconds (busy_timeout) and returns an error.

B: begin;
B: update R set r='x' where r='a';
# Blocks for 5 seconds
Runtime error: database is locked (5)

select+update different table

Immediately returns an error. Doesn't respect the busy_timeout set. I can see this case happening with the Replace() method (we're doing both read+write). eg: New Pods event comes in and at the same time, Namespaces resyncs.

Note: Retrying the update will keep returning an error even when A's transaction is committed. Yes, even if A updated a different table.

B: begin;
B: select * from R;
a
b
B: update R set r='x' where r='a';
Runtime error: database is locked (5)
B: update R set r='x' where r='a';
Runtime error: database is locked (5)
A: commit;
B: update R set r='x' where r='a';
Runtime error: database is locked (5)

select+update with immediate

Contrast the above scenario with immediate mode transactions. It basically tries to get a lock immediately instead of on the first write command. The rest of the transaction is then guaranteed to not get database is locked error.

B: begin immediate;
# Blocks for 5 seconds waiting for commit
Runtime error: database is locked (5)
A: commit;
B: begin immediate;
B: select * from R;
a
b
B: update R set r='x' where r='a';
B: commit;

Should we move the retry logic elsewhere rather than removing it OR use immediate transactions OR use a single connection specifically for writes?

moio · 2024-09-12T14:44:53Z

@tomleb I am very glad you brought this up as it gave me the opportunity to learn things I did not know.

"select+update different table" behavior seemed obscure/broken to me for a few days, until I found a tangentially related article that made it clear why it behaves that way:

as it is well-known, SQLite only supports one active "writer" at any point in time. I find "writer" a bit of nebulous shorthand for "writing transaction" (which is definitely a SQLite-specific concept)
in SQLite a transaction is "writing" if it is either started with BEGIN IMMEDIATE or if it already issued a DML statement (INSERT/UPDATE/DELETE). Note that normal (non-IMMEDIATE) transactions start their life as "non-writing" and can be promoted to "writing" at a later point
in any situation, if a "writing" transaction starts (either because it BEGINs IMMEDIATEly or because it got promoted) while another is still in progress, then the SQLITE_BUSY (5) error is produced, that transaction gets into an error state and, like any other errored transaction, it can't be recovered. The only way to proceed is to ROLLBACK and then start with a fresh transaction
because everybody hates rolling back, as a convenience, SQLite implements the busy_timeout PRAGMA. That implements code that conceptually catches SQLITE_BUSY errors, issues ROLLBACKs, waits and BEGINs again up to a specified timeout - all transparently! Pretty convenient.
problem is that busy_timeout cannot work in all cases. Conceptually, rolling back and retrying a transaction is perfectly OK if that transaction only issued BEGIN IMMEDIATE and nothing else (because literally nothing else happened, there is nothing to roll back). busy_timeout is also nice enough to handle transactions that started as non-writer but did nothing but a DML command which immediately ran into SQLITE_BUSY (5) - effectively, those transactions did nothing either. But if a transaction started as "non-writer", SELECTed something, returned data back to application code, and after that it became a writer and got SQLITE_BUSY (5), then there is no safe way a ROLLBACK can possibly be handled from inside SQLite - it has to bubble up to application code, which is why busy_timeout is "not respected".

Fundamentally busy_timeout has to be seen as a mechanism that saves you from the pain of retrying transactions in your code in easy cases, not as a general parameter that is guaranteed to be always respected 😢.

Back to our options:

using immediate transactions exclusively: will not work. Specifically: it will be functional, but there will be no reader/reader or reader/writer parallelism, as we would be effectively limited to one active transaction at any point in time
using a single connection for writes: it will work, but you will need to take care when starting a transaction to use the right connection depending on whether you plan to use it to write or not, and synchronize access, queuing up as necessary. Might be a bit complicated
moving retry logic elsewhere: it will also work - provided we retry whole transactions and not commands inside of a transaction (that is doomed: once a transaction is in an error state, it can't get out of it). Might also not be super-straightforward, but has the advantage of not needing to think when opening a transaction whether it will be read-only or read-write
using BEGIN IMMEDIATE for all writing transactions - potentially the least intrusive fix, and allows to forego all retrying code. Requires to think before BEGINning whether the transaction will write or not.

Unless you disagree or see flaws in the reasoning above, I'd start by trying to go with option 3 in a separate PR and see how it goes, as it seems to minimize the complexity of the finished thing.

tomleb · 2024-09-13T02:16:07Z

@moio thanks for letting me know about this article, it's got really good information!

I'm okay with you trying out option 3.

I'm tempted to try option 2 & 4 together myself and see how that turns out. (Inspired by the rqlite codebase) Naively, I'm thinking the interface is already defined for us so we know what needs read-only and read-write. Though of course as code changes.. I suspect some non-mock test would be able to find issues here such as trying to write to a database using a read-only connection..

moio · 2024-09-13T08:40:41Z

🤦 I wrote option 3. while meaning option 4. - as I added a point mid editing 😮‍💨

Are you still OK with that?

Unrelated: I absolutely do not oppose you going down the option 2. route. In my eyes, once that is done and correctly enqueues write transactions, adding 4. on top is not even needed. Let me know what you decide in the end.

tomleb · 2024-09-13T12:28:04Z

Are you still OK with that?

Yes, definitely.

adding 4. on top is not even needed

Heh you're right, I hadn't thought of that. Once we limit writing to a single connection, then deferred mode cannot fail with database is locked.

Played a bit with option 4

I'm seeing some weird behavior when opening the DB with default transaction mode immediate. BeginTx will not respect busy_timeout although it should. Running BEGIN IMMEDIATE manually does work. So if you're going with option 4 I suggest keeping the default transaction mode of deferred and using BEGIN IMMEDIATE manually. Unless you had a better idea. Here's the snippet describing this:

package main

import (
	"context"
	"database/sql"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"

	_ "modernc.org/sqlite"
)

func main() {
	if err := mainErr(); err != nil {
		log.Fatal(err)
	}
}

func mainErr() error {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// cache=shared is necessary for in-memory DB
	rwDB, err := sql.Open("sqlite", "file::memory:?cache=shared&_pragma=journal_mode=wal&_txlock=immediate&_pragma=busy_timeout=5000")
	if err != nil {
		return err
	}
	rwDB.SetMaxOpenConns(2)

	conn, _ := rwDB.Conn(ctx)
	tx, err := conn.BeginTx(ctx, nil)
	if err != nil {
		return err
	}

	go func() {
		conn2, _ := rwDB.Conn(ctx)
		// BeginTx will fail with database is locked error even though
		// we set immediate transaction as default
		_, err2 := conn2.BeginTx(ctx, nil)
		// Running BEGIN IMMEDIATE manually gives us the behavior we
		// want: busy_timeout is respected
		//_, err2 := conn2.ExecContext(ctx, "BEGIN IMMEDIATE;")
		if err2 != nil {
			log.Fatal(err2)
		}
	}()
	time.Sleep(10 * time.Second)

	err = tx.Commit()
	if err != nil {
		return err
	}

	return nil
}

modernc's implementation of SQLite uses a slightly different syntax Signed-off-by: Silvio Moioli <[email protected]>

120 seconds, or two consecutive minutes of uninterrupted contention, should be more than sufficient and anything exceeding this value should be reported as error. Signed-off-by: Silvio Moioli <[email protected]>

With busy_timeout properly set, SQLite will take care of the error internally. Moreover with WAL mode properly set, the error should almost never happen to begin with. https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode Signed-off-by: Silvio Moioli <[email protected]>

Signed-off-by: Silvio Moioli <[email protected]>

tomleb

LGTM.

I'm still not sure whether I prefer having 2 separate connection pool (1 read-write, 1 read-only) or using begin immediate. Immediate transaction mode has a footgun (imo) with modernc.org/sqlite (and github.com/mattn/go-sqlite3): busy_timeout is NOT respected when using cache=shared and executing BeginTx, but it IS respected when using cache=shared and executing ExecContext("begin immediate"). We're not using cache=shared though, but if we ever want to use :memory: databases for tests, we will have to use it, which will lead to very different behavior between tests and production.

As I understand it, 2 separate connection pools would avoid this problem since the read-write connection would be blocked by the application instead.

moio · 2024-09-17T14:52:58Z

busy_timeout is NOT respected when using cache=shared and executing BeginTx, but it IS respected when using cache=shared and executing ExecContext("begin immediate")

I am not exactly sure this is actually the case - as BeginTx with the change in https://github.com/rancher/lasso/pull/98/files#diff-28f84babc5eeebb6a449367653bf97f8670e2816ffd8f898eb592dd3819285b0R236 actually calls BEGIN IMMEDIATE so the two should be equivalent.

In any case, I would not recommend going down the :memory: lane for tests - there are enough subtle differences in terms of locking and performance with the file-based approach, and the latter is anyway so fast, I would recommend running integration tests in files for the sake of uniformity and debuggability anyway.

tomleb · 2024-09-17T15:02:07Z

busy_timeout is NOT respected when using cache=shared and executing BeginTx, but it IS respected when using cache=shared and executing ExecContext("begin immediate")

I am not exactly sure this is actually the case - as BeginTx with the change in https://github.com/rancher/lasso/pull/98/files#diff-28f84babc5eeebb6a449367653bf97f8670e2816ffd8f898eb592dd3819285b0R236 actually calls BEGIN IMMEDIATE so the two should be equivalent.

Yes, should but once you actually test it they end up not being equivalent. BeginTx even with _txlock set will call BEGIN IMMEDIATE differently (eg: with different sqlite functions) than calling ExecContext("BEGIN IMMEDIATE"). (And to be clear, I did a few tests and that's the behavior I'm seeing using both modernc.org and mattn/go-sqlite3 libraries)

In any case, I would not recommend going down the :memory: lane for tests - there are enough subtle differences in terms of locking and performance with the file-based approach, and the latter is anyway so fast, I would recommend running integration tests in files for the sake of uniformity and debuggability anyway.

Fair, though we might end up having to be able to set the path to the DB to avoid having multiple tests.

Anyway, nothing to change here, I'm good with the current solution 👍

MbolotSuse

Approving since this has the two required approvers.

This includes various fixes to the UI Server-Side Pagination experimental feature: https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/enable-experimental-features/ui-server-side-pagination - rancher/lasso#86 - rancher/lasso#90 - rancher/lasso#85 - rancher/lasso#92 - rancher/lasso#79 - rancher/lasso#97 - rancher/lasso#94 - rancher/lasso#98 - rancher/lasso#108 - rancher/lasso#99 It also includes improvements to code collecting metrics: - rancher/lasso#89 - rancher/lasso#95 And to test code: - rancher/lasso#100 - rancher/lasso#101 Signed-off-by: Silvio Moioli <[email protected]>

moio requested a review from a team as a code owner August 30, 2024 15:04

moio requested a review from tomleb August 30, 2024 15:05

moio changed the title ~~Use WAL mode and set pragmas correctly~~ sql: use WAL mode and set pragmas correctly Aug 30, 2024

moio mentioned this pull request Aug 30, 2024

sql: allow multiple informers to start in parallel #99

Merged

tomleb previously approved these changes Sep 4, 2024

View reviewed changes

moio added 3 commits September 13, 2024 14:50

Set pragmas correctly

5a8a6df

modernc's implementation of SQLite uses a slightly different syntax Signed-off-by: Silvio Moioli <[email protected]>

Use a reasonable value for busy_timeout

796ad34

120 seconds, or two consecutive minutes of uninterrupted contention, should be more than sufficient and anything exceeding this value should be reported as error. Signed-off-by: Silvio Moioli <[email protected]>

moio dismissed tomleb’s stale review via 510a1fc September 13, 2024 13:03

moio force-pushed the wal_options branch from ca518d1 to 510a1fc Compare September 13, 2024 13:03

tomleb previously approved these changes Sep 13, 2024

View reviewed changes

moio added 4 commits September 13, 2024 17:27

sql: explain SQLite connection options better

b7a7837

Signed-off-by: Silvio Moioli <[email protected]>

sql: use BEGIN IMMEDIATE for transactions expected to write

e5f5819

Signed-off-by: Silvio Moioli <[email protected]>

update tests

6e0e648

Signed-off-by: Silvio Moioli <[email protected]>

update mocks

ce9ffe3

Signed-off-by: Silvio Moioli <[email protected]>

moio dismissed tomleb’s stale review via ce9ffe3 September 13, 2024 15:29

tomleb approved these changes Sep 16, 2024

View reviewed changes

aruiz14 approved these changes Sep 18, 2024

View reviewed changes

moio mentioned this pull request Sep 18, 2024

SQLite backed cache in the Steve API - small things part 2 rancher/rancher#46410

Open

4 tasks

MbolotSuse approved these changes Sep 18, 2024

View reviewed changes

moio merged commit 4704559 into rancher:master Sep 18, 2024
1 check passed

moio deleted the wal_options branch September 18, 2024 13:05

This was referenced Sep 18, 2024

[Vai] Server side pagination - Sync lag of Kubernetes resources counts when quering the counts after resources creation rancher/rancher#46701

Closed

SQLite backed cache: investigate writes blocking reads rancher/rancher#46331

Closed

This was referenced Oct 1, 2024

Bump lasso and steve to include UI Server-Side Pagination fixes rancher/rancher#47329

Merged

Bump lasso and steve to get the latest UI Server-Side Pagination improvements rancher/rancher#47349

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: use WAL mode and set pragmas correctly #98

sql: use WAL mode and set pragmas correctly #98

moio commented Aug 30, 2024 •

edited

Loading

tomleb left a comment

moio commented Sep 12, 2024 •

edited

Loading

tomleb commented Sep 13, 2024

moio commented Sep 13, 2024

tomleb commented Sep 13, 2024 •

edited

Loading

tomleb left a comment

moio commented Sep 17, 2024

tomleb commented Sep 17, 2024

MbolotSuse left a comment

sql: use WAL mode and set pragmas correctly #98

sql: use WAL mode and set pragmas correctly #98

Conversation

moio commented Aug 30, 2024 • edited Loading

Benchmark notes

tomleb left a comment

Choose a reason for hiding this comment

moio commented Sep 12, 2024 • edited Loading

tomleb commented Sep 13, 2024

moio commented Sep 13, 2024

tomleb commented Sep 13, 2024 • edited Loading

tomleb left a comment

Choose a reason for hiding this comment

moio commented Sep 17, 2024

tomleb commented Sep 17, 2024

MbolotSuse left a comment

Choose a reason for hiding this comment

moio commented Aug 30, 2024 •

edited

Loading

moio commented Sep 12, 2024 •

edited

Loading

tomleb commented Sep 13, 2024 •

edited

Loading