From 3858179a8074aa88f7106550dee641f3381318aa Mon Sep 17 00:00:00 2001
From: JmPotato
Date: Wed, 18 Sep 2024 11:51:03 +0800
Subject: [PATCH 1/4] Add new resource control action SWITCH_GROUP

Signed-off-by: JmPotato
---
 .../sql-statement-alter-resource-group.md | 1 +
 .../sql-statement-create-resource-group.md | 1 +
 sql-statements/sql-statement-query-watch.md | 5 +++++
 tidb-resource-control.md | 18 +++++++++++++-----
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/sql-statements/sql-statement-alter-resource-group.md b/sql-statements/sql-statement-alter-resource-group.md
index 8b2f521c9b1ff..a0ebeeabd81c6 100644
--- a/sql-statements/sql-statement-alter-resource-group.md
+++ b/sql-statements/sql-statement-alter-resource-group.md
@@ -64,6 +64,7 @@ ResourceGroupRunawayActionOption ::=
 DRYRUN
 | COOLDOWN
 | KILL
+| "SWITCH_GROUP" '(' ResourceGroupName ')'

 BackgroundOptionList ::=
 DirectBackgroundOption
diff --git a/sql-statements/sql-statement-create-resource-group.md b/sql-statements/sql-statement-create-resource-group.md
index edaa3a5a592b3..edcce62e2c23a 100644
--- a/sql-statements/sql-statement-create-resource-group.md
+++ b/sql-statements/sql-statement-create-resource-group.md
@@ -68,6 +68,7 @@ ResourceGroupRunawayActionOption ::=
 DRYRUN
 | COOLDOWN
 | KILL
+| "SWITCH_GROUP" '(' ResourceGroupName ')'
 ```

 The resource group name parameter (`ResourceGroupName`) must be globally unique.
diff --git a/sql-statements/sql-statement-query-watch.md b/sql-statements/sql-statement-query-watch.md
index 545ad1bfbb80f..b3c67ef28074d 100644
--- a/sql-statements/sql-statement-query-watch.md
+++ b/sql-statements/sql-statement-query-watch.md
@@ -28,6 +28,11 @@ QueryWatchOption ::=
 ResourceGroupName ::=
 Identifier
 | "DEFAULT"
+ResourceGroupRunawayActionOption ::=
+ DRYRUN
+| COOLDOWN
+| KILL
+| "SWITCH_GROUP" '(' ResourceGroupName ')'
 QueryWatchTextOption ::=
 "SQL" "DIGEST" SimpleExpr
 | "PLAN" "DIGEST" SimpleExpr
diff --git a/tidb-resource-control.md b/tidb-resource-control.md
index 547228739ba2b..2f681173d0e65 100644
--- a/tidb-resource-control.md
+++ b/tidb-resource-control.md
@@ -278,6 +278,7 @@ Supported operations (`ACTION`):
 - `DRYRUN`: no action is taken. The records are appended for the runaway queries. This is mainly used to observe whether the condition setting is reasonable.
 - `COOLDOWN`: the execution priority of the query is lowered to the lowest level. The query continues to execute with the lowest priority and does not occupy resources of other operations.
 - `KILL`: the identified query is automatically terminated and reports an error `Query execution was interrupted, identified as runaway query`.
+- `SWITCH_GROUP`: switches the identified query to the specified resource group for continued execution. This option is introduced in v8.4.0.

 To avoid too many concurrent runaway queries that exhaust system resources, the resource control feature introduces a quick identification mechanism, which can quickly identify and isolate runaway queries. You can use this feature through the `WATCH` clause. When a query is identified as a runaway query, this mechanism extracts the matching feature (defined by the parameter after `WATCH`) of the query. In the next period of time (defined by `DURATION`), the matching feature of the runaway query is added to the watch list, and the TiDB instance matches queries with the watch list.
The matching queries are directly marked as runaway queries and isolated according to the corresponding action, instead of waiting for them to be identified by conditions. The `KILL` operation terminates the query and reports an error `Quarantined and interrupted because of being in runaway watch list`.

@@ -296,19 +297,20 @@ The parameters of `QUERY_LIMIT` are as follows:
 | Parameter | Description | Note |
 |---------------|--------------|--------------------------------------|
 | `EXEC_ELAPSED` | When the query execution time exceeds this value, it is identified as a runaway query | EXEC_ELAPSED =`60s` means the query is identified as a runaway query if it takes more than 60 seconds to execute. |
-| `ACTION` | Action taken when a runaway query is identified | The optional values are `DRYRUN`, `COOLDOWN`, and `KILL`. |
+| `ACTION` | Action taken when a runaway query is identified | The optional values are `DRYRUN`, `COOLDOWN`, `KILL`, and `SWITCH_GROUP`. |
 | `WATCH` | Quickly match the identified runaway query. If the same or similar query is encountered again within a certain period of time, the corresponding action is performed immediately. | Optional. For example, `WATCH=SIMILAR DURATION '60s'`, `WATCH=EXACT DURATION '1m'`, and `WATCH=PLAN`. |

+> **Note:**
+>
+> It is recommended to use the `SWITCH_GROUP` statement together with the [`QUERY WATCH`](/tidb-resource-control.md#query-watch-parameters) statement. Because `QUERY_LIMIT` only triggers the corresponding `ACTION` operation when the query execution time exceeds the configured `EXEC_ELAPSED`, `SWITCH_GROUP` might not be able to switch the query to the target resource group in a timely manner in such scenarios.
+
 #### Examples

 1. Create a resource group `rg1` with a quota of 500 RUs per second, and define a runaway query as one that exceeds 60 seconds, and lower the priority of the runaway query.
-
 ```sql
 CREATE RESOURCE GROUP IF NOT EXISTS rg1 RU_PER_SEC = 500 QUERY_LIMIT=(EXEC_ELAPSED='60s', ACTION=COOLDOWN);
 ```
-
 2. Change the `rg1` resource group to terminate the runaway queries, and mark the queries with the same pattern as runaway queries immediately in the next 10 minutes.
-
 ```sql
 ALTER RESOURCE GROUP rg1 QUERY_LIMIT=(EXEC_ELAPSED='60s', ACTION=KILL, WATCH=SIMILAR DURATION='10m');
 ```
@@ -344,7 +346,13 @@ The parameters are as follows:
 QUERY WATCH ADD RESOURCE GROUP rg1 SQL TEXT SIMILAR TO 'select * from test.t2';
 ```

-- Add a matching feature to the runaway query watch list for the `rg1` resource group using `PLAN DIGEST`.
+- Add a matching feature to the runaway query watch list for the `rg1` resource group by parsing the SQL into SQL Digest, and specify `ACTION` as `SWITCH_GROUP(rg2)`.
+
+ ```sql
+ QUERY WATCH ADD RESOURCE GROUP rg1 ACTION SWITCH_GROUP(rg2) SQL TEXT SIMILAR TO 'select * from test.t2';
+ ```
+
+- Add a matching feature to the runaway query watch list for the `rg1` resource group using `PLAN DIGEST`, and specify `ACTION` as `KILL`.

 ```sql
 QUERY WATCH ADD RESOURCE GROUP rg1 ACTION KILL PLAN DIGEST 'd08bc323a934c39dc41948b0a073725be3398479b6fa4f6dd1db2a9b115f7f57';

From d6adac9c3ecc1aeca1bc8bd4bcf68c648f9aa488 Mon Sep 17 00:00:00 2001
From: JmPotato
Date: Mon, 23 Sep 2024 11:07:01 +0800
Subject: [PATCH 2/4] Address the comments

Signed-off-by: JmPotato
---
 tidb-resource-control.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tidb-resource-control.md b/tidb-resource-control.md
index 2f681173d0e65..26676d1224c89 100644
--- a/tidb-resource-control.md
+++ b/tidb-resource-control.md
@@ -278,7 +278,7 @@ Supported operations (`ACTION`):
 - `DRYRUN`: no action is taken. The records are appended for the runaway queries. This is mainly used to observe whether the condition setting is reasonable.
 - `COOLDOWN`: the execution priority of the query is lowered to the lowest level.
The query continues to execute with the lowest priority and does not occupy resources of other operations.
 - `KILL`: the identified query is automatically terminated and reports an error `Query execution was interrupted, identified as runaway query`.
-- `SWITCH_GROUP`: switches the identified query to the specified resource group for continued execution. This option is introduced in v8.4.0.
+- `SWITCH_GROUP`: switches the identified query to the specified resource group for continued execution. After the query completes, the consequential SQL statements are executed in the original resource group. If the target resource group does not exist, the query stays in the original resource group. This option is introduced in v8.4.0.

 To avoid too many concurrent runaway queries that exhaust system resources, the resource control feature introduces a quick identification mechanism, which can quickly identify and isolate runaway queries. You can use this feature through the `WATCH` clause. When a query is identified as a runaway query, this mechanism extracts the matching feature (defined by the parameter after `WATCH`) of the query. In the next period of time (defined by `DURATION`), the matching feature of the runaway query is added to the watch list, and the TiDB instance matches queries with the watch list. The matching queries are directly marked as runaway queries and isolated according to the corresponding action, instead of waiting for them to be identified by conditions. The `KILL` operation terminates the query and reports an error `Quarantined and interrupted because of being in runaway watch list`.

@@ -302,15 +302,18 @@ The parameters of `QUERY_LIMIT` are as follows:

 > **Note:**
 >
-> It is recommended to use the `SWITCH_GROUP` statement together with the [`QUERY WATCH`](/tidb-resource-control.md#query-watch-parameters) statement.
Because `QUERY_LIMIT` only triggers the corresponding `ACTION` operation when the query execution time exceeds the configured `EXEC_ELAPSED`, `SWITCH_GROUP` might not be able to switch the query to the target resource group in a timely manner in such scenarios.
+> If you want to quarantine the runaway queries strictly in one resource group, it is recommended to set directive `SWITCH_GROUP` together with the [`QUERY WATCH`](/tidb-resource-control.md#query-watch-parameters) statement. Because `QUERY_LIMIT` only triggers the corresponding `ACTION` operation when the query meets the criteria, `SWITCH_GROUP` might not be able to switch the query to the target resource group in a timely manner in such scenarios.

 #### Examples

 1. Create a resource group `rg1` with a quota of 500 RUs per second, and define a runaway query as one that exceeds 60 seconds, and lower the priority of the runaway query.
+
 ```sql
 CREATE RESOURCE GROUP IF NOT EXISTS rg1 RU_PER_SEC = 500 QUERY_LIMIT=(EXEC_ELAPSED='60s', ACTION=COOLDOWN);
 ```
+
 2. Change the `rg1` resource group to terminate the runaway queries, and mark the queries with the same pattern as runaway queries immediately in the next 10 minutes.
+
 ```sql
 ALTER RESOURCE GROUP rg1 QUERY_LIMIT=(EXEC_ELAPSED='60s', ACTION=KILL, WATCH=SIMILAR DURATION='10m');
 ```

From ca3aefc5158b0472f20bbea13fbb84f3f05d4df0 Mon Sep 17 00:00:00 2001
From: lilin90
Date: Wed, 9 Oct 2024 17:22:22 +0800
Subject: [PATCH 3/4] Update format

---
 sql-statements/sql-statement-query-watch.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sql-statements/sql-statement-query-watch.md b/sql-statements/sql-statement-query-watch.md
index b3c67ef28074d..e2aae925084ff 100644
--- a/sql-statements/sql-statement-query-watch.md
+++ b/sql-statements/sql-statement-query-watch.md
@@ -16,23 +16,28 @@ The `QUERY WATCH` statement is used to manually manage the watch list of runaway
 ```ebnf+diagram
 AddQueryWatchStmt ::=
 "QUERY" "WATCH" "ADD" QueryWatchOptionList
+
 QueryWatchOptionList ::=
 QueryWatchOption
 | QueryWatchOptionList QueryWatchOption
 | QueryWatchOptionList ',' QueryWatchOption
+
 QueryWatchOption ::=
 "RESOURCE" "GROUP" ResourceGroupName
 | "RESOURCE" "GROUP" UserVariable
 | "ACTION" EqOpt ResourceGroupRunawayActionOption
 | QueryWatchTextOption
+
 ResourceGroupName ::=
 Identifier
 | "DEFAULT"
+
 ResourceGroupRunawayActionOption ::=
 DRYRUN
 | COOLDOWN
 | KILL
 | "SWITCH_GROUP" '(' ResourceGroupName ')'
+
 QueryWatchTextOption ::=
 "SQL" "DIGEST" SimpleExpr
 | "PLAN" "DIGEST" SimpleExpr

From cfe0241d1a216f4c43af9c18709adeaba168a0e1 Mon Sep 17 00:00:00 2001
From: Lilian Lee
Date: Wed, 9 Oct 2024 17:57:39 +0800
Subject: [PATCH 4/4] Update wording

---
 tidb-resource-control.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tidb-resource-control.md b/tidb-resource-control.md
index 26676d1224c89..600fddad43c60 100644
--- a/tidb-resource-control.md
+++ b/tidb-resource-control.md
@@ -278,7 +278,7 @@ Supported operations (`ACTION`):
 - `DRYRUN`: no action is taken. The records are appended for the runaway queries. This is mainly used to observe whether the condition setting is reasonable.
 - `COOLDOWN`: the execution priority of the query is lowered to the lowest level. The query continues to execute with the lowest priority and does not occupy resources of other operations.
 - `KILL`: the identified query is automatically terminated and reports an error `Query execution was interrupted, identified as runaway query`.
-- `SWITCH_GROUP`: switches the identified query to the specified resource group for continued execution. After the query completes, the consequential SQL statements are executed in the original resource group. If the target resource group does not exist, the query stays in the original resource group. This option is introduced in v8.4.0.
+- `SWITCH_GROUP`: introduced in v8.4.0, this parameter switches the identified query to the specified resource group for continued execution. After this query completes, subsequent SQL statements are executed in the original resource group. If the specified resource group does not exist, the query remains in the original resource group.

 To avoid too many concurrent runaway queries that exhaust system resources, the resource control feature introduces a quick identification mechanism, which can quickly identify and isolate runaway queries. You can use this feature through the `WATCH` clause. When a query is identified as a runaway query, this mechanism extracts the matching feature (defined by the parameter after `WATCH`) of the query. In the next period of time (defined by `DURATION`), the matching feature of the runaway query is added to the watch list, and the TiDB instance matches queries with the watch list. The matching queries are directly marked as runaway queries and isolated according to the corresponding action, instead of waiting for them to be identified by conditions. The `KILL` operation terminates the query and reports an error `Quarantined and interrupted because of being in runaway watch list`.
@@ -302,7 +302,7 @@ The parameters of `QUERY_LIMIT` are as follows:

 > **Note:**
 >
-> If you want to quarantine the runaway queries strictly in one resource group, it is recommended to set directive `SWITCH_GROUP` together with the [`QUERY WATCH`](/tidb-resource-control.md#query-watch-parameters) statement. Because `QUERY_LIMIT` only triggers the corresponding `ACTION` operation when the query meets the criteria, `SWITCH_GROUP` might not be able to switch the query to the target resource group in a timely manner in such scenarios.
+> If you want to strictly limit runaway queries to a specific resource group, it is recommended to use `SWITCH_GROUP` together with the [`QUERY WATCH`](/tidb-resource-control.md#query-watch-parameters) statement. Because `QUERY_LIMIT` only triggers the corresponding `ACTION` operation when the query meets the criteria, `SWITCH_GROUP` might not be able to switch the query to the target resource group in a timely manner in such scenarios.

 #### Examples
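
The examples in this series only show `SWITCH_GROUP` with `QUERY WATCH`. As a minimal sketch of the `QUERY_LIMIT` form, the statements below assume TiDB v8.4.0 or later and an illustrative target group `rg2` with an arbitrary quota. They follow the `"SWITCH_GROUP" '(' ResourceGroupName ')'` grammar added in this series and have not been verified against a running cluster.

```sql
-- Hypothetical target group for runaway queries; the name and quota are illustrative.
CREATE RESOURCE GROUP IF NOT EXISTS rg2 RU_PER_SEC = 100;

-- Assumed usage: move queries in rg1 that run longer than 60 seconds into rg2.
ALTER RESOURCE GROUP rg1 QUERY_LIMIT=(EXEC_ELAPSED='60s', ACTION=SWITCH_GROUP(rg2));
```

Creating `rg2` first matters because, per the `SWITCH_GROUP` description, a query remains in its original resource group when the specified group does not exist.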