After performing an online recovery, "halt-scheduling" has been set to true when reloading pd #8095

Closed
mayjiang0203 opened this issue Apr 18, 2024 · 5 comments · Fixed by #8147
Labels
affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. report/customer Customers have encountered this bug. severity/major type/bug The issue is confirmed as a bug.

Comments

@mayjiang0203

Bug Report

What did you do?

What did you expect to see?

"halt-scheduling" should be false after PD is reloaded.

What did you see instead?

[2024/04/18 16:16:08.515 +08:00] [INFO] [cluster.go:1093] ["will run cmd"] [cmd:="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 unsafe remove-failed-stores show"]
  {
    "info": "Unsafe recovery Finished",
    "time": "2024-04-18 16:15:42.491",
[2024/04/18 16:16:22.872 +08:00] [INFO] [cmd.go:197] ["Remote command finished"] [cmd="tiup cluster reload tidbcluster -R pd -y"] [exitcode=0] []
[2024/04/18 16:16:24.293 +08:00] [INFO] [pdutil.go:512] ["run pd ctl command"] [pdCmd="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 config show all"]

What version of PD are you using (pd-server -V)?

v8.1.0

[2024/04/18 15:15:22.453 +08:00] [INFO] [workloadnode.run] [util.go:255] ["/tiup/deploy/pd-/bin/pd-server -V"] [workload=pd2]
[2024/04/18 15:15:22.455 +08:00] [INFO] [cmd.go:150] ["Start remote command"] [cmd="/tiup/deploy/pd-/bin/pd-server -V"] [nodename=pd2]
2024-04-18T15:15:22.455+0800 INFO k8s/client.go:223 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
Release Version: v8.1.0
Edition: Community
Git Commit Hash: 3ec92bd
Git Branch: HEAD
UTC Build Time: 2024-04-15 03:59:49

@mayjiang0203 mayjiang0203 added the type/bug The issue is confirmed as a bug. label Apr 18, 2024
@mayjiang0203
Author

mayjiang0203 commented Apr 18, 2024

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4

Contributor

ti-chi-bot bot commented Apr 19, 2024

@mayjiang0203: These labels are not set on the issue: affects-7.5, affects-7.1, affects-6.5, affects-6.1, affects-5.4.

In response to this:

/severity major
/label affects-8.1
/remove-label affects-7.5
/remove-label affects-7.1
/remove-label affects-6.5
/remove-label affects-6.1
/remove-label affects-5.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented Apr 23, 2024

@mayjiang0203: These labels are not set on the issue: may-affects-7.5, may-affects-7.1, may-affects-6.5, may-affects-6.1, may-affects-5.4.

In response to this:

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4


@ti-chi-bot ti-chi-bot bot added the affects-7.5 This bug affects the 7.5.x(LTS) versions. label Apr 23, 2024
@mayjiang0203
Author

Impact of this bug: reloading the cluster becomes very slow because leader eviction no longer works, so restarting TiKV has to wait for a 10-minute timeout.
Workaround: reload PD first, then run "config set halt-scheduling false"; after that, the cluster can be reloaded.
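
For reference, the workaround order could be scripted roughly as follows. This is only a sketch: the cluster name `tidbcluster` and the `ctl:v8.1.0-pre` component version are taken from the logs in this report, while `PD_URL`, the `DRY_RUN` switch, and the `run`/`workaround` helper names are placeholders introduced here for illustration. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the workaround order: reload PD, clear the halt flag, reload the rest.
set -eu

CLUSTER="${CLUSTER:-tidbcluster}"          # cluster name as seen in the logs above
CTL_VERSION="${CTL_VERSION:-v8.1.0-pre}"   # tiup ctl version as seen in the logs above
PD_URL="${PD_URL:-http://127.0.0.1:2379}"  # assumption: replace with a real PD endpoint
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

workaround() {
  # 1. Reload only the PD nodes.
  run tiup cluster reload "$CLUSTER" -R pd -y
  # 2. Reset the option that was left set to true after online recovery.
  run tiup "ctl:${CTL_VERSION}" pd -u "$PD_URL" config set halt-scheduling false
  # 3. Reload the whole cluster; leader eviction should work again.
  run tiup cluster reload "$CLUSTER" -y
}

workaround
```

Running it with `DRY_RUN=0` and a real `PD_URL` executes the three steps in order; `set -e` stops the script if any step fails, so the full reload is not attempted while the halt flag may still be set.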

@ti-chi-bot ti-chi-bot bot closed this as completed in #8147 May 8, 2024
ti-chi-bot bot added a commit that referenced this issue May 8, 2024
…8147)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit that referenced this issue May 9, 2024
…8147) (#8155)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: JmPotato <[email protected]>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 20, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 20, 2024
ti-chi-bot bot pushed a commit that referenced this issue May 22, 2024
…8147) (#8194)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: JmPotato <[email protected]>
Co-authored-by: lhy1024 <[email protected]>
@seiya-annie

/found customer

@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Jun 11, 2024
@rleungx rleungx removed the affects-7.1 This bug affects the 7.1.x(LTS) versions. label Oct 30, 2024