Scheduling service takes more than 5 minutes to recover after a network partition is injected on the scheduling primary #7854

Open
Lily2025 opened this issue Feb 28, 2024 · 2 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

Lily2025 commented Feb 28, 2024

Enhancement

What did you do?

1. Run a workload.
2. Inject a network partition between the scheduling primary and all other pods.
image

What did you expect to see?

The scheduling service should recover in less than 5 minutes after a network partition is injected on the scheduling primary.

What did you see instead?

The scheduling service took more than 5 minutes to recover after a network partition was injected on the scheduling primary.
image

What version of PD are you using (pd-server -V)?

./pd-server -V
Release Version: v8.0.0-alpha
Edition: Community
Git Commit Hash: e199866
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-02-26 11:38:17
2024-02-28T11:55:27.776+0800

Lily2025 added the type/bug label (The issue is confirmed as a bug.) Feb 28, 2024
@Lily2025
Author

/assign rleungx

Member

rleungx commented Feb 28, 2024

The recovery time depends on the hibernate region tick interval because, currently, switching the scheduling primary does not wake up all regions. As a result, the prepare checker cannot receive heartbeats from all regions in time.
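
To make the effect concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not PD's implementation: the `Region` class, the function name, the intervals, and the `collect_factor` threshold are all hypothetical values chosen for illustration. It assumes the new primary counts as "prepared" once a given fraction of regions has sent at least one heartbeat, and that, in the worst case, each region first reports one full tick interval after the switch.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    hibernating: bool


def seconds_until_prepared(regions, active_interval_s, hibernate_interval_s,
                           collect_factor, wake_on_switch):
    """Worst-case seconds after a primary switch until the fraction of
    regions that have sent a heartbeat reaches collect_factor.

    All parameters are illustrative assumptions, not PD or TiKV defaults.
    """
    if not regions:
        return 0.0

    # Worst case: every region sends its first heartbeat one full tick
    # interval after the switch. A hibernating region uses the long
    # interval unless the switch wakes it up.
    first_heartbeat = []
    for region in regions:
        asleep = region.hibernating and not wake_on_switch
        first_heartbeat.append(
            hibernate_interval_s if asleep else active_interval_s
        )
    first_heartbeat.sort()

    # Number of regions that must report before the checker is satisfied.
    needed = max(1, math.ceil(collect_factor * len(regions)))
    needed = min(needed, len(regions))
    return float(first_heartbeat[needed - 1])


if __name__ == "__main__":
    # 100 regions, 70 of them hibernating (hypothetical cluster).
    cluster = [Region(i, hibernating=(i < 70)) for i in range(100)]
    for wake in (False, True):
        t = seconds_until_prepared(cluster, 60, 600, 0.9, wake_on_switch=wake)
        print(f"wake_on_switch={wake}: prepared after {t} s")
```

In this model, when most regions are hibernating and the switch does not wake them, the time to become prepared is bounded by the hibernate tick interval rather than the normal heartbeat interval, which matches the slow recovery reported here.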

rleungx added the type/enhancement label (The issue or PR belongs to an enhancement.) and removed the type/bug label (The issue is confirmed as a bug.) Feb 28, 2024
rleungx assigned lhy1024 and unassigned rleungx Mar 5, 2024
Projects
Status: Need Triage