Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: skip evict-leader-scheduler when setting schedule deny label #8303

Merged
merged 31 commits into from
Jun 24, 2024

Conversation

okJiang
Copy link
Member

@okJiang okJiang commented Jun 17, 2024

What problem does this PR solve?

Issue Number: Close #7853 ref #7300

What is changed and how does it work?

- add a real cluster test to test `skip evict-leader-scheduler when setting schedule deny label`
- add `DeleteStoreLabel` API and `DeleteScheduler` API

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

okJiang added 2 commits June 17, 2024 17:43
Signed-off-by: okJiang <[email protected]>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 17, 2024
@okJiang okJiang changed the title scheduler: skil evict-leader-scheduler when setting schedule deny lable scheduler: skil evict-leader-scheduler when setting schedule deny label Jun 17, 2024
Copy link

codecov bot commented Jun 17, 2024

Codecov Report

Attention: Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.

Project coverage is 77.34%. Comparing base (debb5fe) to head (fa47700).

Current head fa47700 differs from pull request most recent head f981b69

Please upload reports for the commit f981b69 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8303      +/-   ##
==========================================
+ Coverage   77.30%   77.34%   +0.03%     
==========================================
  Files         470      470              
  Lines       61500    61377     -123     
==========================================
- Hits        47545    47469      -76     
+ Misses      10393    10354      -39     
+ Partials     3562     3554       -8     
Flag Coverage Δ
unittests 77.34% <72.22%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@okJiang okJiang changed the title scheduler: skil evict-leader-scheduler when setting schedule deny label scheduler: skip evict-leader-scheduler when setting schedule deny label Jun 17, 2024
Signed-off-by: okJiang <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 18, 2024
@HuSharp
Copy link
Member

HuSharp commented Jun 18, 2024

/test ?

Copy link
Contributor

ti-chi-bot bot commented Jun 18, 2024

@HuSharp: The following commands are available to trigger required jobs:

  • /test build

The following commands are available to trigger optional jobs:

  • /test pull-integration-copr-test
  • /test pull-integration-realcluster-test

Use /test all to run the following jobs that were automatically triggered:

  • tikv/pd/ghpr_build

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@HuSharp
Copy link
Member

HuSharp commented Jun 18, 2024

/test pull-integration-realcluster-test

Signed-off-by: okJiang <[email protected]>
@okJiang
Copy link
Member Author

okJiang commented Jun 18, 2024

/test pull-integration-realcluster-test

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 18, 2024
Signed-off-by: okJiang <[email protected]>
okJiang added 2 commits June 18, 2024 17:19
Signed-off-by: okJiang <[email protected]>
Signed-off-by: okJiang <[email protected]>
okJiang added 5 commits June 18, 2024 17:25
Signed-off-by: okJiang <[email protected]>
Signed-off-by: okJiang <[email protected]>
Signed-off-by: okJiang <[email protected]>
Signed-off-by: okJiang <[email protected]>
Signed-off-by: okJiang <[email protected]>
"context"
"os/exec"
"testing"
// func restartTiUP() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using the build flag enable_flaky_tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure how enable_flaky_tests works. I reverted the changes to this file first because the test is just not working, not because it's unstable. ad19424

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a flaky test, can be fixed by #8303 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it. fa47700

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got

Signed-off-by: okJiang <[email protected]>
@HuSharp
Copy link
Member

HuSharp commented Jun 24, 2024

ci failed, Can you test this PR with lightning/br and paste a picture to indicate the change?

@okJiang
Copy link
Member Author

okJiang commented Jun 24, 2024

ci failed, Can you test this PR with lightning/br and paste a picture to indicate the change?

I think realcluster test is enough for this pr😂 It's not really worth spending a lot of time making specific changes for this requirement to do e2e testing. It's completely sufficient to know that evict leader still works when calling the API.

Signed-off-by: okJiang <[email protected]>
Copy link
Member

@HuSharp HuSharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, rest LGTM!

@@ -4,14 +4,13 @@
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
// http://www.apache.org/licenses/LICENSE-2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After adding comments and uncommenting it, the IDE automatically formatted it. If it will have any impact, I can revert it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2024-06-24.16.22.42.mov

Please let me keep it😭

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to revert it

return true
}, testutil.WithWaitFor(time.Minute))

re.NoError(pdHTTPCli.DeleteScheduler(ctx, schedulers.EvictLeaderName))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean L143~L152

If the test exits in L143~L152 which means this test is failed, maybe all tests will be regarded as failed and then no need to delete the scheduler by defer function?

You can check the log in Jenkins, maybe it has some redundant delete?
https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/98/pipeline/#step-77-log-200

> $CUR_PATH/playground.log 2>&1 &
else
# CI will download the binaries in the prepare phase.
# ref https://github.com/PingCAP-QE/ci/blob/387e9e533b365174962ccb1959442a7070f9cd66/pipelines/tikv/pd/latest/pull_integration_realcluster_test.groovy#L55-L68
color-green "using existing binaries..."
make pd-server WITH_RACE=1
Copy link
Member

@HuSharp HuSharp Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line, as the comment above explains that there is no need to make pd :) may reduce test time

Signed-off-by: okJiang <[email protected]>
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jun 24, 2024
Comment on lines 1210 to 1212
if len(newStore.GetLabels()) == 0 {
return errors.Errorf("the label key %s does not exist", labelKey)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check it before cloning?

Copy link
Member Author

@okJiang okJiang Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clone store before L1223(newStore.Labels = labels)?

@@ -4,14 +4,13 @@
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
// http://www.apache.org/licenses/LICENSE-2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to revert it

Signed-off-by: okJiang <[email protected]>
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 24, 2024
Copy link
Contributor

ti-chi-bot bot commented Jun 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HuSharp, rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

ti-chi-bot bot commented Jun 24, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-06-24 08:26:10.346559716 +0000 UTC m=+621696.832048546: ☑️ agreed by HuSharp.
  • 2024-06-24 08:44:43.289495517 +0000 UTC m=+622809.774984347: ☑️ agreed by rleungx.

@ti-chi-bot ti-chi-bot bot merged commit 26e90e9 into tikv:master Jun 24, 2024
17 checks passed
@okJiang okJiang deleted the fix-scheduler-deny branch July 30, 2024 02:05
@River2000i
Copy link

@rleungx @lance6716 Can you help me cp to 7.5 for testing? We can hold off on merging until we have a conclusion on whether to backport to v7.5.x and then merge.

@rleungx rleungx added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Sep 9, 2024
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #8602.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Sep 9, 2024
rleungx pushed a commit to rleungx/pd that referenced this pull request Sep 10, 2024
…el (tikv#8303)

ref tikv#7300, close tikv#7853

- add a real cluster test to test `skip evict-leader-scheduler when setting schedule deny label`
- add `DeleteStoreLabel` API and `DeleteScheduler` API

Signed-off-by: okJiang <[email protected]>
rleungx pushed a commit to rleungx/pd that referenced this pull request Sep 10, 2024
…el (tikv#8303)

ref tikv#7300, close tikv#7853

- add a real cluster test to test `skip evict-leader-scheduler when setting schedule deny label`
- add `DeleteStoreLabel` API and `DeleteScheduler` API

Signed-off-by: okJiang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow evicting leader scheduling when lightning importing
5 participants