
mcs: reorganize cluster start and stop process #7155

Merged 5 commits into tikv:master on Oct 9, 2023

Conversation

@rleungx (Member) commented Sep 26, 2023

What problem does this PR solve?

Issue Number: Close #7140, close #7106

What is changed and how does it work?

This PR reorganizes the cluster start/stop process and fixes the race.

Check List

Tests

  • Unit test

Release note

None.

@ti-chi-bot bot (Contributor) commented Sep 26, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JmPotato
  • lhy1024

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Sep 26, 2023
@ti-chi-bot ti-chi-bot bot requested review from JmPotato and lhy1024 September 26, 2023 07:45
@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 26, 2023
@codecov bot commented Sep 26, 2023

Codecov Report

Merging #7155 (7ff12a2) into master (849d80d) will increase coverage by 0.02%.
Report is 5 commits behind head on master.
The diff coverage is 60.65%.

@@            Coverage Diff             @@
##           master    #7155      +/-   ##
==========================================
+ Coverage   74.58%   74.61%   +0.02%     
==========================================
  Files         441      441              
  Lines       47292    47388      +96     
==========================================
+ Hits        35275    35358      +83     
+ Misses       8940     8934       -6     
- Partials     3077     3096      +19     
Flag       Coverage Δ
unittests  74.61% <60.65%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

continue
}
}

log.Info("schedulers updating notifier is triggered, try to update the scheduler")
Contributor:

If the server is stopped here, is there a data race?

Member Author:

I think it is the same as the current PD.

Contributor:

In other words, is it possible to hit a data race when adding a scheduler and waiting on the coordinator happen at the same time?

Member Author:

I think so, but the possibility is much smaller than before.

Member Author (@rleungx, Sep 26, 2023):

Another way: we can check the cluster status before adding a scheduler every time.

Contributor:

But there is still a gap between checking the status and adding the scheduler: if the server stops after the cluster status check but before the scheduler is added, a data race is still possible.

Member Author:

The problem is how we use the wait group for the scheduler controller, not the wait group itself.
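
To make the issue concrete, here is a minimal sketch of the pattern being discussed. The names (Controller, AddScheduler, Stop) are hypothetical, not the actual PD code: the point is that wg.Add must never race with wg.Wait, so additions are gated behind a mutex-protected running flag.

package main

import "sync"

// Controller is a toy stand-in for the scheduler controller.
type Controller struct {
	mu      sync.Mutex
	running bool
	wg      sync.WaitGroup
}

func NewController() *Controller {
	return &Controller{running: true}
}

// AddScheduler rejects new work once Stop has begun, so wg.Add is
// never called concurrently with wg.Wait.
func (c *Controller) AddScheduler(run func()) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.running {
		return false // stopping: refuse instead of racing
	}
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		run()
	}()
	return true
}

// Stop flips the flag under the same lock, then waits for in-flight
// work. After the flag flips, no new wg.Add can happen, so Wait is safe.
func (c *Controller) Stop() {
	c.mu.Lock()
	c.running = false
	c.mu.Unlock()
	c.wg.Wait()
}

func main() {
	c := NewController()
	c.AddScheduler(func() {})
	c.Stop()
	// After Stop, further additions are rejected rather than racing.
	_ = c.AddScheduler(func() {})
}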

@ti-chi-bot ti-chi-bot bot added status/LGT1 Indicates that a PR has LGTM 1. and removed do-not-merge/needs-linked-issue labels Sep 26, 2023
Signed-off-by: Ryan Leung <[email protected]>
@rleungx (Member Author) commented Oct 8, 2023

@JmPotato PTAL

return
case <-ticker.C:
// retry
notifier <- struct{}{}
Member:

Is it possible we have a deadlock here? Since the length of the channel is only 1, if the scheduler config watcher has just sent to it, this send could block.
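
A toy illustration of the hazard (not the PD code): with a single-slot buffer already full, an unconditional send blocks forever, while a select with a default drops the redundant notification.

package main

import "fmt"

func main() {
	// Buffered channel of size 1, like the scheduler notifier.
	notifier := make(chan struct{}, 1)
	notifier <- struct{}{} // the config watcher has already sent

	// An unconditional retry here would block forever, since the
	// single-slot buffer is full and nothing is draining it yet:
	//   notifier <- struct{}{}

	// A non-blocking send drops the redundant notification instead.
	select {
	case notifier <- struct{}{}:
		fmt.Println("notification queued")
	default:
		fmt.Println("notification already pending, skipped")
	}
}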

Signed-off-by: Ryan Leung <[email protected]>
@JmPotato (Member) left a comment:

The rest LGTM.

Comment on lines 243 to 247
select {
case notifier <- struct{}{}:
// If the channel is not empty, it means the check is triggered.
default:
}
Member:

What about wrapping a trySend function to reuse the code?

Member Author:

Sounds good.
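
A sketch of the suggested helper, using the name trySend from the comment; the exact signature in the merged code may differ.

package main

// trySend performs a non-blocking send on the notifier channel. If a
// notification is already pending, the new one is redundant and dropped.
func trySend(notifier chan struct{}) {
	select {
	case notifier <- struct{}{}:
	default:
	}
}

func main() {
	notifier := make(chan struct{}, 1)
	trySend(notifier) // queues a notification
	trySend(notifier) // buffer already full: returns instead of blocking
}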

Signed-off-by: Ryan Leung <[email protected]>
@rleungx rleungx requested a review from JmPotato October 9, 2023 02:27
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 9, 2023
@rleungx (Member Author) commented Oct 9, 2023

/merge

@ti-chi-bot bot (Contributor) commented Oct 9, 2023

@rleungx: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot bot (Contributor) commented Oct 9, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 891a322

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Oct 9, 2023
@ti-chi-bot ti-chi-bot bot merged commit 2556b5b into tikv:master Oct 9, 2023
21 of 23 checks passed
@rleungx rleungx deleted the reorg-cluster branch October 9, 2023 06:14
rleungx added a commit to rleungx/pd that referenced this pull request Dec 1, 2023
close tikv#7106, close tikv#7140

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Labels
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

  • TestAPI/TestAPIForward is unstable
  • mcs: data race about scheduler controller
3 participants