pdms: Choose a suitable pdms to transfer primary when upgrade #5643

Merged
merged 1 commit into from
Aug 14, 2024

Conversation

@HuSharp HuSharp commented May 10, 2024

What problem does this PR solve?

Ref #1235, Ref tikv/pd#8157

What is changed and how does it work?

summary

Assume there are three scheduling nodes: scheduling-0, scheduling-1, and scheduling-2.
tidb-operator upgrades them in descending ordinal order: 2 -> 1 -> 0.
If scheduling-1 is the primary, upgrading scheduling-1 may transfer the primary to scheduling-0, and the primary then has to be transferred again when scheduling-0 is upgraded.

  • This PR ensures that when scheduling-1 is upgraded, the primary is transferred to scheduling-2 (which has already been upgraded), reducing the number of transfers.
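The selection rule described above can be sketched in Go. This is a minimal illustration, not the operator's actual code: the function name `chooseTransferTarget`, its parameters, and the `<prefix>-<ordinal>` naming are assumptions made for this example. It assumes pods are upgraded from the highest ordinal down, so the highest ordinal is the preferred target, except when that pod is itself the one being upgraded.

```go
package main

import "fmt"

// chooseTransferTarget is a hypothetical helper (not taken from tidb-operator).
// It returns the pod that should receive the primary role before the pod at
// upgradeOrdinal is restarted, assuming pods are upgraded from the highest
// ordinal down to 0.
func chooseTransferTarget(prefix string, replicas, upgradeOrdinal int) string {
	if replicas < 2 {
		// There is no other member to transfer the primary to.
		return ""
	}
	maxOrdinal := replicas - 1
	if upgradeOrdinal == maxOrdinal {
		// Nothing has been upgraded yet, so pick pod 0, which is upgraded last.
		return fmt.Sprintf("%s-%d", prefix, 0)
	}
	// The highest ordinal has already been upgraded and will not restart again.
	return fmt.Sprintf("%s-%d", prefix, maxOrdinal)
}

func main() {
	const prefix = "basic-scheduling"
	for _, ordinal := range []int{2, 1, 0} {
		target := chooseTransferTarget(prefix, 3, ordinal)
		fmt.Printf("upgrading %s-%d: transfer primary to %s\n", prefix, ordinal, target)
	}
}
```

With this rule the primary moves at most twice during a rolling upgrade of three pods: once away from the highest ordinal if it starts there, and once back to it.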

Using API

I created 3 scheduling pods with PD version 8.3.0:

$ kubectl exec -it basic-pd-0 -n pingcap sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1# curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/members/scheduling'
[
    {
        "name": "basic-scheduling-0",
        "service-addr": "http://basic-scheduling-0.basic-scheduling-peer.pingcap.svc:2379",
        "version": "v8.3.0",
        "git-hash": "2d9a3b0e5da1a8e50251c4510368e5b3085394c7",
        "deploy-path": "/",
        "start-timestamp": 1723535895
    },
    {
        "name": "basic-scheduling-1",
        "service-addr": "http://basic-scheduling-1.basic-scheduling-peer.pingcap.svc:2379",
        "version": "v8.3.0",
        "git-hash": "2d9a3b0e5da1a8e50251c4510368e5b3085394c7",
        "deploy-path": "/",
        "start-timestamp": 1723535883
    },
    {
        "name": "basic-scheduling-2",
        "service-addr": "http://basic-scheduling-2.basic-scheduling-peer.pingcap.svc:2379",
        "version": "v8.3.0",
        "git-hash": "2d9a3b0e5da1a8e50251c4510368e5b3085394c7",
        "deploy-path": "/",
        "start-timestamp": 1723535831
    }
]

// get the current primary, which is `scheduling-1`
sh-5.1# curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/scheduling'
"http://basic-scheduling-1.basic-scheduling-peer.pingcap.svc:2379"

// log in to the `scheduling-1` pod
// and then transfer the primary to `scheduling-2`
$ kubectl exec -it basic-scheduling-1 -n pingcap sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1# curl --location --request POST 'http://127.0.0.1:2379/scheduling/api/v1/primary/transfer' \
--header 'Content-Type: application/json' \
--data-raw '{
    "new_primary": "basic-scheduling-2"
}'
"success"

// get the current primary, which is now `scheduling-2`
$ kubectl exec -it basic-pd-0 -n pingcap sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1# curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/scheduling'
"http://basic-scheduling-2.basic-scheduling-peer.pingcap.svc:2379"
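The transfer call in the session above could be issued from Go roughly as follows. This is a sketch under assumptions: the helper names `transferPayload` and `newTransferRequest` and the struct `transferPrimaryRequest` are invented for this example, and only the URL path and the `new_primary` field come from the curl command. As in the session, the request is meant to be sent to the pod that currently holds the primary.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// transferPrimaryRequest mirrors the JSON body used in the curl example.
// The struct name is hypothetical; only the "new_primary" key is from the source.
type transferPrimaryRequest struct {
	NewPrimary string `json:"new_primary"`
}

// transferPayload encodes the request body for a primary transfer.
func transferPayload(newPrimary string) ([]byte, error) {
	return json.Marshal(transferPrimaryRequest{NewPrimary: newPrimary})
}

// newTransferRequest builds (but does not send) the POST request.
// baseURL is the address of the current primary, e.g. http://127.0.0.1:2379
// when running inside that pod; service is the microservice name.
func newTransferRequest(baseURL, service, newPrimary string) (*http.Request, error) {
	body, err := transferPayload(newPrimary)
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/%s/api/v1/primary/transfer", baseURL, service)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newTransferRequest("http://127.0.0.1:2379", "scheduling", "basic-scheduling-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```

Building the request separately from sending it keeps the example runnable without a cluster.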

check log

Now upgrade the 3 scheduling pods; the primary is currently scheduling-2.

// `scheduling-2` is the primary, so the primary should be transferred to `scheduling-0` before `scheduling-2` is upgraded
I0813 07:57:08.289779       1 pd_ms_upgrader.go:144] TidbCluster: [pingcap/basic]' pdms upgrader: check primary: http://basic-scheduling-2.basic-scheduling-peer.pingcap.svc:2379, upgradePDMSName: basic-scheduling-2, upgradePodName: basic-scheduling-2
I0813 07:57:08.289801       1 pd_ms_upgrader.go:180] Tidbcluster: [pingcap/basic]' pdms upgrader: start to choose pdms to transfer primary from members
I0813 07:57:08.289815       1 pd_ms_upgrader.go:205] Tidbcluster: [pingcap/basic]' pdms upgrader: choose pdms to transfer primary from members, targetName: basic-scheduling-0
I0813 07:57:08.289820       1 pd_ms_upgrader.go:155] TidbCluster: [pingcap/basic]' pdms upgrader: transfer pdms primary to: basic-scheduling-0
E0813 07:57:08.289834       1 pdms_api.go:67] only support TSO service, but got scheduling
I0813 07:57:08.296517       1 pd_ms_upgrader.go:161] TidbCluster: [pingcap/basic]' pdms upgrader: transfer pdms primary to: basic-scheduling-0 successfully


// `scheduling-1` is upgraded directly because the primary is now `scheduling-0`
I0813 07:57:57.924827       1 pd_ms_upgrader.go:144] TidbCluster: [pingcap/basic]' pdms upgrader: check primary: http://basic-scheduling-0.basic-scheduling-peer.pingcap.svc:2379, upgradePDMSName: basic-scheduling-1, upgradePodName: basic-scheduling-1

// when upgrading `scheduling-0`, the primary should be transferred to `scheduling-2` because `scheduling-0` is the primary now
I0813 07:58:05.912621       1 statefulset.go:182] set pingcap/basic-scheduling partition to 1
I0813 07:58:05.912978       1 pd_ms_upgrader.go:144] TidbCluster: [pingcap/basic]' pdms upgrader: check primary: http://basic-scheduling-0.basic-scheduling-peer.pingcap.svc:2379, upgradePDMSName: basic-scheduling-0, upgradePodName: basic-scheduling-0
I0813 07:58:05.912998       1 pd_ms_upgrader.go:180] Tidbcluster: [pingcap/basic]' pdms upgrader: start to choose pdms to transfer primary from members
I0813 07:58:05.913011       1 pd_ms_upgrader.go:205] Tidbcluster: [pingcap/basic]' pdms upgrader: choose pdms to transfer primary from members, targetName: basic-scheduling-2
I0813 07:58:05.913022       1 pd_ms_upgrader.go:155] TidbCluster: [pingcap/basic]' pdms upgrader: transfer pdms primary to: basic-scheduling-2
E0813 07:58:05.913040       1 pdms_api.go:67] only support TSO service, but got scheduling
I0813 07:58:05.919682       1 pd_ms_upgrader.go:161] TidbCluster: [pingcap/basic]' pdms upgrader: transfer pdms primary to: basic-scheduling-2 successfully
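The "check primary" step in the logs compares a primary address (a URL) with the name of the pod about to be upgraded. One way to make that comparison is sketched below. The helpers `podNameFromAddr` and `needsTransfer` are hypothetical and may differ from how the operator actually matches the two values. The sketch assumes the pod name is the first DNS label of the address host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// podNameFromAddr extracts the first DNS label of the host in a service
// address such as http://basic-scheduling-2.basic-scheduling-peer.pingcap.svc:2379.
// Hypothetical helper for illustration only.
func podNameFromAddr(addr string) (string, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	host := u.Hostname()
	if host == "" {
		return "", fmt.Errorf("no host in address %q", addr)
	}
	return strings.SplitN(host, ".", 2)[0], nil
}

// needsTransfer reports whether the pod about to be upgraded currently holds
// the primary, in which case the primary should be moved away first.
func needsTransfer(primaryAddr, upgradePod string) (bool, error) {
	name, err := podNameFromAddr(primaryAddr)
	if err != nil {
		return false, err
	}
	return name == upgradePod, nil
}

func main() {
	primary := "http://basic-scheduling-0.basic-scheduling-peer.pingcap.svc:2379"
	for _, pod := range []string{"basic-scheduling-1", "basic-scheduling-0"} {
		transfer, err := needsTransfer(primary, pod)
		if err != nil {
			panic(err)
		}
		fmt.Printf("upgrade %s: transfer primary first = %v\n", pod, transfer)
	}
}
```

This matches the second and third log groups above: with `scheduling-0` as primary, `scheduling-1` is upgraded directly, while upgrading `scheduling-0` triggers a transfer first.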

Code changes

  • Has Go code change

Tests

  • Unit test
  • Manual test
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.


@HuSharp HuSharp force-pushed the support_transfer_primary branch from 9707e2d to 33292f0 Compare July 12, 2024 01:20
@HuSharp HuSharp force-pushed the support_transfer_primary branch from 33292f0 to e91e291 Compare July 22, 2024 08:48
@ti-chi-bot ti-chi-bot bot added size/XL and removed size/L labels Jul 22, 2024
@HuSharp HuSharp force-pushed the support_transfer_primary branch 3 times, most recently from 1222607 to 894d1aa Compare July 23, 2024 03:56
@HuSharp HuSharp marked this pull request as ready for review July 24, 2024 07:18
@HuSharp HuSharp changed the title [DNM] pdms: Choose a suitable pdms to transfer primary when upgrade pdms: Choose a suitable pdms to transfer primary when upgrade Jul 24, 2024
@HuSharp HuSharp force-pushed the support_transfer_primary branch from 7d9575a to 81c8880 Compare July 25, 2024 03:10

HuSharp commented Jul 25, 2024

@csuzhangxc PTAL, thx~

@HuSharp HuSharp requested review from csuzhangxc and KanShiori and removed request for csuzhangxc July 25, 2024 03:11

HuSharp commented Jul 25, 2024

/run-all-tests


codecov-commenter commented Jul 25, 2024

Codecov Report

Attention: Patch coverage is 3.50877% with 110 lines in your changes missing coverage. Please review.

Project coverage is 33.08%. Comparing base (9ef26f8) to head (d02d9b8).
Report is 13 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (9ef26f8) and HEAD (d02d9b8). Click for more details.

HEAD has 1 upload less than BASE
Flag      BASE (9ef26f8)  HEAD (d02d9b8)
unittest  1               0
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5643       +/-   ##
===========================================
- Coverage   61.47%   33.08%   -28.39%     
===========================================
  Files         235      219       -16     
  Lines       30653    30611       -42     
===========================================
- Hits        18843    10127     -8716     
- Misses       9920    19087     +9167     
+ Partials     1890     1397      -493     
Flag      Coverage Δ
e2e       33.08% <3.50%> (?)
unittest  ?

Signed-off-by: husharp <[email protected]>

HuSharp commented Aug 13, 2024

/run-all-tests

1 similar comment

HuSharp commented Aug 13, 2024

/run-all-tests

@csuzhangxc
Member

/run-pull-e2e-kind-across-kubernetes

@csuzhangxc
Member

/run-pull-e2e-kind-tikv-scale-simultaneously

@csuzhangxc
Member

/run-pull-e2e-kind-tngm


ti-chi-bot bot commented Aug 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csuzhangxc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the lgtm label Aug 13, 2024

ti-chi-bot bot commented Aug 13, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-08-13 09:26:04.210377006 +0000 UTC m=+260048.913846641: ☑️ agreed by csuzhangxc.

@ti-chi-bot ti-chi-bot bot added the approved label Aug 13, 2024

HuSharp commented Aug 13, 2024

/run-pull-e2e-kind-serial


HuSharp commented Aug 13, 2024

/run-pull-e2e-kind

1 similar comment
@csuzhangxc
Member

/run-pull-e2e-kind

@csuzhangxc
Member

/run-pull-e2e-kind-across-kubernetes

1 similar comment

HuSharp commented Aug 14, 2024

/run-pull-e2e-kind-across-kubernetes


HuSharp commented Aug 14, 2024

/run-pull-e2e-kind

@ti-chi-bot ti-chi-bot bot merged commit dbe75c0 into pingcap:master Aug 14, 2024
13 checks passed
@HuSharp HuSharp deleted the support_transfer_primary branch August 14, 2024 01:47
@csuzhangxc
Member

/cherry-pick release-1.6

@ti-chi-bot
Member

@csuzhangxc: new pull request created to branch release-1.6: #5709.

In response to this:

/cherry-pick release-1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

csuzhangxc pushed a commit that referenced this pull request Aug 14, 2024
4 participants