TiKV store not balanced after lightning physical backend in parallel mode #7093

Closed
fubinzh opened this issue Sep 14, 2023 · 12 comments
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. may-affects-5.3 severity/major type/bug The issue is confirmed as a bug.

fubinzh commented Sep 14, 2023

Bug Report

What did you do?

  1. Use 2 Lightning instances to import data into TiDB in parallel via the physical backend: one instance imports 1TB, the other 3TB.
  2. Use 2 Lightning instances to import data into TiDB in parallel via the physical backend: one instance imports 2.4TB, the other 3TB.

What did you expect to see?

The TiKV stores should be balanced.

What did you see instead?

The TiKV stores are not balanced: 1.9TB vs. 850GB.

The Lightning logs show that waiting for region scattering timed out:

/tmp/source-kv-dir # grep "waiting for scattering regions timeout" /tmp/source-kv-dir/tidb-lightning.log
[2023/09/10 13:25:51.238 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=76] [regions=78] [take=3m0.00075766s] []
[2023/09/10 13:30:47.410 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=98] [scatterCount=96] [regions=98] [take=3m0.017900034s] []
[2023/09/10 13:43:07.455 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=75] [regions=78] [take=3m0.00026953s] []
[2023/09/10 14:22:12.252 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=83] [regions=85] [take=3m0.000672683s] []
[2023/09/10 17:08:25.958 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=83] [regions=85] [take=3m0.000058922s] []
[2023/09/10 17:23:11.925 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=84] [regions=85] [take=3m0.000607642s] []
[2023/09/10 17:52:29.584 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=83] [regions=85] [take=3m0.001042661s] []
[2023/09/10 18:37:58.588 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=83] [regions=84] [take=3m0.000023387s] []
[2023/09/10 21:22:00.882 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=74] [regions=80] [take=3m0.000680588s] []
[2023/09/10 21:22:03.758 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=73] [regions=79] [take=3m0.000241391s] []
bash-5.1# grep "waiting for scattering regions timeout" /tmp/source-kv-dir/tidb-lightning.log
[2023/09/10 11:33:13.992 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=82] [regions=85] [take=3m0.000604681s] []
[2023/09/10 11:55:56.842 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=83] [regions=85] [take=3m0.001030399s] []
[2023/09/10 11:57:43.989 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=74] [regions=85] [take=3m0.00097823s] []
[2023/09/10 11:59:58.153 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=128] [scatterCount=114] [regions=128] [take=3m0.000545387s] []
[2023/09/10 12:01:41.961 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=120] [scatterCount=102] [regions=120] [take=3m0.000018753s] []
[2023/09/10 12:23:42.462 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=82] [regions=85] [take=3m0.000967645s] []
[2023/09/10 12:25:16.767 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=116] [scatterCount=111] [regions=116] [take=3m0.000225387s] []
[2023/09/10 13:30:23.029 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=76] [regions=78] [take=3m0.00038944s] []
[2023/09/10 13:31:47.379 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=74] [regions=78] [take=3m0.00050008s] []
[2023/09/10 13:34:58.145 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=111] [scatterCount=103] [regions=111] [take=3m0.000291007s] []
[2023/09/10 13:34:58.175 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=372] [regions=374] [take=3m0.00002505s] []
[2023/09/10 13:39:31.914 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=175] [scatterCount=174] [regions=175] [take=3m0.01859161s] []
[2023/09/10 13:50:05.740 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=77] [regions=78] [take=3m0.000474043s] []
[2023/09/10 15:14:49.244 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=84] [regions=85] [take=3m0.000967802s] []
[2023/09/10 16:59:06.770 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=73] [regions=83] [take=3m0.000893598s] []
[2023/09/10 23:42:33.131 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=80] [regions=83] [take=3m0.00086205s] []
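
To compare the two logs more easily, the timeout lines can be summarized with a small script. This is only a sketch: it assumes `scatterCount` is the number of regions whose scatter had finished when the 3-minute wait expired and `regions` is the number that were waited on, which is my reading of the field names rather than something confirmed in this thread. The function name `summarize_scatter_timeouts` is made up for illustration.

```python
import re

# Matches the bracketed integer fields printed on the timeout log line.
FIELD_RE = re.compile(r"\[(skipped_keys|scatterCount|regions)=(\d+)\]")
MARKER = "waiting for scattering regions timeout"


def summarize_scatter_timeouts(lines):
    """Return (events, regions_total, scattered_total) over timeout lines.

    Assumption: scatterCount counts regions already scattered and
    regions counts all regions that were being waited on.
    """
    events = regions_total = scattered_total = 0
    for line in lines:
        if MARKER not in line:
            continue
        fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
        if "regions" not in fields or "scatterCount" not in fields:
            continue  # skip lines that do not carry both counters
        events += 1
        regions_total += fields["regions"]
        scattered_total += fields["scatterCount"]
    return events, regions_total, scattered_total


# Two lines copied from the first log above.
sample = [
    '[2023/09/10 13:25:51.238 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=0] [scatterCount=76] [regions=78] [take=3m0.00075766s] []',
    '[2023/09/10 13:30:47.410 +00:00] [INFO] [localhelper.go:341] ["waiting for scattering regions timeout"] [skipped_keys=98] [scatterCount=96] [regions=98] [take=3m0.017900034s] []',
]

events, regions_total, scattered_total = summarize_scatter_timeouts(sample)
print(events, regions_total, scattered_total, regions_total - scattered_total)
# 2 176 172 4
```

Feeding the whole `tidb-lightning.log` of each instance through the function (for example via `open(path)`) gives a per-instance count of timeouts and of regions that were still pending when the wait gave up.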

What version of PD are you using (pd-server -V)?

/ # /pd-server -V
Release Version: v6.5.0-nightly
Edition: Community
Git Commit Hash: 947701a
Git Branch: heads/refs/tags/v6.5.0-nightly
UTC Build Time: 2023-09-04 10:20:59

@fubinzh fubinzh added the type/bug The issue is confirmed as a bug. label Sep 14, 2023

fubinzh commented Sep 14, 2023

Another test, this time with a v6.5.4 TiDB cluster.

Using 4 Lightning instances to import 4 * 3TB of data into the cluster in parallel, one TiKV store ends up at 2.9TB while the others are ~700GB.

The cluster configuration is as follows (these settings were chosen for debugging another issue):

        pd:
          config: |
            [replication]
              max-replicas = 1
        tikv:
          config: |
            [import]
              num-threads = 12
            [coprocessor]
              region-max-keys = 240000
              region-max-size = "24M"
              region-split-keys = 160000
              region-split-size = "16M"
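
For a sense of scale, the number of regions each import has to split and scatter under this configuration can be estimated from `region-split-size`. This is a rough sketch under stated assumptions: it treats 3TB as 3 TiB and "16M" as 16 MiB, and it assumes the encoded KV data is about the same size as the source data, which may not hold in practice. The helper name `estimated_regions` is hypothetical.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2


def estimated_regions(data_bytes, split_size_bytes):
    """Ceiling of data size divided by the region split size."""
    return -(-data_bytes // split_size_bytes)


# One 3TB import with region-split-size = "16M" (binary units assumed).
per_import = estimated_regions(3 * TIB, 16 * MIB)
print(per_import)      # 196608
print(4 * per_import)  # 786432 for the 4 * 3TB run
```

With `max-replicas = 1` every region has a single peer, so each of those regions lives on exactly one store and any region that is not scattered stays wherever it was split.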


jebter commented Sep 21, 2023

/severity major

@ti-chi-bot ti-chi-bot added the affects-7.5 This bug affects the 7.5.x(LTS) versions. label Oct 23, 2023
@ti-chi-bot ti-chi-bot added the affects-8.1 This bug affects the 8.1.x(LTS) versions. label Apr 9, 2024

fubinzh commented May 6, 2024

/remove-label affects-7.5
/remove-label affects-8.1

@ti-chi-bot ti-chi-bot bot removed affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels May 6, 2024

fubinzh commented May 6, 2024

This issue is no longer seen after the Lightning import optimization in v7.0.

@fubinzh fubinzh closed this as completed May 6, 2024

fubinzh commented May 6, 2024

/label affects-6.5

@ti-chi-bot ti-chi-bot bot added affects-6.5 This bug affects the 6.5.x(LTS) versions. and removed may-affects-6.5 labels May 6, 2024

fubinzh commented May 6, 2024

/remove-label may-affects-7.1


fubinzh commented May 6, 2024

/label affects-6.1
/label affects-5.4
/label affects-5.3

@ti-chi-bot ti-chi-bot bot added affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. labels May 6, 2024

ti-chi-bot bot commented May 6, 2024

@fubinzh: The label(s) affects-5.3 cannot be applied. These labels are supported: Hacktoberfest, challenge-program, ci-unstable, compatibility-breaker, high-performance, hptc, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1, needs-cherry-pick-release-7.5, needs-cherry-pick-release-8.1, release-note, require-LGT1, wontfix, affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, affects-8.1, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, may-affects-7.5, may-affects-8.1.

In response to this:

/label affects-6.1
/label affects-5.4
/label affects-5.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.


fubinzh commented May 6, 2024

/label affects-5.3


ti-chi-bot bot commented May 6, 2024

@fubinzh: The label(s) affects-5.3 cannot be applied. These labels are supported: Hacktoberfest, challenge-program, ci-unstable, compatibility-breaker, high-performance, hptc, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1, needs-cherry-pick-release-7.5, needs-cherry-pick-release-8.1, release-note, require-LGT1, wontfix, affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, affects-8.1, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, may-affects-7.5, may-affects-8.1.

In response to this:

/label affects-5.3


fubinzh commented May 6, 2024

/remove-label may-affects-5.3


ti-chi-bot bot commented May 6, 2024

@fubinzh: The label(s) may-affects-5.3 cannot be applied. These labels are supported: Hacktoberfest, challenge-program, ci-unstable, compatibility-breaker, high-performance, hptc, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1, needs-cherry-pick-release-7.5, needs-cherry-pick-release-8.1, release-note, require-LGT1, wontfix, affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, affects-8.1, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, may-affects-7.5, may-affects-8.1.

In response to this:

/remove-label may-affects-5.3

