[Bug]: query failed with error service internal error: target version mismatch after querynode pod failure chaos test #37902

Open
1 task done
zhuwenxing opened this issue Nov 21, 2024 · 5 comments
Labels
kind/bug: Issues or changes related to a bug
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

@zhuwenxing
Contributor

zhuwenxing commented Nov 21, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20241120-7ba85504-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior


[2024-11-20T09:13:47.476Z] <name>: Checker__HyvbZlCy

[2024-11-20T09:13:47.476Z] <description>: 

[2024-11-20T09:13:47.476Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}......  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:40 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:43 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:43 - DEBUG - ci_test]: (api_request)  : [Collection.compact] args: [False, 180], kwargs: {} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:43 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:44 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:51 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:51 - INFO - ci_test]: assert create collection: 0.013302326202392578, init_entities: 194352 (test_all_collections_after_chaos.py:49)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:54 - DEBUG - ci_test]: (api_request)  : [Collection.insert] args: [[{'int64': -3000, 'float': 0.1309169, 'varchar': 'gocyi', 'text': 'Too goal close go. Discussion many hot practice former. Full risk notice chance bit seat.\nReport music pressure cut nature. Doctor by mind according issue under middle.\nAlone cold rule.', 'json_field': {'name': 'Meghan Gutierrez',......, kwargs: {'timeout': 180} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:55 - DEBUG - ci_test]: (api_response) : (insert count: 2000, delete count: 0, upsert count: 0, timestamp: 454058011668250625, success count: 2000, err count: 0  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:55 - INFO - ci_test]: assert insert: 1.4796831607818604 (test_all_collections_after_chaos.py:57)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:10:55 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:12 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:12 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:31 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:31 - INFO - ci_test]: assert flush: 16.213988065719604, entities: 196352 (test_all_collections_after_chaos.py:67)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:31 - INFO - ci_test]: index info: [{'collection': 'Checker__HyvbZlCy', 'field': 'int64', 'index_name': 'int64', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__HyvbZlCy', 'field': 'float', 'index_name': 'float', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__HyvbZlCy', 'field': 'varchar', 'index_name': 'varchar', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__HyvbZlCy', 'field': 'text', 'index_name': 'text', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__HyvbZlCy', 'field': 'float_vector', 'index_name': 'float_vector', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__HyvbZlCy', 'field': 'image_emb', 'index_name': 'image_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__HyvbZlCy', 'field': 'voice_emb', 'index_name': 'voice_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__HyvbZlCy', 'field': 'text_sparse_emb', 'index_name': 'text_sparse_emb', 'index_param': {'index_type': 'SPARSE_INVERTED_INDEX', 'metric_type': 'BM25', 'params': {'bm25_k1': 1.5, 'bm25_b': 0.75}}}] (test_all_collections_after_chaos.py:71)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:31 - INFO - ci_test]: index info: [{'collection': 'Checker__HyvbZlCy', 'field': 'float_vector', 'index_name': 'float_vector', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__HyvbZlCy', 'field': 'image_emb', 'index_name': 'image_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__HyvbZlCy', 'field': 'voice_emb', 'index_name': 'voice_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__HyvbZlCy', 'field': 'text_sparse_emb', 'index_name': 'text_sparse_emb', 'index_param': {'index_type': 'SPARSE_INVERTED_INDEX', 'metric_type': 'BM25', 'params': {'bm25_k1': 1.5, 'bm25_b': 0.75}}}, {'collection': 'Checker__HyvbZlCy', 'field': 'int64', 'index_name': 'int64', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__HyvbZlCy', 'field': 'float', 'index_name': 'float', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__HyvbZlCy', 'field': 'varchar', 'index_name': 'varchar', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__HyvbZlCy', 'field': 'text', 'index_name': 'text', 'index_param': {'index_type': 'INVERTED'}}] (test_all_collections_after_chaos.py:86)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:31 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 180], kwargs: {} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:31 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:31 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.038520660424703645, 0.02706883160875182, 0.11491212846102333, 0.1251422882317869, 0.15336852226710182, 0.05243946725633598, 0.04315610567137004, 0.019213663000552206, 0.038231163385241455, 0.1427314018911522, 0.034942312377116536, 0.10038952282770104, 0.13653709172576833, 0.006019849111047318, ......, kwargs: {} (api_request.py:62)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:52 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=503, message=failed to search: service internal error: target version mismatch, collection: 454057490543884941, channel: by-dev-rootcoord-dml_8_454057490543884941v1,  current target version: 1732093296790336169, leader version: 0: channel not available[channel=by-dev-rootcoord-dml_8_454057490543884941v1])>, <Time:{'RPC start': '2024-11-20 09:11:31.582876', 'RPC error': '2024-11-20 09:11:52.599095'}> (decorators.py:140)

[2024-11-20T09:13:47.476Z] [2024-11-20 09:11:52 - ERROR - ci_test]: Traceback (most recent call last):

[2024-11-20T09:13:47.476Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 32, in inner_wrapper

[2024-11-20T09:13:47.476Z]     res = func(*args, **_kwargs)

[2024-11-20T09:13:47.476Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 63, in api_request

[2024-11-20T09:13:47.476Z]     return func(*arg, **kwargs)

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 801, in search

[2024-11-20T09:13:47.476Z]     resp = conn.search(

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 141, in handler

[2024-11-20T09:13:47.476Z]     raise e from e

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 137, in handler

[2024-11-20T09:13:47.476Z]     return func(*args, **kwargs)

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 176, in handler

[2024-11-20T09:13:47.476Z]     return func(self, *args, **kwargs)

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 116, in handler

[2024-11-20T09:13:47.476Z]     raise e from e

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 86, in handler

[2024-11-20T09:13:47.476Z]     return func(*args, **kwargs)

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 806, in search

[2024-11-20T09:13:47.476Z]     return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 747, in _execute_search

[2024-11-20T09:13:47.476Z]     raise e from e

[2024-11-20T09:13:47.476Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 736, in _execute_search

[2024-11-20T09:13:47.477Z]     check_status(response.status)

[2024-11-20T09:13:47.477Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/utils.py", line 63, in check_status

[2024-11-20T09:13:47.477Z]     raise MilvusException(status.code, status.reason, status.error_code)

[2024-11-20T09:13:47.477Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=503, message=failed to search: service internal error: target version mismatch, collection: 454057490543884941, channel: by-dev-rootcoord-dml_8_454057490543884941v1,  current target version: 1732093296790336169, leader version: 0: channel not available[channel=by-dev-rootcoord-dml_8_454057490543884941v1])>

[2024-11-20T09:13:47.477Z]  (api_request.py:45)

[2024-11-20T09:13:47.477Z] [2024-11-20 09:11:52 - ERROR - ci_test]: (api_response) : <MilvusException: (code=503, message=failed to search: service internal error: target version mismatch, collection: 454057490543884941, channel: by-dev-rootcoord-dml_8_454057490543884941v1,  current target version: 1732093296790336169, leader version: 0: channel not available[channel=by-dev-rootcoor...... (api_request.py:46)

[2024-11-20T09:13:47.477Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
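
To make the failure easier to compare across runs, the fields of the error message above can be pulled out with a small parser. This is a minimal sketch: the regular expression is derived only from the message text in this log, not from any documented Milvus format, and the helper names (`parse_target_version_mismatch`, `version_as_utc`) are hypothetical. Reading the target version as Unix epoch nanoseconds is an assumption, not something this report confirms.

```python
import re
from datetime import datetime, timezone

# Pattern derived only from the message text in this issue's log;
# it is not an official or documented error format.
MISMATCH_RE = re.compile(
    r"target version mismatch, collection: (?P<collection>\d+), "
    r"channel: (?P<channel>[^,\s]+),\s+"
    r"current target version: (?P<current>\d+), "
    r"leader version: (?P<leader>\d+)"
)


def parse_target_version_mismatch(message):
    """Return the mismatch fields as a dict, or None if the text does not match."""
    m = MISMATCH_RE.search(message)
    if m is None:
        return None
    return {
        "collection": int(m.group("collection")),
        "channel": m.group("channel"),
        "current_target_version": int(m.group("current")),
        "leader_version": int(m.group("leader")),
    }


def version_as_utc(version):
    """ASSUMPTION: the version is Unix epoch nanoseconds. Sub-second part is dropped."""
    return datetime.fromtimestamp(version // 1_000_000_000, tz=timezone.utc)


# Message text copied from the search failure in the log.
sample = (
    "failed to search: service internal error: target version mismatch, "
    "collection: 454057490543884941, "
    "channel: by-dev-rootcoord-dml_8_454057490543884941v1,  "
    "current target version: 1732093296790336169, leader version: 0: "
    "channel not available[channel=by-dev-rootcoord-dml_8_454057490543884941v1]"
)
info = parse_target_version_mismatch(sample)
print(info)
```

The log shows a leader version of 0 next to a non-zero current target version. Under the nanosecond assumption, `version_as_utc(info["current_target_version"])` gives a time that can be lined up against the pod restart times.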

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/19549/pipeline
log:
artifacts-querynode-pod-failure-19549-server-logs.tar.gz

[2024-11-20T09:10:07.229Z] + kubectl get pods -o wide

[2024-11-20T09:10:07.230Z] + grep querynode-pod-failure-19549

[2024-11-20T09:10:07.487Z] querynode-pod-failure-19549-etcd-0                                1/1     Running            0                33m     10.104.18.193   4am-node25   <none>           <none>

[2024-11-20T09:10:07.487Z] querynode-pod-failure-19549-etcd-1                                1/1     Running            0                33m     10.104.34.13    4am-node37   <none>           <none>

[2024-11-20T09:10:07.487Z] querynode-pod-failure-19549-etcd-2                                1/1     Running            0                33m     10.104.26.154   4am-node32   <none>           <none>

[2024-11-20T09:10:07.487Z] querynode-pod-failure-19549-milvus-datanode-b9bb7b756-52zkd       1/1     Running            3 (32m ago)      33m     10.104.16.254   4am-node21   <none>           <none>

[2024-11-20T09:10:07.487Z] querynode-pod-failure-19549-milvus-datanode-b9bb7b756-nxvml       1/1     Running            3 (32m ago)      33m     10.104.4.128    4am-node11   <none>           <none>

[2024-11-20T09:10:07.487Z] querynode-pod-failure-19549-milvus-indexnode-547cb684f-92fjr      1/1     Running            3 (32m ago)      33m     10.104.30.73    4am-node38   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-indexnode-547cb684f-sbm9r      1/1     Running            3 (32m ago)      33m     10.104.32.127   4am-node39   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-indexnode-547cb684f-t5c5g      1/1     Running            3 (32m ago)      33m     10.104.20.180   4am-node22   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-mixcoord-565dd7b6b7-ttbrv      1/1     Running            3 (32m ago)      33m     10.104.32.125   4am-node39   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-proxy-55f46d47-9x4xd           1/1     Running            3 (32m ago)      33m     10.104.32.126   4am-node39   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-querynode-77c888d8f9-54vls     1/1     Running            7 (9m18s ago)    33m     10.104.14.162   4am-node18   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-querynode-77c888d8f9-8qnft     1/1     Running            7 (9m4s ago)     33m     10.104.17.171   4am-node23   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-querynode-77c888d8f9-ztqpq     1/1     Running            8 (8m16s ago)    33m     10.104.4.127    4am-node11   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-minio-0                               1/1     Running            0                33m     10.104.18.191   4am-node25   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-minio-1                               1/1     Running            0                33m     10.104.34.12    4am-node37   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-minio-2                               1/1     Running            0                33m     10.104.26.153   4am-node32   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-minio-3                               1/1     Running            0                33m     10.104.19.138   4am-node28   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-bookie-0                       1/1     Running            0                33m     10.104.18.192   4am-node25   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-bookie-1                       1/1     Running            0                33m     10.104.34.14    4am-node37   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-bookie-init-4vmng              0/1     Completed          0                33m     10.104.18.176   4am-node25   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-broker-0                       1/1     Running            0                33m     10.104.18.178   4am-node25   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-proxy-0                        1/1     Running            0                33m     10.104.14.163   4am-node18   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-pulsar-init-jtpts              0/1     Completed          0                33m     10.104.18.175   4am-node25   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-recovery-0                     1/1     Running            0                33m     10.104.18.177   4am-node25   <none>           <none>

[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-pulsar-zookeeper-0                    1/1     Running            0                33m     10.104.18.190   4am-node25   <none>           <none>
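
The restart counts in the pod listing above can be extracted with a short script, which helps when checking which pods restarted and how recently. This is a rough sketch that assumes the column layout seen in this log (NAME, READY, STATUS, RESTARTS, AGE, with an optional CI timestamp prefix); `restart_counts` is a hypothetical helper, not part of any test framework mentioned here.

```python
import re

# Matches one pod row from the listing; the layout is assumed from this log only.
POD_LINE_RE = re.compile(
    r"^(?:\[[^\]]+\]\s+)?"               # optional CI timestamp prefix
    r"(?P<name>\S+)\s+"                  # pod name
    r"(?P<ready>\d+/\d+)\s+"             # READY column, e.g. 1/1
    r"(?P<status>\S+)\s+"                # STATUS column
    r"(?P<restarts>\d+)"                 # restart count
    r"(?:\s+\((?P<last>[^)]+) ago\))?"   # optional "(9m18s ago)"
    r"\s+(?P<age>\S+)"                   # AGE column
)


def restart_counts(text):
    """Map pod name -> (restart count, time since last restart or None)."""
    counts = {}
    for line in text.splitlines():
        m = POD_LINE_RE.match(line.strip())
        if m:  # command echo lines and blank lines do not match and are skipped
            counts[m.group("name")] = (int(m.group("restarts")), m.group("last"))
    return counts


# Two rows copied from the listing in this issue.
sample = """\
[2024-11-20T09:10:07.487Z] querynode-pod-failure-19549-etcd-0                                1/1     Running            0                33m     10.104.18.193   4am-node25   <none>           <none>
[2024-11-20T09:10:07.488Z] querynode-pod-failure-19549-milvus-querynode-77c888d8f9-ztqpq     1/1     Running            8 (8m16s ago)    33m     10.104.4.127    4am-node11   <none>           <none>
"""
counts = restart_counts(sample)
restarted = {name: c for name, c in counts.items() if c[0] > 0}
print(restarted)
```

Filtering on a non-zero count, as `restarted` does, separates the pods touched by the chaos run from the dependencies that stayed up.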

Anything else?

No response

@zhuwenxing added the kind/bug and needs-triage labels on Nov 21, 2024
@zhuwenxing
Contributor Author

zhuwenxing commented Nov 21, 2024

This error also occurred in #37765.
That issue was closed because the error could not be reproduced during verification.
This new issue has been opened to track the recurrence.

The key difference: in #37765 the querynode restarted during verification, whereas in this issue only the error was reported, with no indication of a restart.

@yanliang567
Contributor

/assign @weiliu1031
/unassign

@yanliang567 added the triage/accepted label and removed the needs-triage label on Nov 21, 2024
@yanliang567 added this to the 2.5.0 milestone on Nov 21, 2024
@yanliang567
Contributor

@liliu-z please take a look at this issue as well

@weiliu1031
Contributor

This should be fixed by #37909.

@yanliang567
Contributor

/assign @zhuwenxing
please help to verify the fix
/unassign @weiliu1031
