[Bug]: Querynode panic with error SIGSEGV: segmentation violation during run full text search #36992

Closed
1 task done
zhuwenxing opened this issue Oct 18, 2024 · 7 comments
Labels
feature/full text search kind/bug Issues or changes related a bug priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-8669153-20241018
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024/10/18 07:50:04.463 +00:00] [DEBUG] [segments/search.go:112] ["search growing/sealed segments without indexes"] [traceID=1e20f7fef506593389c144088755dcce] [segmentIDs="[453309165438875145,453309165438843386,453309165438849778,453309165438855947,453309165438862244,453309165438871693,453309165438868556,453309165438871730,453309165438846648,453309165438852832]"]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [delegator/delegator.go:309] ["Delegator search done"] [traceID=1e20f7fef506593389c144088755dcce] [collectionID=453309165438240205] [channel=full-text-search-v1018-rootcoord-dml_1_453309165438240205v1] [replicaID=453309165599457281]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [querynodev2/services.go:711] [tr/searchSegments] [traceID=1e20f7fef506593389c144088755dcce] [msg="search segments done, channel = full-text-search-v1018-rootcoord-dml_1_453309165438240205v1, segmentIDs = [453309165438875145 453309165438843386 453309165438849778 453309165438855947 453309165438862244 453309165438871693 453309165438868556 453309165438871730 453309165438846648 453309165438852832]"] [duration=2.509173ms]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [delegator/delegator.go:309] ["Delegator search done"] [traceID=1e20f7fef506593389c144088755dcce] [collectionID=453309165438240205] [channel=full-text-search-v1018-rootcoord-dml_1_453309165438240205v1] [replicaID=453309165599457281]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [segments/segment.go:551] ["search segment done"] [traceID=d6b182af5dda699543de2532870800dd] [collectionID=453309165438240205] [segmentID=453309165438871730] [segmentType=Growing] [withIndex=false]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [segments/search.go:112] ["search growing/sealed segments without indexes"] [traceID=d6b182af5dda699543de2532870800dd] [segmentIDs="[453309165438875145,453309165438843386,453309165438849778,453309165438855947,453309165438862244,453309165438871693,453309165438868556,453309165438871730,453309165438846648,453309165438852832]"]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [querynodev2/services.go:711] [tr/searchSegments] [traceID=d6b182af5dda699543de2532870800dd] [msg="search segments done, channel = full-text-search-v1018-rootcoord-dml_1_453309165438240205v1, segmentIDs = [453309165438875145 453309165438843386 453309165438849778 453309165438855947 453309165438862244 453309165438871693 453309165438868556 453309165438871730 453309165438846648 453309165438852832]"] [duration=2.744887ms]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [delegator/delegator.go:309] ["Delegator search done"] [traceID=d6b182af5dda699543de2532870800dd] [collectionID=453309165438240205] [channel=full-text-search-v1018-rootcoord-dml_1_453309165438240205v1] [replicaID=453309165599457281]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [querynodev2/handlers.go:390] [tr/searchDelegator] [traceID=d6b182af5dda699543de2532870800dd] [msg="start reduce query result, traceID = d6b182af5dda699543de2532870800dd,  vChannel = full-text-search-v1018-rootcoord-dml_1_453309165438240205v1, segmentIDs = []"] [duration=314.58865ms]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [querynodev2/handlers.go:404] [tr/searchDelegator] [traceID=d6b182af5dda699543de2532870800dd] [msg="do search with channel done , vChannel = full-text-search-v1018-rootcoord-dml_1_453309165438240205v1, segmentIDs = []"] [duration=314.604038ms]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [segments/segment.go:551] ["search segment done"] [traceID=1e20f7fef506593389c144088755dcce] [collectionID=453309165438240205] [segmentID=453309165438871730] [segmentType=Growing] [withIndex=false]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [segments/search.go:112] ["search growing/sealed segments without indexes"] [traceID=1e20f7fef506593389c144088755dcce] [segmentIDs="[453309165438875145,453309165438843386,453309165438849778,453309165438855947,453309165438862244,453309165438871693,453309165438868556,453309165438871730,453309165438846648,453309165438852832]"]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [querynodev2/services.go:711] [tr/searchSegments] [traceID=1e20f7fef506593389c144088755dcce] [msg="search segments done, channel = full-text-search-v1018-rootcoord-dml_1_453309165438240205v1, segmentIDs = [453309165438875145 453309165438843386 453309165438849778 453309165438855947 453309165438862244 453309165438871693 453309165438868556 453309165438871730 453309165438846648 453309165438852832]"] [duration=2.955928ms]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [delegator/delegator.go:309] ["Delegator search done"] [traceID=1e20f7fef506593389c144088755dcce] [collectionID=453309165438240205] [channel=full-text-search-v1018-rootcoord-dml_1_453309165438240205v1] [replicaID=453309165599457281]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [querynodev2/handlers.go:390] [tr/searchDelegator] [traceID=1e20f7fef506593389c144088755dcce] [msg="start reduce query result, traceID = 1e20f7fef506593389c144088755dcce,  vChannel = full-text-search-v1018-rootcoord-dml_1_453309165438240205v1, segmentIDs = []"] [duration=319.267669ms]
[2024/10/18 07:50:04.463 +00:00] [DEBUG] [querynodev2/handlers.go:404] [tr/searchDelegator] [traceID=1e20f7fef506593389c144088755dcce] [msg="do search with channel done , vChannel = full-text-search-v1018-rootcoord-dml_1_453309165438240205v1, segmentIDs = []"] [duration=319.367228ms]
[2024/10/18 07:50:04.508 +00:00] [DEBUG] [querynodev2/services.go:71] ["QueryNode current state"] [NodeID=21] [StateCode=Healthy]
_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7_M_dataEv
	/usr/include/c++/12/bits/basic_string.h:234 pc=0x7f79b80e2d9e
_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_M_is_localEv
	/usr/include/c++/12/bits/basic_string.h:275 pc=0x7f79b80e2d9e
_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_
	/usr/include/c++/12/bits/basic_string.h:855 pc=0x7f79b80e2d9e
_ZN8knowhere6Config14FormatAndCheckERKS0_RN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS4_14adl_serializerES7_IhSaIhEEEEPSD_
	/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/common/config.cc:100 pc=0x7f79b80e2d9e
_ZN8knowhere10LoadConfigEPNS_10BaseConfigERKN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS3_14adl_serializerES6_IhSaIhEEEENS_10PARAM_TYPEERKSC_PSC_
	/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/index/index.cc:33 pc=0x7f79b8236c65
_ZN8knowhere5IndexINS_9IndexNodeEE5BuildESt10shared_ptrINS_7DataSetEERKN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS7_14adl_serializerESA_IhSaIhEEEE
	/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/index/index.cc:77 pc=0x7f79b8236c65
_ZN6milvus5index14VectorMemIndexIfE16BuildWithDatasetERKSt10shared_ptrIN8knowhere7DataSetEERKN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNSA_14adl_serializerESD_IhSaIhEEEE
	/go/src/github.com/milvus-io/milvus/internal/core/src/index/VectorMemIndex.cpp:274 pc=0x7f79bbe11365
_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7_M_dataEv
	/usr/include/c++/12/bits/basic_string.h:234 pc=0x7f79b80e2d9e
_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_M_is_localEv
	/usr/include/c++/12/bits/basic_string.h:275 pc=0x7f79b80e2d9e
_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_
	/usr/include/c++/12/bits/basic_string.h:855 pc=0x7f79b80e2d9e
_ZN8knowhere6Config14FormatAndCheckERKS0_RN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS4_14adl_serializerES7_IhSaIhEEEEPSD_
	/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/common/config.cc:100 pc=0x7f79b80e2d9e
_ZN8knowhere10LoadConfigEPNS_10BaseConfigERKN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS3_14adl_serializerES6_IhSaIhEEEENS_10PARAM_TYPEERKSC_PSC_
	/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/index/index.cc:33 pc=0x7f79b8236c65
_ZN8knowhere5IndexINS_9IndexNodeEE5BuildESt10shared_ptrINS_7DataSetEERKN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS7_14adl_serializerESA_IhSaIhEEEE
	/go/src/github.com/milvus-io/milvus/cmake_build/thirdparty/knowhere/knowhere-src/src/index/index.cc:77 pc=0x7f79b8236c65
_ZN6milvus5index14VectorMemIndexIfE16BuildWithDatasetERKSt10shared_ptrIN8knowhere7DataSetEERKN8nlohmann16json_abi_v3_11_210basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNSA_14adl_serializerESD_IhSaIhEEEE
	/go/src/github.com/milvus-io/milvus/internal/core/src/index/VectorMemIndex.cpp:274 pc=0x7f79bbe11365
_ZN6milvus7segcore17SegmentSealedImpl22generate_interim_indexEN6fluent9NamedTypeIlNS_4impl10FieldIdTagEJNS2_10ComparableENS2_8HashableEEEE
	/go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:2028 pc=0x7f79bbf64a29
_ZN6milvus7segcore17SegmentSealedImpl13LoadFieldDataEN6fluent9NamedTypeIlNS_4impl10FieldIdTagEJNS2_10ComparableENS2_8HashableEEEERNS_13FieldDataInfoE
	/go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:475 pc=0x7f79bbf65491
_ZN6milvus7segcore17SegmentSealedImpl13LoadFieldDataERK17LoadFieldDataInfo
	/go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:283 pc=0x7f79bbf5b520
_ZN6milvus7segcore17SegmentSealedImpl22generate_interim_indexEN6fluent9NamedTypeIlNS_4impl10FieldIdTagEJNS2_10ComparableENS2_8HashableEEEE
	/go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:2028 pc=0x7f79bbf64a29
_ZN6milvus7segcore17SegmentSealedImpl13LoadFieldDataEN6fluent9NamedTypeIlNS_4impl10FieldIdTagEJNS2_10ComparableENS2_8HashableEEEERNS_13FieldDataInfoE
	/go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:475 pc=0x7f79bbf65491
_ZN6milvus7segcore17SegmentSealedImpl13LoadFieldDataERK17LoadFieldDataInfo
	/go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:283 pc=0x7f79bbf5b520
LoadFieldData
	/go/src/github.com/milvus-io/milvus/internal/core/src/segcore/segment_c.cpp:344 pc=0x7f79bbfd5809
_cgo_371eee69d3b2_Cfunc_LoadFieldData
	/tmp/go-build/cgo-gcc-prolog:319 pc=0x5df6524
runtime.asmcgocall
	/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:918 pc=0x1fc25c3


SIGSEGV: segmentation violation
PC=0x7f79b80e2d9e m=8 sigcode=1 addr=0x0
signal arrived during cgo execution
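
The frames above are Itanium-ABI mangled names. A minimal sketch for decoding them, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle`; the helper name `demangle` is illustrative and is not part of Milvus or knowhere:

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <memory>
#include <string>

// Returns the demangled form of a mangled C++ symbol, or the input
// unchanged when the demangler rejects it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer, so release it with free().
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

Passing `_ZN6milvus7segcore17SegmentSealedImpl13LoadFieldDataERK17LoadFieldDataInfo` from the trace should yield a readable `milvus::segcore::SegmentSealedImpl::LoadFieldData(...)` signature. Piping the trace through `c++filt` on the command line does the same job.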

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

log.log

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 18, 2024
@zhuwenxing zhuwenxing added feature/full text search priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. labels Oct 18, 2024
@zhuwenxing
Contributor Author

/assign @zhengbuqian
/assign @aoiasd
PTAL

@zhengbuqian zhengbuqian added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 18, 2024
@zhengbuqian zhengbuqian added this to the 2.5.0 milestone Oct 18, 2024
@zhengbuqian
Collaborator

zhengbuqian commented Oct 18, 2024

this is likely caused by: https://github.com/zilliztech/knowhere/pull/888/files#r1806322881.

but it is weird that this caused the program to crash, as any exceptions thrown in segcore should have been caught and should not have resulted in a segfault

@zhengbuqian
Collaborator

image

ah the err_msg is nullptr, no wonder
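
For context, a null out-parameter would explain why no exception handler helped: writing through a null `std::string*` is undefined behavior and typically raises SIGSEGV, which `try`/`catch` cannot intercept. That matches the trace ending in a `std::string` move-assignment with `addr=0x0`. Below is a minimal sketch of the usual guard. `check_dim` is a hypothetical validator written for illustration; it is not knowhere's actual code, nor the change made in zilliztech/knowhere#903.

```cpp
#include <string>

// Hypothetical validator (not knowhere's real API): reports failure
// through an optional out-parameter.
bool check_dim(int dim, std::string* err_msg) {
    if (dim > 0) {
        return true;
    }
    // Guard the write: `*err_msg = ...` with err_msg == nullptr is undefined
    // behavior and usually a segfault, which no catch block can intercept.
    if (err_msg != nullptr) {
        *err_msg = "dim must be positive, got " + std::to_string(dim);
    }
    return false;
}
```

With the guard in place, a caller that passes `nullptr` still gets `false` back but no crash; a caller that passes a real string also gets the message.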

@zhengbuqian
Collaborator

will be fixed by zilliztech/knowhere#903

@zhengbuqian
Collaborator

the fix has been adopted into Milvus in #37000, please verify

/assign @zhuwenxing
/unassign

@zhuwenxing
Contributor Author

verified and fixed in master-346510e-20241021
