This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

[Bug]: [Knowhere 1.3] index:gpu_ivf_sq8 crash #451

Open
elstic opened this issue Sep 2, 2022 · 3 comments

Comments


elstic commented Sep 2, 2022

Current Behavior

Running all L0 cases crashes on the gpu_ivf_sq8 index.

Expected Behavior

The case runs to completion without crashing.

Steps To Reproduce

  1. Enter the CI pod
  2. Run pytest -v -m L0

Test case address: https://github.com/milvus-io/knowhere-test/blob/main/tests/test_ivf.py#L234
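
To narrow this down, the failing parameterization can be run on its own in a child process, so that a native crash is reported as a signal instead of taking the whole session down. This is a minimal sketch, assuming pytest is installed and the test ID matches the one printed in the log; `describe_exit` and `run_case` are hypothetical helpers and are not part of knowhere-test.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable status."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_case(test_id: str) -> str:
    """Run a single pytest case in a child process (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pytest", "-v", test_id]
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)


# Example call (assumes the current directory contains test_ivf.py):
# print(run_case("test_ivf.py::TestIvf::test_ivf_gpu[gpu_ivf_sq8]"))
```

Looping `run_case` over the gpu_ivf_flat, gpu_ivf_sq8 and gpu_ivf_pq IDs would show whether each index fails independently or only after another case has run.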

Log


test_ivf.py::TestIvf::test_ivf_gpu[gpu_ivf_sq8] Fatal Python error: Segmentation fault

Current thread 0x00007f63abfb2740 (most recent call first):
 File "/usr/local/lib/python3.8/dist-packages/knowhere/swigknowhere.py", line 479 in Query
 File "/home/jenkins/agent/workspace/whh/tests/test_ivf.py", line 303 in test_ivf_gpu
 File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in _multicall
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__
 File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 1761 in runtest
 File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 166 in pytest_runtest_call
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in _multicall
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__
 File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 259 in <lambda>
 File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 338 in from_call
 File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 258 in call_runtest_hook
 File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 219 in call_and_report
 File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 130 in runtestprotocol
 File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 111 in pytest_runtest_protocol
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in _multicall
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__
 File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 347 in pytest_runtestloop
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in _multicall
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__
 File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 322 in _main
 File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 268 in wrap_session
 File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 315 in pytest_cmdline_main
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in _multicall
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in _hookexec
 File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in __call__
 File "/usr/local/lib/python3.8/dist-packages/_pytest/config/__init__.py", line 164 in main
 File "/usr/local/lib/python3.8/dist-packages/_pytest/config/__init__.py", line 187 in console_main
 File "/usr/local/bin/pytest", line 8 in <module>
Segmentation fault (core dumped)


@elstic elstic changed the title [Bug]: Knowhere 1.3 index:gpu_ivf_sq8 crash [Bug]: [Knowhere 1.3] index:gpu_ivf_sq8 crash Sep 2, 2022

elstic commented Sep 5, 2022

[two attached images, not captured in text]

find me the complete core dump file


elstic commented Sep 5, 2022

/assign @Presburger

@Presburger Presburger pinned this issue Sep 5, 2022

elstic commented Sep 5, 2022

update:
index: gpu_ivf_flat

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3.8 -m pytest test_ivf.py::TestIvf::test_ivf_gpu'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1f0352b740 (LWP 1254))]
(gdb) bt
#0  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  <signal handler called>
#2  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#3  0x00007f1f03719859 in __GI_abort () at abort.c:79
#4  0x00007f1efea5aa42 in void faiss::gpu::runL2Norm<float, float4, int>(faiss::gpu::Tensor<float, 2, true, int, faiss::gpu::traits::DefaultPtrTraits>&, bool, faiss::gpu::Tensor<float, 1, true, int, faiss::gpu::traits::DefaultPtrTraits>&, bool, CUstream_st*) ()
   from /usr/local/lib/python3.8/dist-packages/knowhere/../../../libknowhere.so
#5  0x00007f1efea57ee7 in faiss::gpu::runL2Norm(faiss::gpu::Tensor<float, 2, true, int, faiss::gpu::traits::DefaultPtrTraits>&, bool, faiss::gpu::Tensor<float, 1, true, int, faiss::gpu::traits::DefaultPtrTraits>&, bool, CUstream_st*) () from /usr/local/lib/python3.8/dist-packages/knowhere/../../../libknowhere.so
#6  0x00007f1efe9eac51 in faiss::gpu::FlatIndex::add(float const*, int, CUstream_st*) ()
   from /usr/local/lib/python3.8/dist-packages/knowhere/../../../libknowhere.so
#7  0x00007f1efe9d2196 in faiss::gpu::GpuIndexFlat::addImpl_(int, float const*, long const*) ()
   from /usr/local/lib/python3.8/dist-packages/knowhere/../../../libknowhere.so
#8  0x00007f1efe9d2374 in faiss::gpu::GpuIndexFlat::add(long, float const*) () from /usr/local/lib/python3.8/dist-packages/knowhere/../../../libknowhere.so
#9  0x00007f1efe8c3b4e in faiss::Clustering::train_encoded (this=0x7ffc65105fa0, nx=262144, x_in=0x7f1e6647e010 "", codec=0x0, index=..., weights=0x0)
    at /home/jenkins/agent/knowhere/thirdparty/faiss/faiss/Clustering.cpp:566
#10 0x00007f1efe9da571 in faiss::gpu::GpuIndexIVF::trainQuantizer_(long, float const*) ()
   from /usr/local/lib/python3.8/dist-packages/knowhere/../../../libknowhere.so
#11 0x00007f1efe9dc758 in faiss::gpu::GpuIndexIVFFlat::train(long, float const*) () from /usr/local/lib/python3.8/dist-packages/knowhere/../../../libknowhere.so
#12 0x00007f1efe8a0b3b in knowhere::GPUIVF::Train (this=0x271e9d0, dataset_ptr=std::shared_ptr<class knowhere::Dataset> (use count 1, weak count 0) = {...}, 
    config=...) at /home/jenkins/agent/knowhere/knowhere/index/vector_index/gpu/IndexGPUIVF.cpp:44
#13 0x00007f1f0270841c in _wrap_GPUIVF_Train (args=<optimized out>) at /home/jenkins/agent/knowhere/python/../python/knowhere/knowhere_wrap.cpp:13349
#14 0x00000000005f69ca in PyCFunction_Call ()
#15 0x00000000005f74f6 in _PyObject_MakeTpCall ()
#16 0x0000000000571164 in _PyEval_EvalFrameDefault ()
#17 0x00000000005f6cd6 in _PyFunction_Vectorcall ()
#18 0x000000000056bbfa in _PyEval_EvalFrameDefault ()
#19 0x0000000000569dba in _PyEval_EvalCodeWithName ()
#20 0x00000000005f6eb3 in _PyFunction_Vectorcall ()
#21 0x000000000050bc2c in ?? ()
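
Most of the frames in a backtrace like this belong to libc or the Python interpreter. A small filter can pull out the faiss and knowhere frames. This is a sketch only, assuming the backtrace has been saved as plain text in the format gdb printed above; `parse_frames` and `library_frames` are hypothetical names, and the regular expression only covers the frame shapes seen in this log.

```python
import re

# Matches "#N  [0xADDR in ]function (" at the start of a gdb backtrace line.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def parse_frames(bt_text):
    """Return (frame_number, function_text) pairs from gdb 'bt' output."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line)
        if match:  # continuation lines and "<signal handler called>" are skipped
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def library_frames(frames, markers=("faiss::", "knowhere::")):
    """Keep only frames whose function text mentions one of the markers."""
    return [(num, fn) for num, fn in frames if any(m in fn for m in markers)]
```

Applied to the gpu_ivf_flat backtrace, the first frame this keeps should be #4, faiss::gpu::runL2Norm, which sits directly above the abort() call in frame #3.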

index: gpu_ivf_sq8
coredump bt log:

[image: backtrace screenshot, not captured in text]

index: gpu_ivf_pq
coredump bt log:
[image: backtrace screenshot, not captured in text]
