Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
0 parents
commit 9ae938d
Showing 44 changed files with 7,360 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Defines the Google C++ style for automatic reformatting. | ||
# http://clang.llvm.org/docs/ClangFormatStyleOptions.html | ||
BasedOnStyle: Google | ||
MaxEmptyLinesToKeep: 1 | ||
# ColumnLimit: 0 | ||
FixNamespaceComments: true | ||
AllowShortFunctionsOnASingleLine: All | ||
# AlignConsecutiveDeclarations: true | ||
AlignConsecutiveAssignments: true | ||
BreakBeforeBraces: Allman |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
export PATH=$PATH:$HOME/bin:/mnt/pcz/env/cmake-3.24.1-linux-x86_64/bin |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
*.bin | ||
*.time | ||
*.png | ||
*.csv | ||
|
||
__pycache__ | ||
*.pyc | ||
*.so | ||
*.hdf5 | ||
.DS_Store | ||
hnswlib.egg-info/ | ||
build/ | ||
dist/ | ||
tmp/ | ||
python_bindings/tests/__pycache__/ | ||
*.pyd | ||
hnswlib.cpython*.so | ||
var/ | ||
.idea/ | ||
.vscode/ | ||
|
||
benchmark/data | ||
|
||
benchmark/index | ||
|
||
benchmark/latest-results | ||
|
||
benchmark/result | ||
benchmark/results | ||
|
||
benchmark/code/gather_statistics.py | ||
benchmark/code/group1.py | ||
benchmark/code/group2+3.py | ||
benchmark/code/group3.py | ||
benchmark/code/group4.py | ||
benchmark/code/merge_selectivity.py | ||
benchmark/code/index.sh | ||
benchmark/code/plot.py | ||
benchmark/code/preprocessing_selectivity.py | ||
benchmark/code/search_ivfpq.py | ||
|
||
benchmark/code/search_selectivity.sh | ||
benchmark/code/search.sh | ||
benchmark/code/selectivity_test.py | ||
|
||
|
||
|
||
# Cmake files | ||
cmake-build-* | ||
CMakeCache.txt | ||
CMakeFiles | ||
CMakeScripts | ||
Testing | ||
# Makefile | ||
cmake_install.cmake | ||
install_manifest.txt | ||
compile_commands.json | ||
CTestTestfile.cmake | ||
|
||
# Prerequisites | ||
*.d | ||
|
||
# Compiled Object files | ||
*.slo | ||
*.lo | ||
*.o | ||
*.obj | ||
|
||
# Precompiled Headers | ||
*.gch | ||
*.pch | ||
|
||
# Compiled Dynamic libraries | ||
*.so | ||
*.dylib | ||
*.dll | ||
|
||
# Fortran module files | ||
*.mod | ||
*.smod | ||
|
||
# Compiled Static libraries | ||
*.lai | ||
*.la | ||
*.a | ||
*.lib | ||
|
||
# Executables | ||
*.exe | ||
*.out | ||
*.app | ||
*.exe |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# HNSW algorithm parameters | ||
|
||
## Search parameters: | ||
* ```ef``` - the size of the dynamic list for the nearest neighbors (used during the search). Higher ```ef``` | ||
leads to a more accurate but slower search. ```ef``` cannot be set lower than the number of queried nearest neighbors | ||
```k```. The value of ```ef``` can be anything between ```k``` and the size of the dataset. | ||
* ```k``` - the number of nearest neighbors to be returned as the result. | ||
The ```knn_query``` function returns two numpy arrays, containing the labels of and distances to the ```k``` nearest | ||
elements found for each query. Note that if the algorithm is not able to find ```k``` neighbors for all of the queries | ||
(this can be due to problems with the graph or ```k``` > size of the dataset), an exception is thrown. | ||
|
||
An example of tuning the parameters can be found in [TESTING_RECALL.md](TESTING_RECALL.md). | ||
|
||
## Construction parameters: | ||
* ```M``` - the number of bi-directional links created for every new element during construction. A reasonable range for ```M``` | ||
is 2-100. Higher values of ```M``` work better on datasets with high intrinsic dimensionality and/or at high recall, while lower values work | ||
better on datasets with low intrinsic dimensionality and/or at low recall. The parameter also determines the algorithm's memory | ||
consumption, which is roughly ```M * 8-10``` bytes per stored element. | ||
As an example, for ```dim```=4 random vectors the optimal ```M``` for search is somewhere around 6, while for high-dimensional datasets | ||
(word embeddings, good face descriptors) higher values of ```M``` are required (e.g. ```M```=48-64) for optimal performance at high recall. | ||
The range ```M```=12-48 is fine for most use cases. When ```M``` is changed, the other parameters have to be updated as well. | ||
Nonetheless, the ```ef``` and ```ef_construction``` parameters can be roughly estimated by assuming that ```M```*```ef_{construction}``` is | ||
a constant. | ||
|
||
* ```ef_construction``` - the parameter has the same meaning as ```ef```, but controls the trade-off between index construction time and index accuracy. A bigger | ||
```ef_construction``` leads to longer construction but better index quality. At some point, increasing ```ef_construction``` does | ||
not improve the quality of the index any further. One way to check whether the chosen ```ef_construction``` is adequate is to measure the recall | ||
of an ```M```-nearest-neighbor search with ```ef``` = ```ef_construction```: if the recall is lower than 0.9, then there is room | ||
for improvement. | ||
* ```num_elements``` - defines the maximum number of elements in the index. The index can be extended by saving and loading it (the ```load_index``` | ||
function has a parameter which defines the new maximum number of elements). |
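
The rules of thumb in the parameter notes above can be turned into a few lines of arithmetic. The following is a minimal sketch in plain Python, not part of this commit: the helper names `link_memory_bytes`, `rescale_ef_construction` and `recall` are hypothetical, and the memory figure uses only the `M * 8-10` bytes per element quoted in the text (it presumably does not account for the stored vectors themselves).

```python
def link_memory_bytes(num_elements, M):
    """Return a rough (low, high) range in bytes, using the
    'M * 8-10 bytes per stored element' estimate from the notes."""
    return (num_elements * M * 8, num_elements * M * 10)


def rescale_ef_construction(old_M, old_ef_construction, new_M):
    """Estimate ef_construction for a new M by assuming that
    M * ef_construction stays roughly constant (a heuristic only)."""
    return max(1, round(old_M * old_ef_construction / new_M))


def recall(found_labels, true_labels):
    """Fraction of the true nearest neighbors that were found,
    pooled over all queries. Both arguments are lists with one
    list of labels per query."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found_labels, true_labels))
    total = sum(len(t) for t in true_labels)
    return hits / total


if __name__ == "__main__":
    low, high = link_memory_bytes(num_elements=1_000_000, M=16)
    print(f"link memory: {low / 1e6:.0f}-{high / 1e6:.0f} MB")
    print("ef_construction for M=32:", rescale_ef_construction(16, 200, 32))
    # Hypothetical labels: approximate results vs. brute-force ground truth.
    print("recall:", recall([[1, 2, 3], [4, 5, 6]], [[1, 2, 9], [4, 5, 6]]))
```

To apply the `ef_construction` check described above, `found_labels` would come from a search with `k` = `M` and `ef` = `ef_construction`, and `true_labels` from an exact (brute-force) search over the same data; a result below 0.9 suggests that `ef_construction` can still be raised.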
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
cmake_minimum_required(VERSION 3.8) | ||
project(hnsw_lib LANGUAGES CXX) | ||
|
||
set(CMAKE_CXX_STANDARD 17) | ||
|
||
# if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin") | ||
# SET( CMAKE_CXX_FLAGS "-Ofast -DNDEBUG -std=c++11 -DHAVE_CXX0X -openmp -fpic -ftree-vectorize") | ||
# else() | ||
# if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang") | ||
# SET( CMAKE_CXX_FLAGS "-Ofast -DNDEBUG -std=c++11 -DHAVE_CXX0X -openmp -march=native -fpic -ftree-vectorize") | ||
# elseif (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") | ||
# SET( CMAKE_CXX_FLAGS "-Ofast -lrt -DNDEBUG -std=c++11 -DHAVE_CXX0X -march=native -fpic -w -fopenmp -ftree-vectorize -ftree-vectorizer-verbose=0" ) | ||
# elseif (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC") | ||
# SET( CMAKE_CXX_FLAGS "-Ofast -lrt -DNDEBUG -std=c++11 -DHAVE_CXX0X -openmp -march=native -fpic -w -fopenmp -ftree-vectorize" ) | ||
# endif() | ||
# endif() | ||
|
||
# add_executable(test_updates examples/updates_test.cpp) | ||
# target_link_libraries(test_updates hnswlib) | ||
|
||
# add_executable(searchKnnCloserFirst_test examples/searchKnnCloserFirst_test.cpp) | ||
# target_link_libraries(searchKnnCloserFirst_test hnswlib) | ||
|
||
# add_executable(main main.cpp sift_1b.cpp) | ||
# target_link_libraries(main hnswlib) | ||
|
||
# add_subdirectory(googletest) | ||
|
||
include_directories(.) | ||
|
||
file(GLOB TESTS tests/*.cpp) | ||
FOREACH (path ${TESTS}) | ||
get_filename_component(name ${path} NAME_WE) | ||
add_executable(${name} ${path}) | ||
ENDFOREACH () | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
include hnswlib/*.h | ||
include LICENSE |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
pypi: dist | ||
	twine upload dist/* | ||
|
||
dist: | ||
	-rm dist/* | ||
	pip install build | ||
	python3 -m build --sdist | ||
|
||
test: | ||
	python3 -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py" | ||
|
||
clean: | ||
	rm -rf *.egg-info build dist tmp var tests/__pycache__ hnswlib.cpython*.so | ||
|
||
.PHONY: dist |