upload codes
llianga committed Jun 29, 2024
0 parents commit 9ae938d
Showing 44 changed files with 7,360 additions and 0 deletions.
10 changes: 10 additions & 0 deletions .clang-format
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Defines the Google C++ style for automatic reformatting.
# http://clang.llvm.org/docs/ClangFormatStyleOptions.html
BasedOnStyle: Google
MaxEmptyLinesToKeep: 1
# ColumnLimit: 0
FixNamespaceComments: true
AllowShortFunctionsOnASingleLine: All
# AlignConsecutiveDeclarations: true
AlignConsecutiveAssignments: true
BreakBeforeBraces: Allman
1 change: 1 addition & 0 deletions .env
@@ -0,0 +1 @@
export PATH=$PATH:$HOME/bin:/mnt/pcz/env/cmake-3.24.1-linux-x86_64/bin
92 changes: 92 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
*.bin
*.time
*.png
*.csv

__pycache__
*.pyc
*.so
*.hdf5
.DS_Store
hnswlib.egg-info/
build/
dist/
tmp/
python_bindings/tests/__pycache__/
*.pyd
hnswlib.cpython*.so
var/
.idea/
.vscode/

benchmark/data

benchmark/index

benchmark/latest-results

benchmark/result
benchmark/results

benchmark/code/gather_statistics.py
benchmark/code/group1.py
benchmark/code/group2+3.py
benchmark/code/group3.py
benchmark/code/group4.py
benchmark/code/merge_selectivity.py
benchmark/code/index.sh
benchmark/code/plot.py
benchmark/code/preprocessing_selectivity.py
benchmark/code/search_ivfpq.py

benchmark/code/search_selectivity.sh
benchmark/code/search.sh
benchmark/code/selectivity_test.py



# Cmake files
cmake-build-*
CMakeCache.txt
CMakeFiles
CMakeScripts
Testing
# Makefile
cmake_install.cmake
install_manifest.txt
compile_commands.json
CTestTestfile.cmake

# Prerequisites
*.d

# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod
*.smod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app
*.exe
31 changes: 31 additions & 0 deletions ALGO_PARAMS.md
@@ -0,0 +1,31 @@
# HNSW algorithm parameters

## Search parameters:
* ```ef``` - the size of the dynamic list for the nearest neighbors (used during the search). Higher ```ef```
leads to more accurate but slower search. ```ef``` cannot be set lower than the number of queried nearest neighbors
```k```. The value of ```ef``` can be anything between ```k``` and the size of the dataset.
* ```k``` - the number of nearest neighbors to be returned as the result.
The ```knn_query``` function returns two numpy arrays, containing the labels and distances of the ```k``` nearest
elements found for each query. Note that if the algorithm is not able to find ```k``` neighbors for all of the queries
(this can be due to problems with the graph or ```k``` > size of the dataset), an exception is thrown.

An example of tuning the parameters can be found in [TESTING_RECALL.md](TESTING_RECALL.md).
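
The constraint on ```ef``` described above can be sketched in a few lines of plain Python. This is only an illustration, not part of the library: the helper names ```valid_ef_range``` and ```ef_candidates``` and the doubling sweep are assumptions made for the example. It shows how one might generate a list of ```ef``` values to try between ```k``` and the size of the dataset.

```python
def valid_ef_range(k, num_elements):
    """Return the (lowest, highest) ef allowed by the rule k <= ef <= dataset size.

    Hypothetical helper for illustration; it is not an hnswlib function.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k > num_elements:
        raise ValueError("k cannot exceed the number of elements in the index")
    return k, num_elements


def ef_candidates(k, num_elements, factor=2):
    """Build a geometric sweep of ef values from k up to the dataset size.

    The geometric spacing is an assumption; any increasing sequence in the
    valid range would serve the same purpose.
    """
    if not isinstance(factor, int) or factor < 2:
        raise ValueError("factor must be an integer >= 2")
    lowest, highest = valid_ef_range(k, num_elements)
    candidates = []
    ef = lowest
    while ef < highest:
        candidates.append(ef)
        ef *= factor
    candidates.append(highest)  # always include the upper bound
    return candidates


if __name__ == "__main__":
    print(ef_candidates(k=10, num_elements=100))  # [10, 20, 40, 80, 100]
```

Each candidate would then be evaluated for recall and query time to pick the smallest ```ef``` that meets the accuracy target.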

## Construction parameters:
* ```M``` - the number of bi-directional links created for every new element during construction. A reasonable range for ```M```
is 2-100. Higher values of ```M``` work better on datasets with high intrinsic dimensionality and/or high recall requirements, while lower values work
better for datasets with low intrinsic dimensionality and/or low recall requirements. The parameter also determines the algorithm's memory
consumption, which is roughly ```M * 8-10``` bytes per stored element.
As an example, for ```dim```=4 random vectors the optimal ```M``` for search is somewhere around 6, while for high-dimensional datasets
(word embeddings, good face descriptors) higher values of ```M``` are required (e.g. ```M```=48-64) for optimal performance at high recall.
The range ```M```=12-48 is fine for most use cases. When ```M``` is changed, the other parameters have to be updated as well.
Nonetheless, the ```ef``` and ```ef_construction``` parameters can be roughly estimated by assuming that ```M * ef_construction``` is
a constant.

* ```ef_construction``` - the parameter has the same meaning as ```ef```, but controls the trade-off between index construction time and index accuracy. A bigger
```ef_construction``` leads to longer construction, but better index quality. At some point, increasing ```ef_construction``` does
not improve the quality of the index. One way to check whether the chosen ```ef_construction``` is adequate is to measure the recall
for ```M``` nearest neighbor search when ```ef``` = ```ef_construction```: if the recall is lower than 0.9, then there is room
for improvement.
* ```num_elements``` - defines the maximum number of elements in the index. The index can be extended by saving and loading it (the ```load_index```
function has a parameter which defines the new maximum number of elements).
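
The rules of thumb above can be turned into small calculations. The following is a rough sketch in plain Python, not library code: the function names are hypothetical, the memory figure covers only the ```M```-dependent part quoted above (8 to 10 bytes per link slot), and the recall function assumes ground-truth neighbor labels are already available from an exact search.

```python
def m_dependent_memory_bytes(M, num_elements):
    """Rough (low, high) estimate of the M-dependent memory, in bytes.

    Uses the "M * 8-10 bytes per stored element" rule of thumb; the memory
    taken by the vectors themselves is not included.
    """
    return M * 8 * num_elements, M * 10 * num_elements


def rescale_ef_construction(old_M, old_ef_construction, new_M):
    """Pick a new ef_construction so that M * ef_construction stays roughly constant."""
    return max(1, round(old_M * old_ef_construction / new_M))


def recall(found_labels, true_labels):
    """Fraction of true nearest neighbors that were found, over all queries.

    Both arguments are sequences with one list of labels per query.
    """
    hits = sum(len(set(f) & set(t)) for f, t in zip(found_labels, true_labels))
    total = sum(len(t) for t in true_labels)
    return hits / total


if __name__ == "__main__":
    low, high = m_dependent_memory_bytes(M=16, num_elements=1_000_000)
    print(low, high)  # 128000000 160000000
    print(rescale_ef_construction(old_M=16, old_ef_construction=200, new_M=32))  # 100
    # A recall below 0.9 at ef = ef_construction suggests raising ef_construction.
    print(recall([[1, 2, 3], [4, 5, 6]], [[1, 2, 9], [4, 5, 6]]) < 0.9)  # True
```

These numbers are starting points only; the actual recall and memory footprint should be measured on the target dataset.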
38 changes: 38 additions & 0 deletions CMakeLists.txt
@@ -0,0 +1,38 @@
cmake_minimum_required(VERSION 3.8)  # CMAKE_CXX_STANDARD 17 needs CMake 3.8 or newer
project(hnsw_lib LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)

# if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
# SET( CMAKE_CXX_FLAGS "-Ofast -DNDEBUG -std=c++11 -DHAVE_CXX0X -openmp -fpic -ftree-vectorize")
# else()
# if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
# SET( CMAKE_CXX_FLAGS "-Ofast -DNDEBUG -std=c++11 -DHAVE_CXX0X -openmp -march=native -fpic -ftree-vectorize")
# elseif (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
# SET( CMAKE_CXX_FLAGS "-Ofast -lrt -DNDEBUG -std=c++11 -DHAVE_CXX0X -march=native -fpic -w -fopenmp -ftree-vectorize -ftree-vectorizer-verbose=0" )
# elseif (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
# SET( CMAKE_CXX_FLAGS "-Ofast -lrt -DNDEBUG -std=c++11 -DHAVE_CXX0X -openmp -march=native -fpic -w -fopenmp -ftree-vectorize" )
# endif()
# endif()

# add_executable(test_updates examples/updates_test.cpp)
# target_link_libraries(test_updates hnswlib)

# add_executable(searchKnnCloserFirst_test examples/searchKnnCloserFirst_test.cpp)
# target_link_libraries(searchKnnCloserFirst_test hnswlib)

# add_executable(main main.cpp sift_1b.cpp)
# target_link_libraries(main hnswlib)

# add_subdirectory(googletest)

include_directories(.)

file(GLOB TESTS tests/*.cpp)
FOREACH (path ${TESTS})
    get_filename_component(name ${path} NAME_WE)
    add_executable(${name} ${path})
ENDFOREACH ()



2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
include hnswlib/*.h
include LICENSE
15 changes: 15 additions & 0 deletions Makefile
@@ -0,0 +1,15 @@
pypi: dist
	twine upload dist/*

dist:
	-rm dist/*
	pip install build
	python3 -m build --sdist

test:
	python3 -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py"

clean:
	rm -rf *.egg-info build dist tmp var tests/__pycache__ hnswlib.cpython*.so

.PHONY: dist