FastKnn

Purpose

Provide a library for building a fast kNN index and getting query results as a pandas DataFrame. FastKnn mainly uses nmslib as its (fast) kNN backend.

Install

pip install git+https://github.com/Fanchouille/fastknn.git

Use

FastKnn builds a kNN index with the specified index_method (default: hnsw) and index_space (default: cosinesimil).

See the nmslib documentation for the available spaces and methods.

This code has been tested with the hnsw method, using the cosinesimil / l2 spaces for dense data and the cosinesimil_sparse / cosinesimil_sparse_fast spaces for sparse data.
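
For intuition about the two dense spaces named above, here is a minimal pure-Python sketch of the quantities they correspond to. In nmslib, cosinesimil is the cosine distance (1 minus the cosine similarity) and l2 is the Euclidean distance. The function names below are illustrative only and are not part of FastKnn or nmslib.

```python
import math


def cosine_distance(x, y):
    """Cosine distance: 1 - (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def l2_distance(x, y):
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Cosine distance ignores vector length (parallel vectors are at distance 0), whereas l2 does not, which is usually the deciding factor when picking index_space.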

Example with dense data:

from fastknn import FastKnn

# Create index...
fastknn = FastKnn(data, id_dict)

# Save index
fastknn.save("test_fastknn")

# ...or load if exists
fastknn = FastKnn(fastknn_folder="test_fastknn")

# Choose sample vectors
query = data[:3, :]

# Query index & get results as df
results_df = fastknn.query_as_df(query, k=10, same_ids=True, remove_identity=True)

Here data is an m x n NumPy array and id_dict is a Python dictionary mapping each integer row index (0 to m-1) to its real id.
- fastknn.datautils provides methods to get data and id_dict easily from pandas DataFrames
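
To make the relationship between data and id_dict concrete, here is a minimal sketch using plain Python lists rather than the fastknn.datautils helpers (whose exact function names are not listed here). The records variable and its contents are hypothetical.

```python
# Hypothetical input: one (real_id, feature_vector) pair per item.
records = [
    ("user_a", [0.1, 0.9, 0.0]),
    ("user_b", [0.8, 0.1, 0.1]),
    ("user_c", [0.2, 0.7, 0.1]),
]

# Row i of `rows` holds the vector whose real id is id_dict[i].
rows = [vector for _, vector in records]
id_dict = {i: real_id for i, (real_id, _) in enumerate(records)}

# With NumPy installed, the m x n matrix would be: data = numpy.array(rows)
```

The only invariant that matters is that the keys of id_dict are exactly the row positions 0 to m-1 of data.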

To use FastKnn in supervised mode, pass a target parameter: a Python dictionary containing the labels (classes or a numeric target) associated with data (default: None, i.e. unsupervised mode).

Other important parameters: data_type (default: dense) and dist_type (default: float). See main.py for examples.

Once the object is instantiated, the save method writes the following files:
- mappings from integer index to real ids as a json file
- index parameters as a json file
- index as a bin file
- target dictionary as a json file
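
One detail worth keeping in mind when integer-keyed mappings are stored as JSON: JSON object keys are always strings, so the keys need converting back to integers on load. The sketch below shows such a round trip with the standard json module. The file name is hypothetical and this is not necessarily how FastKnn lays out its files.

```python
import json
import os
import tempfile

id_dict = {0: "user_a", 1: "user_b", 2: "user_c"}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "id_dict.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump(id_dict, f)
    with open(path) as f:
        raw = json.load(f)

# JSON turned the integer keys into strings; convert them back.
restored = {int(key): value for key, value in raw.items()}
```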

Load a saved FastKnn back by specifying fastknn_folder.

Query a FastKnn object with the provided query_as_df method, which takes the following parameters:
- query - p x n numpy array - matrix to be matched to data
- k - integer - the number of nearest neighbours (default: 10)
- query_index - list of integers - index of the data provided in query (default: None, which uses the row index)
- nn_column - string - name of the resulting column containing the nearest neighbours (default: nearest_neighbours)
- distance_column - string - name of the resulting column containing the distances to the nearest neighbours (default: distances)
- same_ids - bool - when querying the same data that was indexed, returns the index + real ids (default: False)
- remove_identity - bool - when querying the same data that was indexed, returns the k nearest neighbours without the perfect identity match (default: False)
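
The effect of remove_identity can be pictured as follows: when a vector is queried against an index that already contains it, its closest neighbour is normally itself, at distance 0. A common way to still return k neighbours is to retrieve k + 1 candidates and drop the query's own index. The helper below is a sketch of that idea under this assumption; drop_identity is a hypothetical name and this is not FastKnn's internal code.

```python
def drop_identity(query_idx, neighbour_ids, distances, k):
    """Remove the query's own index from its neighbours and keep at most k."""
    kept = [
        (n, d) for n, d in zip(neighbour_ids, distances) if n != query_idx
    ]
    kept = kept[:k]
    return [n for n, _ in kept], [d for _, d in kept]


# Row 0 queried with k + 1 = 4 candidates; the first hit is row 0 itself.
ids, dists = drop_identity(0, [0, 7, 3, 5], [0.0, 0.12, 0.2, 0.31], k=3)
```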

Get predictions from a FastKnn object with the provided prediction_as_df method, which takes the following parameters:
- query - p x n numpy array - matrix to be matched to data
- k - integer - the number of nearest neighbours (default: 10)
- query_index - list of integers - index of the data provided in query (default: None, which uses the row index)
- same_ids - bool - when querying the same data that was indexed, returns the index + real ids (default: False)
- remove_identity - bool - when querying the same data that was indexed, returns the k nearest neighbours without the perfect identity match (default: False)
- prediction_type - string - classification (majority voting on the k nearest neighbours) or regression (mean on the k nearest neighbours) (default: classification)
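
The two aggregation rules described for prediction_type can be sketched in a few lines of standard-library Python. The function knn_predict and its arguments are hypothetical; in particular, how ties are broken in the majority vote is an assumption of this sketch, not documented FastKnn behaviour.

```python
from collections import Counter
from statistics import mean


def knn_predict(neighbour_ids, target, prediction_type="classification"):
    """Aggregate the targets of the k nearest neighbours into one prediction."""
    labels = [target[n] for n in neighbour_ids]
    if prediction_type == "classification":
        # Majority vote; ties go to the label encountered first.
        return Counter(labels).most_common(1)[0][0]
    if prediction_type == "regression":
        # Unweighted mean of the neighbours' numeric targets.
        return mean(labels)
    raise ValueError(f"unknown prediction_type: {prediction_type!r}")
```

Here target plays the role of the dictionary passed in supervised mode, mapping each integer index to a class label or a numeric value.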

Development

Clone project

Install the local Anaconda environment as below:

./install.sh

Activate the local Anaconda environment as below:

conda activate ${PWD}/.conda
