Skip to content
This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Latest commit

 

History

History

whats-a-vector-database

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
description
...and what is it used for?

What's a vector database?

What is it?

A vector database is simply a database that has the capability to store vectors (pretty obvious, I know). For example, a vector database supporting vectors of 2 dimensions could store the following vectors: (1,2), (6,2), (7,2). That's it.

Why does it exist?

The primary use for a vector database is to perform similarity search. Given a specific vector in the database, usually referred to as the query vector, and an integer k, similarity search is the process of finding the k-nearest vectors to your query vector. When calculating how "close" two vectors are from each other, we can use one of the three similarity metrics supported in EigenDB.

Similarity metrics are mathematical methods of defining the distance between two vectors. You are probably most familiar with the Euclidean similarity metric.

  • Euclidean

$$ d(\vec{u}, \vec{v})=\sqrt{\sum_{i=1}^n (v_i - u_i)^2} $$

  • Cosine

$$ \cos(\theta) = \frac{\vec{u} \cdot \vec{v}}{|\vec{u}||\vec{v}|} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2} \sqrt{\sum_{i=1}^{n} v_i^2}} $$

  • Inner Product

$$ \vec{u} \cdot \vec{v} = \sum_{i=1}^n u_i v_i $$

For more information on how similarity search is performed in EigenDB, read this.

What can you do with similarity search?

TL;DR: Given a specific piece of data (audio, video, anything), similarity search is a method of efficiently finding similar pieces of data.

This page explains it in-depth. I highly recommend reading it to get a good understanding of similarity search.

Similarity search has many uses like: content-based recommendation engines, semantic search, image, video and text search, etc.

A great example of an active use of vector databases is Spotify's recommendation engine (check it out, it's really cool and open-source~ ❤️). Spotify is well-known for it amazing music recommendations. Music recommendations in Spotify are done with similarity search using a vector database. Songs in Spotify are vectorized (turned into vectors using machine learning) and stored in a vector database. This allows Spotify to easily compute the k-most similar songs to a target song, where all songs are represented as vectors.

There is a great episode of the NerdOut@Spotify podcast that talks about their recommendation engine and vector databases as a whole. Highly recommend it.