Hands-on workshop for building a semantic search engine.
If you came to this repo during a workshop, visit this custom JupyterHub, which has all the dependencies already set up.
The repo is located at npatta01/search-engine-workshop
To use this repo outside a workshop, please use Binder.
Data Fetching
setup notebook
stats notebook
sample image notebook
Notebooks to download the Unsplash dataset and save it in Hugging Face dataset format
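As a rough illustration of the data-preparation step, the sketch below parses tab-separated photo metadata into one record per photo with Python's standard `csv` module. It assumes the Unsplash Lite photos table is a TSV whose columns include `photo_id`, `photo_image_url`, `photo_description`, and `ai_description`; check these names against your download. The combined `text` field and the `load_photos` helper are hypothetical and are not taken from the workshop notebooks.

```python
import csv
import io

# Columns kept from the photos table. These names are ASSUMED to match the
# Unsplash Lite photos TSV and may need adjusting for your download.
KEEP = ("photo_id", "photo_image_url", "photo_description", "ai_description")


def load_photos(tsv_text):
    """Parse tab-separated photo rows into a list of dicts (one per photo)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        # Missing or empty columns become empty strings.
        record = {col: (row.get(col) or "").strip() for col in KEEP}
        # Hypothetical: combine the human and AI descriptions into one text field.
        parts = [record["photo_description"], record["ai_description"]]
        record["text"] = " ".join(p for p in parts if p)
        records.append(record)
    return records


sample = (
    "photo_id\tphoto_image_url\tphoto_description\tai_description\n"
    "abc\thttps://example.com/abc.jpg\tSunset over the lake\tsunset photo\n"
    "def\thttps://example.com/def.jpg\t\ta dog on grass\n"
)
photos = load_photos(sample)
print(photos[1]["text"])  # prints "a dog on grass"
```

Assuming a recent version of the Hugging Face `datasets` library is installed, `datasets.Dataset.from_list(photos).save_to_disk("unsplash_lite")` would then store the records in Hugging Face dataset format; the output directory name is illustrative.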
Non Deep Learning Retrieval
BM25 retrieval with Elasticsearch: notebook
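Elasticsearch scores text queries with BM25 by default (with `k1 = 1.2` and `b = 0.75`). The pure-Python sketch below shows what that scoring does: each query term contributes its IDF times a term-frequency factor that saturates as the term repeats and is normalized by document length. It uses the classic Okapi formula with a Lucene-style IDF and a naive tokenizer, so its numbers will not match Elasticsearch's scores exactly; the `BM25` class is a hypothetical teaching aid, not the notebook's code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumerics. Elasticsearch's standard
    # analyzer is more sophisticated; this is only an approximation.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]  # term frequencies
        self.lengths = [sum(tf.values()) for tf in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for tf in self.docs for term in tf)
        self.n = len(self.docs)

    def idf(self, term):
        # Lucene-style IDF, which is always positive.
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        tf, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length-normalized, saturating term-frequency component.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def search(self, query, k=3):
        scored = [(self.score(query, i), i) for i in range(self.n)]
        scored.sort(key=lambda s: (-s[0], s[1]))
        # Return (doc index, score) pairs, dropping non-matching documents.
        return [(i, s) for s, i in scored[:k] if s > 0]


docs = [
    "a red car parked on the street",
    "a dog running on the beach",
    "red sunset over the beach",
]
engine = BM25(docs)
print([i for i, _ in engine.search("red beach")])  # prints [2, 1, 0]
```

The third document ranks first because it matches both query terms, and the second outranks the first because it is shorter, which is the effect of the `b` length-normalization parameter.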
Deep Learning Retrieval (text)
Text deep learning retrieval: Link
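Deep learning text retrieval typically embeds queries and documents with a neural encoder and ranks documents by the cosine similarity of their vectors. The sketch below shows only the ranking step, assuming the embeddings have already been computed; the 3-dimensional vectors are toy stand-ins for real model outputs, and `dense_search` is a hypothetical helper.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_search(query_vec, doc_vecs, k=2):
    """Return (index, cosine score) pairs for the k most similar documents."""
    q = normalize(query_vec)
    docs = [normalize(v) for v in doc_vecs]
    scores = ((dot(q, d), i) for i, d in enumerate(docs))
    return [(i, s) for s, i in heapq.nlargest(k, scores)]


# Toy 3-d "embeddings"; real ones would come from a sentence-embedding model.
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.7, 0.7, 0.0]]
results = dense_search([1.0, 0.0, 0.0], doc_vecs, k=2)
print([i for i, _ in results])  # prints [0, 2]
```

Normalizing both sides up front makes a plain dot product equal to cosine similarity, which is also what allows the same vectors to be placed in an inner-product ANN index later.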
Deep Learning Retrieval (image)
CLIP retrieval: Link
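CLIP encodes images and text into a shared embedding space, so a text query can be matched directly against precomputed image embeddings. The sketch below mirrors how CLIP scores image-text pairs: both embeddings are L2-normalized, their cosine similarity is multiplied by a temperature (`logit_scale`), and a softmax turns the scores into a distribution over candidate images. The toy vectors stand in for encoder outputs, the `logit_scale` of 100.0 is an assumed value (CLIP learns this parameter), and `clip_rank` is a hypothetical helper.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_rank(text_emb, image_embs, logit_scale=100.0):
    """Rank images for a text query by scaled cosine similarity.

    logit_scale=100.0 is an ASSUMED temperature; CLIP learns its own value.
    Returns image indices ordered best-first and softmax probabilities.
    """
    t = l2_normalize(text_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(t, l2_normalize(img)))
        for img in image_embs
    ]
    # Numerically stable softmax over the candidate images.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(image_embs)), key=lambda i: -probs[i])
    return order, probs


# Toy vectors standing in for CLIP image and text encoder outputs.
image_embs = [[0.2, 0.9, 0.1], [0.8, 0.1, 0.3], [0.1, 0.2, 0.9]]
order, probs = clip_rank([0.9, 0.2, 0.2], image_embs)
print(order)  # prints [1, 0, 2]
```

The softmax does not change the ranking, which depends only on cosine similarity; it is useful when a probability-like score is wanted. Because both modalities share one space, passing an image embedding as the query gives image-to-image search over the same collection.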
ANN
Shows how to speed up deep learning retrieval by exploring different approximate nearest neighbor (ANN) indexes: Link
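Exact retrieval compares the query against every stored vector, so its cost grows linearly with the collection; an ANN index trades a little recall for speed. The library-free sketch below shows one common family, an inverted-file (IVF) index: vectors are partitioned into `nlist` cells with k-means, and a query scans only the `nprobe` cells whose centroids are closest. The parameter names follow FAISS terminology, but this is a toy illustration rather than FAISS code, and the workshop notebook may compare other index types.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IVFIndex:
    """Toy inverted-file (IVF) index: k-means cells plus per-cell id lists."""

    def __init__(self, vectors, nlist=4, iters=10, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Coarse quantizer: plain k-means, initialized from random data points.
        self.centroids = [list(v) for v in rng.sample(vectors, nlist)]
        for _ in range(iters):
            cells = [[] for _ in range(nlist)]
            for v in vectors:
                cells[self._nearest_centroid(v)].append(v)
            for c, members in enumerate(cells):
                if members:  # keep the old centroid if a cell ends up empty
                    dim = len(members[0])
                    self.centroids[c] = [
                        sum(m[d] for m in members) / len(members) for d in range(dim)
                    ]
        # Inverted lists: vector ids grouped by their nearest centroid.
        self.lists = [[] for _ in range(nlist)]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroid(v)].append(i)

    def _nearest_centroid(self, v):
        return min(range(len(self.centroids)), key=lambda c: sq_dist(v, self.centroids[c]))

    def search(self, query, k=3, nprobe=1):
        # Visit only the nprobe cells whose centroids are closest to the query.
        probe = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(query, self.centroids[c]))[:nprobe]
        candidates = [i for c in probe for i in self.lists[c]]
        candidates.sort(key=lambda i: sq_dist(query, self.vectors[i]))
        return candidates[:k]


def brute_force(vectors, query, k=3):
    """Exact k-nearest-neighbor search, for comparison."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
index = IVFIndex(data, nlist=8)
query = [0.5] * 8
print(index.search(query, k=5, nprobe=2))  # approximate neighbors
print(brute_force(data, query, k=5))       # exact neighbors
```

Raising `nprobe` visits more cells, improving recall at the cost of speed; with `nprobe` equal to `nlist` the search becomes exhaustive and returns the same neighbors as brute force.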
For help or feedback, please reach out to:
This workshop uses the Unsplash Lite Dataset 1.2.0: link
The hands-on portion of the workshop was made possible by the JupyterHub Helm Chart.
v1.1
- set up for PyDataNYC
- replaced Stack Overflow data with Unsplash data
v1.0
- set up for ODSC
- used Stack Overflow data