A very simple RAG implementation

This project is a simple RAG (retrieval-augmented generation) tool for answering questions about a set of vnexpress articles.

This project demonstrates how RAG can be implemented easily without trendy frameworks such as LangChain or LlamaIndex, so that people can integrate RAG into their own systems or projects.

This project uses:

Scrapy for scraping the plain text of articles from the vnexpress "giao duc tin tuc" (education news) section
Mistral Platform for both the embedding model and the language model: mistral-embed and open-mistral-nemo, respectively
Upstash Vector as the vector database
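Without a framework, the retrieval half of RAG amounts to ranking stored embeddings by their similarity to the query embedding. The sketch below is not taken from this repository: it uses toy 3-dimensional vectors in place of mistral-embed outputs, an in-memory list in place of Upstash Vector, and cosine similarity as an assumed metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (text, score) pairs most similar to query_vec, best first.

    In the real project the vectors would come from mistral-embed and the
    search would run inside Upstash Vector; here everything is in memory.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings", for illustration only
docs = [
    ("Article about university entrance exams", [0.9, 0.1, 0.0]),
    ("Article about school lunches", [0.1, 0.8, 0.2]),
    ("Article about exam scores", [0.8, 0.2, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], docs, k=2)
print([text for text, _ in results])
```

The retrieved texts are then placed into the prompt sent to the language model; a vector database simply performs this ranking at scale.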

Setup

Python 3.10

Install the required packages (see requirements.txt for details):

pip install -r requirements.txt

The Mistral API key and the Upstash credentials are stored in .env:

MISTRAL_API_KEY=<key_here>
UPSTASH_VECTOR_REST_URL=<url_here>
UPSTASH_VECTOR_REST_TOKEN=<token_here>
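How the project loads .env is not described here (it may use a package such as python-dotenv). The following stdlib-only sketch shows one way to parse the file and check that all three variables are present; the function names and the REQUIRED_KEYS tuple are hypothetical.

```python
import os
from pathlib import Path

# The three variables listed above; the tuple name itself is hypothetical
REQUIRED_KEYS = ("MISTRAL_API_KEY", "UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines, # comments and lines without '='."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value" or KEY='value'
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read a .env file and export its values into os.environ."""
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    os.environ.update(values)
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Demo on an in-memory sample (placeholder values, not real credentials)
sample = '# example\nMISTRAL_API_KEY=abc\nUPSTASH_VECTOR_REST_URL="https://example.invalid"\n'
parsed = parse_env(sample)
print(missing_keys(parsed))
```

Exporting the values into os.environ lets any client library that reads environment variables pick up the credentials.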

Run

It is strongly advised to read each .py file before running any command; doing so will help you understand the project better.

Scrape the data

From the project root:

scrapy runspider --set FEED_EXPORT_ENCODING=utf-8 src/vnexpress_spider.py -o data/articles.jsonl

Scrapy's -o option appends to data/articles.jsonl instead of overwriting it, so running the spider twice duplicates articles. If you want fresh data, delete the file first (or use -O, which overwrites the file).
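Each line of data/articles.jsonl is one JSON object, written as UTF-8 because of FEED_EXPORT_ENCODING=utf-8. If the file has accumulated duplicates from repeated runs, a sketch like the one below can read it and drop repeats. The "url" field used as the deduplication key is an assumption; check src/vnexpress_spider.py for the fields the spider actually yields.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def dedupe(records: Iterable[dict], key: str = "url") -> list[dict]:
    """Keep the first record for each value of `key` (hypothetical field name).

    Records that lack the key are always kept.
    """
    seen: set = set()
    unique: list[dict] = []
    for record in records:
        value = record.get(key)
        if value is not None:
            if value in seen:
                continue
            seen.add(value)
        unique.append(record)
    return unique


# Demo on in-memory lines; for the real file use:
#   with open("data/articles.jsonl", encoding="utf-8") as f:
#       articles = dedupe(read_jsonl(f))
sample_lines = [
    '{"url": "https://example.invalid/a", "content": "Article A"}',
    '{"url": "https://example.invalid/b", "content": "Article B"}',
    '{"url": "https://example.invalid/a", "content": "Article A"}',
]
articles = dedupe(read_jsonl(sample_lines))
print(len(articles))
```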

Set up the database

From the project root:

python src/setup_db.py

If the vector database already exists and data/articles.jsonl has changed, delete the database before running setup_db.py again; otherwise it will still hold vectors built from the old data.
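Whether src/setup_db.py splits articles into smaller pieces before embedding them is not stated here. A common preparatory step in RAG pipelines is to cut long texts into overlapping chunks so each embedding covers a focused passage. The sketch below assumes a character-based window and a hypothetical "<article_id>#<chunk_index>" id scheme; the default sizes are arbitrary.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters, each overlapping the previous one by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


def build_records(article_id: str, text: str) -> list[tuple[str, str]]:
    """Pair each chunk with an id like 'a1#0' (hypothetical id scheme) ready for embedding."""
    return [(f"{article_id}#{i}", chunk) for i, chunk in enumerate(chunk_text(text))]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.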

Run the tool

Before running, edit the query variable in src/rag.py to set the question you want to ask.

From the project root:

python src/rag.py
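The generation half of RAG places the retrieved passages and the user's question into a single prompt for the language model (open-mistral-nemo in this project). The template below is a hypothetical sketch, not the prompt used in src/rag.py; it only shows the augmentation step and leaves out the API calls for embedding, search and chat.

```python
def build_prompt(query: str, contexts: list[str]) -> str:
    """Combine numbered retrieved passages and the question into one prompt string."""
    context_block = "\n\n".join(f"[{i}] {passage}" for i, passage in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say that you do not know.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Hypothetical question and passages, for illustration only
query = "What changed in this year's university entrance exam?"
prompt = build_prompt(query, ["Passage about the exam schedule.", "Passage about scoring."])
print(prompt)
```

Instructing the model to answer only from the supplied context, and to admit when it cannot, is a common way to reduce answers that are not grounded in the retrieved articles.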

Repository: dinhanhx/cakewalk-rag