Chat with your PDF. Generate summaries. Ask relevant questions.
This is a short project I did to explore the LLM and RAG domain.
Tested on Mistral 7B models.
- Create environment

  ```bash
  conda create -n chatPDF python=3.10
  ```

- Install poetry

  ```bash
  pip install poetry
  ```

- Then run

  ```bash
  poetry install
  ```
- Poetry will not install `llama-cpp-python`. Use `pip install llama-cpp-python` to install the CPU version. If you want to use CUDA, follow the steps in the `llama-cpp-python` documentation.
- Download the model and save it in the `model` directory.
- You can use various models and embeddings. Just go to `model.py` and change the model and embeddings as you like.
- Make sure to keep your PDF in `Scanned_Documents`.
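  The sketch below pulls these pieces together: a local GGUF Mistral model, local embeddings, and the PDFs in `Scanned_Documents`. The model file name, embedding model, and parameters are assumptions, not the repo's exact config, and imports vary across llama-index versions:

  ```python
  from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
  from llama_index.embeddings.huggingface import HuggingFaceEmbedding
  from llama_index.llms.llama_cpp import LlamaCPP

  # Local Mistral GGUF model; the file name and parameters are illustrative.
  Settings.llm = LlamaCPP(
      model_path="model/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
      temperature=0.1,
      max_new_tokens=512,
      context_window=4096,
  )
  # Local embeddings, so indexing never calls OpenAI.
  Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

  # Read every PDF in Scanned_Documents and build an in-memory vector index.
  documents = SimpleDirectoryReader("Scanned_Documents").load_data()
  index = VectorStoreIndex.from_documents(documents)

  print(index.as_query_engine().query("Summarize this document."))
  ```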
- Go to `chat_PDF_Mistral7B.py` or `chat_PDF_OpenAI.py` and update the prompt.
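  As an illustration, a custom QA prompt can be swapped in at query time. This template is hypothetical, not the repo's actual prompt, and it reuses the `index` built in the sketch above:

  ```python
  from llama_index.core import PromptTemplate

  qa_prompt = PromptTemplate(
      "Context information is below.\n"
      "---------------------\n"
      "{context_str}\n"
      "---------------------\n"
      "Using only this context, answer the question: {query_str}\n"
  )
  # text_qa_template replaces the default question-answering prompt.
  query_engine = index.as_query_engine(text_qa_template=qa_prompt)
  print(query_engine.query("What is this document about?"))
  ```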
- Fix the vector database calling (see the persistence sketch at the end of this list).
- Inefficient architecture: the database area needs fixing. Persisting the files in the database might be a better option and could increase speed. For one-time use this architecture is fine, but if a user wants to query the same data again later, the cold-start approach becomes an issue.
- Add metrics to check the text-generation quality of various RAG techniques: `hit_rate` and `mrr`.
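  A plain-Python sketch of these two metrics (LlamaIndex's `RetrieverEvaluator` can also compute both). The input shapes, one expected chunk id per query plus a ranked retrieval list, are assumptions:

  ```python
  # hit_rate: fraction of queries whose expected chunk appears in the results.
  def hit_rate(expected_ids, retrieved_ids_per_query):
      hits = sum(exp in retrieved for exp, retrieved
                 in zip(expected_ids, retrieved_ids_per_query))
      return hits / len(expected_ids)

  # mrr: mean reciprocal rank of the expected chunk (0 if it was not retrieved).
  def mrr(expected_ids, retrieved_ids_per_query):
      total = 0.0
      for exp, retrieved in zip(expected_ids, retrieved_ids_per_query):
          if exp in retrieved:
              total += 1.0 / (retrieved.index(exp) + 1)  # ranks are 1-based
      return total / len(expected_ids)

  # Example: two queries, top-3 retrieved chunks each.
  print(hit_rate(["a", "b"], [["a", "c", "d"], ["x", "y", "z"]]))  # 0.5
  print(mrr(["a", "b"], [["c", "a", "d"], ["x", "y", "z"]]))       # 0.25
  ```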
- All the code runs on the CPU, but it is still fast. Too lazy to reinstall `llama_cpp_python` with cuBLAS.
- Run the code using arguments.
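  A minimal sketch of what an argument-driven entry point could look like; the flag names are hypothetical:

  ```python
  import argparse

  parser = argparse.ArgumentParser(description="Chat with your PDF")
  parser.add_argument("--pdf-dir", default="Scanned_Documents",
                      help="directory containing the PDFs to index")
  parser.add_argument("--question", required=True,
                      help="question to ask about the documents")
  args = parser.parse_args()
  print(f"Indexing {args.pdf_dir!r} and asking: {args.question}")
  ```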
- Run the code asynchronously
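  A sketch of the async variant, assuming the `query_engine` from the earlier sketches; llama-index query engines expose `aquery`, which lets several questions run concurrently:

  ```python
  import asyncio

  async def ask_all(query_engine, questions):
      # Fire all queries concurrently and gather the responses.
      responses = await asyncio.gather(*(query_engine.aquery(q) for q in questions))
      for question, response in zip(questions, responses):
          print(question, "->", response)

  asyncio.run(ask_all(query_engine, ["Summarize the PDF.", "List the key dates."]))
  ```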
- Sometimes the generated sentence is not complete, which suggests the output is hitting a token limit with Mistral 7B.
- Used Mistral 7B, but LlamaIndex still needs an OpenAI API key to run some functions, such as `VectorStoreIndex`.
- So reloading the database is a bit of a challenge with this code because of the OpenAI requests, even though I am not using OpenAI!
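  A sketch of one way around both problems: point LlamaIndex's global `Settings` at local models so `VectorStoreIndex` stops falling back to OpenAI, and persist the index so it can be reloaded without re-embedding. The paths and model names are assumptions:

  ```python
  from llama_index.core import (Settings, SimpleDirectoryReader, StorageContext,
                                VectorStoreIndex, load_index_from_storage)
  from llama_index.embeddings.huggingface import HuggingFaceEmbedding

  # A local embed_model (plus a local Settings.llm, e.g. the LlamaCPP instance
  # from the earlier sketch) stops llama-index from defaulting to OpenAI.
  Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

  PERSIST_DIR = "storage"  # hypothetical directory for the saved index

  # First run: build the index once and persist it to disk.
  documents = SimpleDirectoryReader("Scanned_Documents").load_data()
  index = VectorStoreIndex.from_documents(documents)
  index.storage_context.persist(persist_dir=PERSIST_DIR)

  # Later runs: reload from disk instead of re-embedding (no cold start).
  index = load_index_from_storage(StorageContext.from_defaults(persist_dir=PERSIST_DIR))
  ```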