- Conversational Interface: Ask questions in natural language and receive answers sourced directly from the PDFs.
- Direct Citation: Every response from the system includes a direct link to the source PDF page, ensuring traceability and verification.
- PDF Directory: A predefined set of key PDF documents, currently WHO guidelines on major health topics such as schistosomiasis and malaria.
The application combines OpenAI embeddings, Pinecone vector search, and a conversational interface. When a query is made, the system:
- Converts the query into embeddings.
- Searches for the most relevant document sections using Pinecone's vector search.
- Returns the answer along with citations and links to the source documents.
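The three steps above can be sketched without any external services. This is a minimal illustration, not the project's code: `embed` is a toy bag-of-words stand-in for OpenAI embeddings, the in-memory `SECTIONS` list stands in for the Pinecone index, and the file names, page numbers, section texts, and the `#page=N` link format are all hypothetical placeholders.

```python
import math
from collections import Counter

# Hypothetical placeholder sections: (source PDF, page number, section text).
# In the real application these would be chunks stored in the Pinecone index.
SECTIONS = [
    ("malaria.pdf", 12, "placeholder text about malaria prevention with bed nets"),
    ("malaria.pdf", 48, "placeholder text about malaria treatment options"),
    ("schistosomiasis.pdf", 7, "placeholder text about schistosomiasis control programmes"),
]


def embed(text):
    """Toy bag-of-words vector; a stand-in for an OpenAI embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, top_k=2):
    """Step 1 and 2: embed the query, then rank sections by similarity."""
    query_vec = embed(query)
    scored = [
        {"score": cosine(query_vec, embed(text)), "source": source, "page": page, "text": text}
        for source, page, text in SECTIONS
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    # Drop sections that share no words with the query.
    return [hit for hit in scored[:top_k] if hit["score"] > 0]


def answer_with_citations(query):
    """Step 3: return the retrieved context together with page-level citations."""
    hits = retrieve(query)
    return {
        "context": [hit["text"] for hit in hits],
        "citations": [f"{hit['source']}#page={hit['page']}" for hit in hits],
    }


if __name__ == "__main__":
    print(answer_with_citations("malaria treatment"))
```

In the real pipeline the context would be passed to a language model to compose the answer. The sketch stops at retrieval so that the link between each returned section and its source page stays visible.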
- Clone the repository: `git clone https://github.com/yourusername/RAG-nificent.git`
- Install dependencies: `pip install -r requirements.txt`
- Set the following environment variables in a `.env` file (see also `.env.example`):
  - `PINECONE_INDEX_NAME`
  - `PINECONE_NAME_SPACE`
  - `OPENAI_API_KEY`
  - `PINECONE_API_KEY`
- Create a Pinecone index with the same name as `PINECONE_INDEX_NAME`. Set it up with `dimensions=1536` and `metric=cosine`.
- Place your PDFs in the `pdf_data` directory and run `data_ingestion.py`.
- Run the application: `chainlit run app.py`
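Before running the ingestion script or the application, it can help to confirm that the four variables from the setup steps are present. The helper below is a hypothetical sketch and not part of the repository. It only inspects the process environment, so values that live solely in `.env` need to be exported or loaded first.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = [
    "PINECONE_INDEX_NAME",
    "PINECONE_NAME_SPACE",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```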
The system currently includes guidelines from the following PDFs with direct links to the documents:
- WHO guideline on control and elimination of human schistosomiasis (2022)
- WHO guidelines for malaria (2023)
This project is licensed under the MIT License.