This project implements a Retrieval-Augmented Generation (RAG) engine with a Streamlit front end. The application retrieves arXiv articles on Software Engineering and Programming Languages topics and uses them to generate responses to user queries.
- Web-based interface built with Streamlit
- Retrieval of relevant documents from a PostgreSQL database
- Generation of responses using a language model
- Customizable number of documents to retrieve
- Adjustable token size for generated responses
- Python 3.10
- Streamlit
- langchain
- langchain_postgres
- sentence_transformers
- psycopg2
- Clone the repository:
  git clone https://github.com/AlvinKimata/RAG-project
  cd RAG-project
- Install the required packages:
  pip install -r requirements.txt
- Set up the PostgreSQL database with the arXiv documents. Ensure the connection variable holds the correct PostgreSQL connection details:
  connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"
- Run the Streamlit app:
  streamlit run app.py
- Open your web browser and navigate to the provided local URL (usually http://localhost:8501).
- Use the sliders to adjust the number of documents to retrieve and the token size for generation.
- Enter your query in the text input field and click 'Submit'.
- The app will retrieve relevant documents and generate a response based on your query.
- app.py: Main Streamlit application file
- rag_engine.py: Contains functions for document retrieval and prompt generation
- api.py: Handles the interaction with the language model API
- query_llm(): Queries the language model with the generated prompt
- similarity_search(): Performs similarity search to retrieve relevant documents
- document_template(): Formats the retrieved documents
- rag_function(): Performs the RAG process
- generate_prompt(): Generates the prompt for the language model
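
The way these functions fit together can be illustrated with a minimal, self-contained sketch. This README does not show the real signatures in rag_engine.py and api.py, so every signature, the `Doc` class, the prompt wording, and the toy embeddings below are assumptions made for illustration. An in-memory cosine-similarity ranking stands in for the PostgreSQL-backed search, and `query_llm` is a stub that never calls a real model.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for one stored arXiv record."""
    title: str
    text: str
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_embedding, docs, k=3):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(
        docs, key=lambda d: cosine(query_embedding, d.embedding), reverse=True
    )
    return ranked[:k]


def document_template(docs):
    """Format retrieved documents as a numbered context block."""
    return "\n\n".join(
        f"[{i}] {d.title}\n{d.text}" for i, d in enumerate(docs, start=1)
    )


def generate_prompt(query, context):
    """Combine the formatted context and the user query into one prompt."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def query_llm(prompt, max_tokens):
    """Stub model call (assumption): reports what it received."""
    return f"(stub reply; prompt of {len(prompt)} chars, max_tokens={max_tokens})"


def rag_function(query, query_embedding, docs, k=3, max_tokens=256):
    """Retrieve, format, build the prompt, then query the model."""
    retrieved = similarity_search(query_embedding, docs, k=k)
    prompt = generate_prompt(query, document_template(retrieved))
    return query_llm(prompt, max_tokens)


# Toy two-dimensional embeddings; a real setup would use an embedding model.
toy_docs = [
    Doc("Type inference", "Notes on inferring types.", [1.0, 0.0]),
    Doc("Code review", "Notes on reviewing code.", [0.0, 1.0]),
    Doc("Gradual typing", "Notes on mixing typed and untyped code.", [0.8, 0.6]),
]

print(rag_function("How does type inference work?", [1.0, 0.1], toy_docs, k=2))
```

The `k` and `max_tokens` parameters play the role of the two sliders in the interface: the first limits how many documents reach the prompt, the second caps the length of the generated response.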
This application uses a pre-trained language model and a pre-populated database of arXiv articles. Ensure you have the necessary API access and database set up correctly for the application to function properly.
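
Before starting the app, it can help to confirm that the connection string points where you expect. The sketch below only parses the string with the standard library and does not contact the database. `describe_connection` is a hypothetical helper and is not part of this repository.

```python
from urllib.parse import urlsplit

# Connection string as given in the setup steps.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"


def describe_connection(url):
    """Split a SQLAlchemy-style database URL into its parts (password omitted)."""
    parts = urlsplit(url)
    # The scheme has the form "dialect+driver"; the driver part is optional.
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_connection(connection)
for key, value in info.items():
    print(f"{key}: {value}")
```

If the host, port, or database name printed here does not match your PostgreSQL instance, update the connection variable before running the app.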