Download the project from GitHub and extract the watchtower folder.
git clone git@github.com:gamekeepers/watchtower.git
- Grobid: for parsing research paper PDFs
- Qdrant: for vector search
- SQLite: for storing parsed PDF content and chat history
- Install Grobid
The following command runs Grobid in a Docker container and exposes its API on port 8080.
sudo docker run -d --rm --init --ulimit core=0 -p 8080:8070 grobid/grobid:0.8.0
- Install Qdrant
The following commands run Qdrant in a Docker container and expose its API on port 6333. Both docker run commands bind port 6333, so run only one of them.
# create persistent qdrant storage directory
mkdir -p ~/.watchtower/qdrant_storage
# run without persistent storage
sudo docker run -d -p 6333:6333 qdrant/qdrant:latest
# or run with the persistent storage directory mounted
sudo docker run -d -p 6333:6333 \
    -v ~/.watchtower/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant
The Qdrant dashboard can be accessed at http://localhost:6333/dashboard
- Install SQLite
sudo apt-get install sqlite3
- Install Python dependencies
pip install -r requirements.txt
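Once the containers are started, it can help to confirm that both services answer before going further. The following is a minimal sketch, not part of the project: it assumes Grobid's /api/isalive health endpoint and Qdrant's /collections REST endpoint are available in the versions you run, and that the host ports match the docker commands above. The names SERVICES, is_up and check_services are hypothetical.

```python
import urllib.error
import urllib.request

# Assumed URLs: host ports come from the docker commands in this README.
# /api/isalive (Grobid) and /collections (Qdrant) are assumed to be exposed.
SERVICES = {
    "grobid": "http://localhost:8080/api/isalive",
    "qdrant": "http://localhost:6333/collections",
}


def is_up(url, opener=urllib.request.urlopen, timeout=3.0):
    """Return True if a GET on `url` answers with HTTP 200, else False."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def check_services(services=SERVICES, opener=urllib.request.urlopen):
    """Map each service name to whether it is reachable."""
    return {name: is_up(url, opener=opener) for name, url in services.items()}
```

Calling check_services() returns a dictionary keyed by service name; a False value suggests the corresponding container is not running or the port mapping differs from the one shown here.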
We support several LLM providers.
OpenAI, Azure OpenAI, Bedrock LLM, AWS Config, Vertex AI, Mistral AI, Cohere
To use one of them, set the LLM_TYPE environment variable to the matching provider, for example LLM_TYPE=openai.
The following describes the configuration required for OpenAI.
To use the OpenAI LLM, provide your OpenAI key via the OPENAI_API_KEY environment variable:
export LLM_TYPE=openai
export OPENAI_API_KEY=...
You can get your OpenAI key from the OpenAI dashboard.
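Before starting the app, you may want to confirm that the variables above are actually set in your shell. The sketch below is illustrative only: REQUIRED_VARS and missing_llm_settings are hypothetical names, and only the OpenAI entry is filled in because it is the only provider whose variables are documented here.

```python
import os

# Hypothetical mapping from LLM_TYPE value to the variables it needs.
# Only the OpenAI requirement is taken from this README.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
}


def missing_llm_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    llm_type = env.get("LLM_TYPE", "")
    if not llm_type:
        return ["LLM_TYPE"]
    # Providers not listed in REQUIRED_VARS are not checked by this sketch.
    required = REQUIRED_VARS.get(llm_type, [])
    return [name for name in required if not env.get(name)]
```

An empty list from missing_llm_settings() means nothing known to this sketch is missing; it does not verify that the key itself is valid.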
With the environment variables set, you can run the following commands to start the server and frontend.
- Python 3.8+
- Node 14+
For Python we recommend using a virtual environment.
ℹ️ Here's a good primer on virtual environments from Real Python.
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
# Install Node dependencies
cd frontend && yarn && cd ..
Copy env.sample to a .env file:
cp env.sample .env
Populate the .env file.
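A quick way to see whether .env still lacks entries is to compare its variable names with those in env.sample. This is a rough sketch under the assumption that both files use plain KEY=VALUE lines (optionally prefixed with export); read_keys and missing_keys are hypothetical helpers, not part of the project.

```python
from pathlib import Path


def read_keys(path):
    """Collect variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(sample_path="env.sample", env_path=".env"):
    """Names defined in the sample file but absent from the .env file."""
    return sorted(read_keys(sample_path) - read_keys(env_path))
```

Run from the project root, missing_keys() lists the names present in env.sample but not yet in .env. It only compares names, so a key with an empty value still counts as present.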
cd api && python3 -u manage.py set-data-stores && cd ..
This creates a SQLite database and a Qdrant collection for use by the app.
cd api && python3 -u manage.py index-data-from-directory /path/to/pdf/files && cd ..
Ensure both the Grobid and Qdrant services are up before running this command.
It stores the parsed content in the SQLite database and indexes the vectors in Qdrant.
By default, the data is indexed into the literature-docs collection. You can change this by setting the QDRANT_COLLECTION environment variable.
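To preview what an indexing run would target, the sketch below resolves the collection name the way the text describes (QDRANT_COLLECTION if set, otherwise literature-docs) and lists the PDF files directly inside a directory. Both functions are hypothetical helpers for a pre-flight check; how manage.py itself reads the variable or scans the directory is not shown in this README, so treat the behaviour here as an assumption.

```python
import os
from pathlib import Path

DEFAULT_COLLECTION = "literature-docs"  # default named in this README


def collection_name(env=None):
    """Return QDRANT_COLLECTION if set and non-empty, else the default."""
    env = os.environ if env is None else env
    return env.get("QDRANT_COLLECTION") or DEFAULT_COLLECTION


def pdf_files(directory):
    """Sorted PDF paths directly inside `directory` (no recursion, an assumption)."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

For example, printing collection_name() and len(pdf_files("/path/to/pdf/files")) before indexing gives a quick sanity check on the target collection and the number of input files.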
# Launch API app
flask run
# In a separate terminal launch frontend app
cd frontend && yarn start
You can now access the frontend at http://localhost:3000. Changes are automatically reloaded.