An intelligent, agent-based support automation system built with Llama 2, the Qdrant vector database, and LangChain. It provides context-aware responses by combining semantic search with conversation history. Built for customer support automation, deployable locally, and adaptable to various domains.
flowchart TB
Client[Client] --> |Query| API
API --> |Process Query| Support[Customer Support System]
subgraph Context Management
Support --> |Get Session Context| MCP[Model Context Protocol]
MCP --> |Retrieve History| Sessions[(Session Store)]
Support --> |Search Similar Docs| VDB[(Vector DB)]
end
subgraph Response Generation
Support --> |Context + History| Chain[LangChain]
Chain --> |Format Prompt| LLM[LLM]
LLM --> |Generated Response| Chain
Chain --> |Processed Response| Support
end
Support --> |Update Context| MCP
Support --> |Final Response| API
API --> |JSON Response| Client
- Vector Database (using Qdrant)
  - Stores and indexes payment-related documentation
  - Enables semantic search for relevant content
  - Rust-based, open-source solution
- LLM (Llama 2 via Ollama)
  - Generates contextual responses
  - Runs locally for data privacy and control
  - Open-source model
- Embeddings (sentence-transformers)
  - Converts text into vector representations
  - Uses HuggingFace's sentence-transformers
  - Enables semantic similarity search
- Model Context Protocol (MCP)
  - Manages conversation context
  - Handles user sessions
  - Maintains conversation history
- LangChain Integration
  - Orchestrates the components
  - Manages prompt templates
  - Handles response generation
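To make the flow in the diagram concrete, here is a minimal sketch of how a single query could pass through these components. It assumes a collection named payment_docs, the all-MiniLM-L6-v2 embedding model, and a text payload field; none of these names are taken from the project's actual code:

# Illustrative sketch of the retrieval-augmented query flow (names are assumptions).
import requests
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
qdrant = QdrantClient(host="localhost", port=6333)

def answer(query: str, collection: str = "payment_docs") -> str:
    # 1. Embed the query and retrieve the most similar documents from Qdrant.
    vector = embedder.encode(query).tolist()
    hits = qdrant.search(collection_name=collection, query_vector=vector, limit=3)
    context = "\n".join((hit.payload or {}).get("text", "") for hit in hits)

    # 2. Combine the retrieved context with the question into a single prompt.
    prompt = (
        "Use the following support documentation to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

    # 3. Ask the locally running Llama 2 model via Ollama's HTTP API.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

In the actual system, LangChain manages the prompt template and the LLM call, and the MCP layer adds session history to the prompt before generation.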
This project was born from specific challenges in automating customer support for an online trading platform. The complete context is documented in context/problem-statement.md, which outlines:
- Target Environment: Online Trading Platform (Deriv.com)
- Primary Use Case: Automating responses to payment-related customer queries
- Core Business Context: Handles online trading, deposits, and withdrawals
- High volume of payment-related support queries
- Need for consistent and accurate responses
- Complex context tracking across customer interactions
- Multiple payment methods and failure scenarios
- Local development and testing capability
- Open-source tooling preference
- Containerized deployment
- Context-aware response generation
To make our system's architecture more approachable, we use an analogy (documented in context/analogy.md) that compares it to a team of specialized helpers:
- Librarian (Vector Database): Quickly finds relevant documentation
- Smart Helper (LLM): Reads and answers questions
- Secretary (MCP): Remembers conversation history
- Manager (LangChain): Coordinates everyone's efforts
This analogy helps developers and AI models understand how different components interact and their specific roles in the system.
The project includes two special files that help AI models (like LLMs) better understand and work with the system:
- context/problem-statement.md
  - Provides clear business context and requirements
  - Helps AI models understand the purpose and scope
  - Enables more relevant code suggestions and improvements
  - Guides AI in maintaining project focus during development
- context/analogy.md
  - Offers intuitive explanations of system components
  - Helps AI models generate more natural documentation
  - Provides consistent metaphors for explaining functionality
  - Makes technical concepts more approachable in AI-generated responses
Note: These files serve as crucial context for human developers and AI assistants, ensuring consistent understanding and communication about the system's architecture and purpose.
.
├── src/
│ ├── config.py # Configuration settings
│ ├── main.py # FastAPI application
│ ├── vectorstore/ # Vector database integration
│ ├── mcp/ # Model Context Protocol
│ └── llm_chain/ # LangChain integration
├── Dockerfile # Multi-stage build for the main app
├── Dockerfile.ollama # Custom Ollama build with network utilities
├── Dockerfile.qdrant # Custom Qdrant build with health check support
├── docker-compose.yml # Container orchestration
├── requirements.txt # Python dependencies
├── context # Project context to provide to GPT
│ ├── problem-statement.md # Problem statement
│ └── analogy.md # System explanation in easier language
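As an illustration of the configuration layer, src/config.py could be as simple as the sketch below, which reads the same environment variables used in the .env file described later; the real file's contents may differ:

# Hypothetical sketch of src/config.py: settings read from environment variables.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    # Defaults match the .env example in the setup section below.
    qdrant_host: str = os.getenv("QDRANT_HOST", "localhost")
    qdrant_port: int = int(os.getenv("QDRANT_PORT", "6333"))
    ollama_base_url: str = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

settings = Settings()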
- Main App Container (Dockerfile)
  - Multi-stage build for optimized size
  - Includes curl for health checks
  - Runs as a non-root user for security
- Custom Ollama Container (Dockerfile.ollama)
  - Based on ollama/ollama
  - Adds essential network utilities (curl, netcat, ping)
  - Required for proper health checks and diagnostics
  - Uses a custom entrypoint to support both ollama and shell commands
  - We use a custom Dockerfile for the Ollama container, including additional packages (curl, netcat, iputils-ping). This is necessary to ensure proper health checks and network diagnostics within the container. The default Ollama image lacks these utilities, which can cause issues with Docker health checks and network troubleshooting (see ollama/ollama#5389).
- Custom Qdrant Container (Dockerfile.qdrant)
  - Based on qdrant/qdrant
  - Adds curl for health check functionality
  - Maintains proper user permissions
- Docker
- Python 3.11+
- NVIDIA GPU (optional)
- For GPU support:
  - Install NVIDIA Container Toolkit
  - Uncomment GPU configuration in docker-compose.yml
  - System will automatically use CPU if GPU is not available
If you plan to use GPU acceleration, follow these steps for Ubuntu/Debian systems:
- Verify NVIDIA GPU
# Check if NVIDIA GPU is detected
lspci | grep -i nvidia
- Install NVIDIA Driver
# Check if NVIDIA driver is already installed
nvidia-smi
# If not installed:
sudo apt-get update
sudo apt-get install -y linux-headers-$(uname -r)
sudo apt-get install -y nvidia-driver-535
# Reboot system
sudo reboot
# After reboot, verify driver installation
nvidia-smi
- Install NVIDIA Container Toolkit
# Add NVIDIA package repository and GPG key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
- Verify Installation
# Test NVIDIA Container Toolkit
sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Note: These instructions are for standard NVIDIA driver installation. For cloud-specific setups (AWS, GCP, Azure), refer to your cloud provider's documentation.
Official Documentation References:
- NVIDIA Container Toolkit Installation Guide
- NVIDIA Docker Installation Guide
- NVIDIA Driver Documentation
Common Issues:
- If nvidia-smi fails after installation, try rebooting the system
- If Docker can't access the GPU, ensure the Docker service was restarted after configuration
- For permission issues, ensure your user is in the docker group
- Clone the repository:
git clone <repository-url>
cd intelligent-support-agent
- Create a .env file:
QDRANT_HOST=localhost
QDRANT_PORT=6333
OLLAMA_BASE_URL=http://localhost:11434
- Configure GPU support (optional):
  - If you have an NVIDIA GPU and want to use it:
    - Install NVIDIA Container Toolkit
    - Uncomment the GPU configuration in docker-compose.yml:
      deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                count: all
                capabilities: [gpu]
  - If no GPU is available, the system will automatically use CPU
- Start the services:
docker compose up -d
This will:
- Start all required services (app, qdrant, ollama)
- Automatically download the llama2 model
- Initialize the system
Note: If the model download fails or you need to manually pull the model:
docker compose exec ollama ollama pull llama2
- Initialize the data:
docker compose --profile init up init-data
This will populate the vector database with payment documentation and sample queries (a conceptual sketch of this indexing step appears just after these setup steps).
- Wait for all services to be ready:
# Check service status
docker compose ps
The API will be available at http://localhost:8000 once all services are healthy.
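Conceptually, the data-initialization step embeds each document and stores it in Qdrant. The sketch below illustrates the idea with an assumed collection name and payload schema; it is not the project's actual init script:

# Illustrative document-indexing sketch (assumed collection name and payload schema).
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

docs = [
    "Card payments can be declined due to insufficient funds or daily limits.",
    "Bank transfers may be delayed by verification checks on first use.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

# Create (or recreate) the collection sized to the embedding dimension.
dim = embedder.get_sentence_embedding_dimension()
client.recreate_collection(
    collection_name="payment_docs",
    vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
)

# Embed each document and upsert it with its original text stored as payload.
points = [
    PointStruct(id=i, vector=embedder.encode(text).tolist(), payload={"text": text})
    for i, text in enumerate(docs)
]
client.upsert(collection_name="payment_docs", points=points)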
POST /query
Content-Type: application/json
{
"query": "Why did my payment fail?",
"session_id": "optional-session-id",
"context_size": 5
}
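For orientation, the request and response bodies map naturally onto Pydantic models. The following is a simplified sketch, not the project's actual src/main.py:

# Simplified sketch of the /query endpoint's request/response models.
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    query: str
    session_id: Optional[str] = None  # omit for a one-off, stateless query
    context_size: int = 5             # number of previous exchanges to include

class QueryResponse(BaseModel):
    session_id: str
    response: str

@app.post("/query", response_model=QueryResponse)
async def handle_query(req: QueryRequest) -> QueryResponse:
    # Placeholder body: the real handler searches the vector store, applies
    # session history via MCP, and generates a response through LangChain.
    return QueryResponse(session_id=req.session_id or "generated-session-id", response="...")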
Example curl commands:
- Basic query:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "Why did my payment fail?"}'
- Conversational queries using session ID:
# Initial query
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Why was my card payment declined?",
"session_id": "user123"
}'
# Follow-up query (uses previous context)
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "What should I check first?",
"session_id": "user123"
}'
# Another follow-up
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "How do I contact my bank about this?",
"session_id": "user123"
}'
- Query with custom context size (limits conversation history):
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Why did my payment fail?",
"session_id": "user123",
"context_size": 3
}'
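The same requests can be issued from Python; this small client is purely illustrative:

# Illustrative Python equivalent of the curl examples above.
import requests

API_URL = "http://localhost:8000/query"  # adjust if the API runs elsewhere

def ask(query: str, session_id: str | None = None, context_size: int = 5) -> dict:
    payload = {"query": query, "context_size": context_size}
    if session_id:
        payload["session_id"] = session_id
    resp = requests.post(API_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()

print(ask("Why did my payment fail?", session_id="user123"))
print(ask("What should I check first?", session_id="user123"))  # follow-up reuses the session context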
The system maintains conversation history using session IDs:
- Session ID: A unique identifier for a conversation thread
  - If not provided, each query is treated as independent
  - When provided, enables follow-up questions using previous context
  - Same session_id links queries into a conversation
- Context Size: Number of previous exchanges to consider
  - Default is 5 previous exchanges
  - Can be adjusted per query using the context_size parameter
  - Smaller context size may improve response relevance
  - Larger context size provides more conversation history
- Example Conversation Flows:
Example 1 - Card Payment Issues:
# Initial query about card decline
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Why was my card payment declined?",
"session_id": "user123"
}'
# Follow-up about a specific issue
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "How do I check my daily spending limits?",
"session_id": "user123"
}'
# Ask about troubleshooting
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Should I contact my card issuer?",
"session_id": "user123"
}'
Example 2 - Bank Transfer Issues:
# Initial query about bank transfer
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Why is my bank transfer delayed?",
"session_id": "user456"
}'
# Ask about the verification process
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "What verification is needed for bank transfers?",
"session_id": "user456"
}'
# Follow-up about tracking
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "How can I track my transfer status?",
"session_id": "user456"
}'
Example 3 - E-Wallet Processing:
# Ask about e-wallet transfer times
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "How long do e-wallet transfers take?",
"session_id": "user789"
}'
# Follow-up about delays
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Why might my first transfer take longer?",
"session_id": "user789"
}'
# Ask about verification
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "What security checks are needed for large transfers?",
"session_id": "user789"
}'
This conversation flow allows the system to:
- Remember previous questions and answers
- Understand context from earlier exchanges
- Provide more relevant and contextual responses
- Handle follow-up questions naturally
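Under the hood, this kind of bookkeeping can stay very simple. The class below is a hypothetical sketch of the idea, not the project's MCP implementation: each session accumulates exchanges, and only the most recent context_size of them are used when building the prompt.

# Hypothetical sketch of session-scoped context tracking (not the project's MCP code).
from collections import defaultdict

class SessionStore:
    def __init__(self):
        self._sessions = defaultdict(list)  # session_id -> list of (query, response)

    def add_exchange(self, session_id: str, query: str, response: str) -> None:
        self._sessions[session_id].append((query, response))

    def get_context(self, session_id: str, context_size: int = 5) -> str:
        # Only the most recent `context_size` exchanges are used in the prompt.
        recent = self._sessions[session_id][-context_size:]
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)

store = SessionStore()
store.add_exchange("user123", "Why was my card payment declined?", "Common causes include ...")
print(store.get_context("user123", context_size=3))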
Example Response:
{
"session_id": "user123",
"response": "Based on the available information, your payment might have failed due to [detailed response from the system]"
}
GET /health
Example curl command:
curl http://localhost:8000/health
Example Response:
{
"status": "healthy"
}
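For scripting, you may want to wait until the service reports healthy before sending queries; this small helper is illustrative:

# Illustrative readiness check: poll /health until the service reports healthy.
import time
import requests

def wait_until_healthy(url: str = "http://localhost:8000/health", timeout: float = 300.0) -> bool:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).json().get("status") == "healthy":
                return True
        except requests.RequestException:
            pass  # service not up yet; retry
        time.sleep(5)
    return False

if __name__ == "__main__":
    print("ready" if wait_until_healthy() else "timed out")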
- Semantic search for relevant documentation
- Contextual response generation
- Session management
- Conversation history tracking
- GPU acceleration support
- Health monitoring
- Comprehensive logging
- Error handling
- Docker-based deployment
The project includes a comprehensive test suite using pytest, organized for clarity and maintainability. Tests are containerized to ensure consistent test environments.
- Test Organization
tests/
├── conftest.py # Shared fixtures and utilities
├── test_config.py # Test configuration and markers
├── test_vector_store.py # Vector DB tests
├── test_mcp.py # MCP tests
└── test_chain.py # LangChain tests
- Component Tests
  - Vector Store: Collection management, embeddings, search functionality
    - Document indexing and retrieval
    - Embedding vector handling
    - Search result processing
  - MCP (Model Context Protocol): Conversation management
    - Session handling and context tracking
    - Message history management
    - Concurrent session support
  - LangChain: Integration and response generation
    - Chain initialization and configuration
    - Prompt template management
    - Response generation with context
- Test Categories
Tests are organized using pytest markers (an illustrative marker-based test appears after this list):
@pytest.mark.vectorstore # Vector database tests
@pytest.mark.mcp # Model Context Protocol tests
@pytest.mark.chain # LangChain integration tests
@pytest.mark.unit # Unit tests (default)
@pytest.mark.integration # Integration tests
@pytest.mark.slow # Performance tests
- Running Tests
# Run all tests
docker compose --profile test up test

# Run specific components
docker compose --profile test run test pytest -m vectorstore
docker compose --profile test run test pytest -m mcp
docker compose --profile test run test pytest -m chain

# Run by test type
docker compose --profile test run test pytest -m unit
docker compose --profile test run test pytest -m integration
docker compose --profile test run test pytest -m "not slow"

# Generate coverage reports
docker compose --profile test run test pytest --cov-report=term-missing
docker compose --profile test run test pytest --cov-report=html
- Coverage Analysis
  - Line and branch coverage tracking
  - Missing code identification
  - HTML reports in coverage_report/
  - Continuous monitoring during development
- Test Environment
  - Containerized testing via Dockerfile.test
  - Consistent test environment across systems
  - Isolated from development dependencies
  - Reproducible test execution
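As referenced in the Test Categories item above, a marker-based test might look like the following; the vector_store fixture and the assertions are hypothetical, not copied from the repository's test suite:

# Hypothetical marker-based test (fixture and assertions are illustrative).
import pytest

@pytest.mark.vectorstore
@pytest.mark.unit
def test_search_returns_ranked_results(vector_store):
    # Assumed fixture: a vector store pre-loaded with payment documentation.
    results = vector_store.search("card payment declined", limit=3)
    assert len(results) <= 3
    # Results should come back ordered by descending similarity score.
    scores = [r.score for r in results]
    assert scores == sorted(scores, reverse=True)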
- Fork the repository
- Create a feature branch
- Commit changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.