This project is dedicated to the development of an AI-powered advisory system that assists businesses in navigating the complex and ever-evolving landscape of industry regulations. The system is designed to offer actionable compliance guidance by leveraging a Retrieval-Augmented Generation (RAG) framework, ensuring that users stay informed about the latest regulatory changes.
The RAG architecture keeps the system's understanding of regulations current, allowing it to deliver precise, relevant compliance guidance and help businesses stay compliant across multiple jurisdictions.
While proprietary models like those from OpenAI offer high performance, they can be cost-prohibitive, especially for continuous large-scale monitoring. Our solution addresses this by balancing performance with cost-effectiveness, making it viable for sustained operations in large-scale regulatory environments.
Our approach also mitigates the risks of relying solely on proprietary models: the fully open-source variant of the pipeline keeps sensitive regulatory data on infrastructure the business controls.
Generic responses can undermine the effectiveness of AI in specialized domains. To combat this, our system incorporates domain-specific fine-tuning, ensuring that the generated advice is both relevant and accurate for specific sectors like Finance, Healthcare, and Data Privacy.
Our system ingests information from a variety of sources, including:
- PDF Documents
- Wikipedia Pages
- Websites

This diverse set of inputs ensures comprehensive coverage of regulatory landscapes across regions such as the US, the European Union, and India.
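As a hedged sketch, routing across these source types might look like the following. The `load_source` function and its placeholder loaders are illustrative assumptions, not the project's actual code; a real pipeline would plug in PDF, Wikipedia, and HTML parsers in place of the stub bodies.

```python
def load_source(source: str) -> str:
    """Route a source identifier to the appropriate loader by type.

    Illustrative only: real loaders (e.g. a PDF parser, a Wikipedia
    client, an HTML scraper) would replace the placeholder bodies.
    """
    if source.lower().endswith(".pdf"):
        return f"[pdf-text from {source}]"
    if "wikipedia.org" in source:
        return f"[wiki-text from {source}]"
    if source.startswith(("http://", "https://")):
        return f"[html-text from {source}]"
    raise ValueError(f"Unsupported source: {source}")
```

The dispatch order matters: Wikipedia URLs must be checked before the generic website branch, or they would be scraped as plain HTML.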
- Data Preparation:
- Embeddings: OpenAI Embedding Model
- VectorDB: Pinecone
- Generation: GPT-4o-mini
- Evaluation: RAGAS
- Deployment: Docker
- User Interface: Gradio
- Chunk Size: 500
- Indices: 3
- Retrieval Search Algorithm: Similarity Search
- Retrieved Chunks: 20
- Generation Output Tokens: 512
- Temperature: 0
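The chunking and retrieval settings above can be sketched in miniature. This character-based `chunk` and brute-force cosine `top_k` are simplified stand-ins (my assumption for illustration) for the pipeline's tokenizer-based chunking and Pinecone's managed vector search:

```python
import math

def chunk(text: str, size: int = 500) -> list[str]:
    """Split text into fixed-size chunks (characters here; the
    pipeline's chunk size of 500 is in its own tokenizer units)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 20) -> list[int]:
    """Rank chunk vectors by similarity to the query and return the
    indices of the top-k, mirroring the similarity search step."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```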
We also developed a secondary architecture based entirely on open-source technologies:
- Embedding Model: Stella
- VectorDB: Qdrant
- Generation Model: Llama 3.1 8B (finetuned and non-finetuned variants)
- Challenges: The finetuned Llama model produced suboptimal outputs, leading us to revert to the non-finetuned Llama 3.1 8B Instruct model for generation tasks.
- Context Chunks: 20 (each of 512 tokens)
- Relevant Chunks: 4 (random order)
- Components: User Query, Generated Answer
- Model: Llama 3.1 8B
- Techniques: LoRA Finetuning, LoRA Adaptor merging
- Final Model: Merged with the Llama 3.1 8B Instruct model using SLERP
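A SLERP merge interpolates between two models' weights along the hypersphere rather than linearly. A minimal per-vector sketch of the underlying operation (real merges, e.g. with tooling such as mergekit, apply this tensor-by-tensor across the two checkpoints; this function is illustrative only):

```python
import math

def slerp(v0: list[float], v1: list[float], t: float) -> list[float]:
    """Spherical linear interpolation between two weight vectors at
    fraction t (t=0 gives v0, t=1 gives v1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(x * y for x, y in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))  # guard against floating-point drift
    theta = math.acos(dot)
    if theta < 1e-6:
        # near-parallel vectors: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```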
- Data Generation: Utilized the Groq API to generate query-answer pairs from the Llama-3.1-70B-instruct model.
- Chunk Size: 2048 tokens per chunk
- Generated Pairs: 10 per chunk, yielding 30 query-answer pairs each for the Data Privacy, Healthcare, and Finance departments.
- Hallucination Control: Removed outputs exceeding the 99th percentile length threshold.
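The length-based hallucination filter can be sketched as follows. The `filter_by_length` helper and its nearest-rank percentile rule are illustrative assumptions, not the project's code; the idea is simply that unusually long generations are treated as likely runaways and dropped.

```python
import math

def filter_by_length(outputs: list[str], pct: float = 0.99) -> list[str]:
    """Drop generated answers longer than the given length percentile
    (nearest-rank method), a simple guard against runaway generations."""
    lengths = sorted(len(o) for o in outputs)
    idx = min(len(lengths) - 1, max(0, math.ceil(pct * len(lengths)) - 1))
    threshold = lengths[idx]
    return [o for o in outputs if len(o) <= threshold]
```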
- Context Precision
- Faithfulness
- Answer Relevancy
- Context Recall
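RAGAS computes these metrics with LLM-based judgments over the query, contexts, and answer. As intuition only, context precision rewards relevant chunks appearing early in the retrieved list; the simplified set-based version below is an assumption for illustration, not the RAGAS implementation:

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant
    chunk appears, normalized by the number of relevant chunks found."""
    hits, score = 0, 0.0
    for k, chunk in enumerate(retrieved, start=1):
        if chunk in relevant:
            hits += 1
            score += hits / k  # precision@k at this relevant position
    return score / max(1, len(relevant & set(retrieved)))
```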
- Performance: GPT-4o-mini outperformed Llama-3.1 in most evaluation metrics.
- Retrieval Efficiency: Stella and OpenAI Embedding models performed comparably in retrieval tasks.
- Response Time: GPT-4o-mini responded roughly 5x faster than Llama-3.1, making it the preferred model for deployment, paired with either OpenAI or Stella embeddings.
- Implemented a user-friendly interface using Gradio.
- Encapsulated the application in a Docker container and pushed it to Docker Hub for easy deployment.
```shell
docker pull adi1710/rag-gradio-app:v2
docker run --gpus all -p 7860:7860 -e OPENAI_API_KEY=your_openai_api_key adi1710/rag-gradio-app:v2
```
Once the container is running, you can access the application in your web browser at http://localhost:7860.
We have successfully deployed a RAG pipeline capable of providing regulatory compliance guidance tailored to various sectors. This system enables businesses to navigate complex regulations, helping them avoid legal penalties, maintain their reputations, and operate smoothly across different regions.
- Automated Updates: Implement regular update checks to ensure the database remains current.
- Embedding Finetuning: Further finetuning the embedding model to improve the quality of generated answers.
This repository provides a robust foundation for businesses seeking to ensure compliance with industry regulations, offering a scalable, secure, and efficient solution for regulatory navigation.