Developed by Anjali Smith (as6467) and Shreeya Patel (sjp2236) for COMS4115: PLT Programming Assignment 3
RAGLang (Retrieval-Augmented Generation Language) is a programming language that combines structured data retrieval (using SQL-like queries) with natural language to output a prompt that can be sent to large language models (LLMs) for Retrieval-Augmented Generation tasks.
To execute our script, run the following commands:
cd lexer
python3 run.py
.
If you have Python installed, you can execute our script by running the following command:
cd parser
python3 run.py
.
Otherwise, if you don't have Python installed, we have set up a Dockerfile. Use the following instructions to build and run the Docker container:
- Install Docker Desktop from Docker's official website.
- In the root directory, run
docker build -t raglang-parser .
- Once its built, run the container with:
docker run --rm raglang-parser
If you have Python installed, you can execute our script by running the following command:
python3 run.py
SQLite is typically pre-installed on many systems, but you can check by running:
sqlite3 --version
If it's not installed, you can install it:
Ubuntu/Debian:
sudo apt-get install sqlite3
MacOS (via Homebrew):
brew install sqlite
Windows: Download SQLite from the official website and follow the installation instructions.
To open the database, run sqlite3 <name of database>
ie, sqlite3 sales-2024.db
To see all the tables in your database, run .tables
To see the schema of a specific table, run .schema <table name>
To exit, run .exit
- Input:
RETRIEVE:
SOURCE: "sales-2024.db"
QUERY: SELECT name, email FROM customers WHERE active = "true";
LIMIT: 10
GENERATE:
PROMPT: "Create a different thank you message for each customer."
- Parser Output (AST):
'Program'
'R'
'Keyword': 'RETRIEVE'
'Source'
'Keyword': 'SOURCE'
'String': 'sales-2024.db'
'Q'
'Keyword': 'QUERY'
'Keyword': 'SELECT'
'A'
'List'
'Identifier': 'name'
'Identifier': 'email'
'Keyword': 'FROM'
'Identifier': 'customers'
'Where'
'Keyword': 'WHERE'
'Condition'
'Identifier': 'active'
'Operator': '='
'Value': 'true'
'L'
'Keyword': 'LIMIT'
'Number': '10'
'G'
'Keyword': 'GENERATE'
'Prompt'
'Keyword': 'PROMPT'
'String': 'Create a different thank you message for each customer.'
- RagLang Output:
====================================================================================================
π RAGLANG OUTPUT π
Copy and paste the following text into the AI assistant of your choice (e.g., ChatGPT, Claude)
====================================================================================================
Please respond to the following prompt using the provided data.
Data:
- John Doe, [email protected]
- Jane Smith, [email protected]
- Michael Brown, [email protected]
- Emma Wilson, [email protected]
- Aaron Lilac, [email protected]
- Mike Roger, [email protected]
- Izzy Tike, [email protected]
Prompt: Create a different thank you message for each customer.
- Input:
RETRIEVE:
SOURCE: "sales-2024.db"
QUERY: SELECT product, total_sales FROM sales_data
WHERE year = 2023 AND total_sales > 4000 AND region = "North America";
GENERATE:
PROMPT: "Write catchy descriptions of each of our top-selling products in 2023 for North America that we can use on our social media page."
- Parser Output (AST):
AST: 'Program'
'R'
'Keyword': 'RETRIEVE'
'Source'
'Keyword': 'SOURCE'
'String': 'sales-2024.db'
'Q'
'Keyword': 'QUERY'
'Keyword': 'SELECT'
'A'
'List'
'Identifier': 'product'
'Identifier': 'total_sales'
'Keyword': 'FROM'
'Identifier': 'sales_data'
'Where'
'Keyword': 'WHERE'
'D'
'D'
'Condition'
'Identifier': 'year'
'Operator': '='
'Value': '2023'
'Keyword': 'AND'
'Condition'
'Identifier': 'total_sales'
'Operator': '>'
'Value': '4000'
'Keyword': 'AND'
'Condition'
'Identifier': 'region'
'Operator': '='
'Value': 'North America'
'G'
'Keyword': 'GENERATE'
'Prompt'
'Keyword': 'PROMPT'
'String': 'Write catchy descriptions of each of our top-selling products in 2023 for North America that we can use on our social media page.'
- RagLang Output:
====================================================================================================
π RAGLANG OUTPUT π
Copy and paste the following text into the AI assistant of your choice (e.g., ChatGPT, Claude)
====================================================================================================
Please respond to the following prompt using the provided data.
Data:
- Barbie Doll, 5000
- Coffee Mug, 4500
- Sneakers, 6000
- Headphones, 5000
- Suitcase, 4500
- Soap, 6000
Prompt: Write catchy descriptions of each of our top-selling products in 2023 for North America that we can use on our social media page.
RETRIEVE:
SOURCE: "sales-2024.db"
QUERY: SELECT product_name FROM inventory WHERE stock < 25;
LIMIT: 5
GENERATE:
PROMPT: "Create a 2-month marketing campaign to advertise our lowest selling products, which will be 25% off during Black Friday"
Parser Output:
'Program'
'R'
'Keyword': 'RETRIEVE'
'Source'
'Keyword': 'SOURCE'
'String': 'sales-2024.db'
'Q'
'Keyword': 'QUERY'
'Keyword': 'SELECT'
'A'
'List'
'Identifier': 'product_name'
'Keyword': 'FROM'
'Identifier': 'inventory'
'Where'
'Keyword': 'WHERE'
'Condition'
'Identifier': 'stock'
'Operator': '<'
'Value': '25'
'L'
'Keyword': 'LIMIT'
'Number': '5'
'G'
'Keyword': 'GENERATE'
'Prompt'
'Keyword': 'PROMPT'
'String': 'Create a 2-month marketing campaign to advertise our lowest selling products, which will be 25% off during Black Friday'
- RagLang Output
====================================================================================================
π RAGLANG OUTPUT π
Copy and paste the following text into the AI assistant of your choice (e.g., ChatGPT, Claude)
====================================================================================================
Please respond to the following prompt using the provided data.
Data:
- Vitamins
- Lamp
- Charger
Prompt: Create a 2-month marketing campaign to advertise our lowest selling products, which will be 25% off during Black Friday
- Input
RETRIEVE:
SOURCE: "sales-2024.db"\nQUERY: SELECT product_id FROM sales_data WHERE quantity > 10;LIMIT: 5
GENERATE:
PROMPT: "Summarize sales data to generate a motivational message that I can use to deliver to our salespeople for their hardwork this quarter."
- Parser Output:
'Program'
'R'
'Keyword': 'RETRIEVE'
'Source'
'Keyword': 'SOURCE'
'String': 'sales-2024.db'
'Q'
'Keyword': 'QUERY'
'Keyword': 'SELECT'
'A'
'List'
'Identifier': 'product_id'
'Keyword': 'FROM'
'Identifier': 'sales_data'
'Where'
'Keyword': 'WHERE'
'Condition'
'Identifier': 'quantity'
'Operator': '>'
'Value': '10'
'L'
'Keyword': 'LIMIT'
'Number': '5'
'G'
'Keyword': 'GENERATE'
'Prompt'
'Keyword': 'PROMPT'
'String': 'Summarize sales data to generate a motivational message that I can use to deliver to our salespeople for their hardwork this quarter.'
- Output
====================================================================================================
π RAGLANG OUTPUT π
====================================================================================================
Error in SQL query no such column: product_id
- Input
RETRIEVE:
SOURCE: "sales-2024.db"
SELECT customer_id, purchase_amount FROM transactions WHERE purchase_amount > 100 AND loyalty_status = "Gold";
GENERATE:
PROMPT: "Summarize high-value transactions"
- Output:
Syntax error: Expected KEYWORD 'QUERY' at position 5
AST: None