This repository contains the code for generating SPARQL queries from natural-language questions using a chain-of-thought approach.

To run the system locally, follow one of the setup methods below, depending on your environment:
If you are using Conda, create and activate the environment with the following commands:

```
conda env create -f environment.yml
conda activate sparqlgen
```
If you are not using Conda, install the dependencies with pip instead:

```
pip install -r requirements.txt
```
We have provided the sentence embeddings and other relevant information, but users can create their own embeddings using the notebook in the `temp` folder. To obtain the context examples (embeddings and other information), download the necessary files into the `temp` directory using the following command:

```
wget -P temp \
  'https://anon.to/?https://files.dice-research.org/datasets/COT-SPARQLGEN/dbpedia_examples.parquet' \
  'https://anon.to/?https://files.dice-research.org/datasets/COT-SPARQLGEN/embeddings_dbpedia.pkl' \
  'https://anon.to/?https://files.dice-research.org/datasets/COT-SPARQLGEN/embeddings_wikidata.pkl' \
  'https://anon.to/?https://files.dice-research.org/datasets/COT-SPARQLGEN/wikidata_examples.parquet'
```
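Once downloaded, the files can be sanity-checked with pandas and pickle. The snippet below is only a sketch: the parquet schema and the pickle contents are assumptions inferred from the file names.

```python
import pickle

import pandas as pd

# Few-shot example pool for DBpedia (schema assumed from the file name).
examples = pd.read_parquet("temp/dbpedia_examples.parquet")
print(examples.shape, list(examples.columns))

# Precomputed sentence embeddings (assumed to be a pickled array or list).
with open("temp/embeddings_dbpedia.pkl", "rb") as f:
    embeddings = pickle.load(f)
print(type(embeddings), len(embeddings))
```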
You are now ready to run the system:

```
python main.py --model_path <model_path> --kb <knowledge_base> --question <question>
```
For example:

```
python main.py --model_path TheBloke/CodeLlama-34B-Instruct-GPTQ --kb dbpedia --question 'what is the capital of Germany'
```
The `--kb` argument accepts either `dbpedia` or `wikidata`. You can also change the model to any model that runs on your system.
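Internally, the stored embeddings are used to pick few-shot context examples for the prompt. The following sketch illustrates such a nearest-neighbour lookup with cosine similarity; it is not the repository's exact retrieval code, and the DataFrame layout (including a `question` column) is an assumption.

```python
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

def top_k_examples(question: str, embeddings: np.ndarray,
                   examples: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Return the k stored examples whose embeddings are closest to the question's."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # same model used for the stored embeddings
    q = encoder.encode([question])[0]
    # Cosine similarity between the query vector and every stored embedding.
    sims = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q) + 1e-12)
    return examples.iloc[np.argsort(-sims)[:k]]
```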
We used the following datasets during our experiments:

| Wikidata | DBpedia |
|---|---|
| QALD-10 | QALD-9 |
| LC-QuAD 2.0 | VQuAnDa |
The datasets are available for download in the `dataset` folder of our repository.
We have provided a link to the embeddings and relevant data. However, if you wish to create your own embeddings, the code is available in the `temp` folder as `embeddings.ipynb`. We utilized `all-MiniLM-L6-v2` for sentence encoding, but users may change it according to their requirements.
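For reference, the encoding step boils down to something like the sketch below; the `question` column name and the output path are assumptions, and `embeddings.ipynb` remains the authoritative version.

```python
import pickle

import pandas as pd
from sentence_transformers import SentenceTransformer

# Encode every question in the example pool with all-MiniLM-L6-v2.
examples = pd.read_parquet("temp/dbpedia_examples.parquet")
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(examples["question"].tolist(), show_progress_bar=True)

# Persist the embeddings next to the example pool.
with open("temp/embeddings_dbpedia.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```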
`contexta.py` provides all the details about the models and tools used for entity linking on DBpedia and Wikidata. This step is also optional, and users may substitute the tools of their preference.
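As one self-contained illustration of entity linking (not necessarily the tool configured in `contexta.py`), DBpedia Spotlight offers a public annotation endpoint:

```python
import requests

def link_entities(text: str, confidence: float = 0.5) -> list[str]:
    """Annotate `text` with DBpedia entities via the public Spotlight endpoint."""
    resp = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    return [r["@URI"] for r in resp.json().get("Resources", [])]

# Returns URIs such as http://dbpedia.org/resource/Germany
print(link_entities("What is the capital of Germany?"))
```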