Currently, most ontologies are created manually, which is time-consuming and labour-intensive. Meanwhile, the advanced capabilities of Large Language Models (LLMs) have proven beneficial in various domains, significantly improving the efficiency of text processing and text generation. This paper therefore focuses on the use of LLMs for ontology learning, using a manual ontology construction method as a basis to guide the LLMs. The proposed approach is based on Retrieval Augmented Generation (RAG), and the queries passed to the LLMs follow the manual UPON Lite ontology method. Two different LLM variants were experimented with, and both demonstrate the capability of ontology learning to varying degrees. The approach shows promising initial results in the direction of (semi-)automated ontology learning using LLMs and makes the ontology construction process easier for people without prior domain expertise. The results were evaluated by a domain expert, and the LLM outputs were ranked against defined criteria.
Below are flowcharts that visualize the logic behind the RAG and query pipeline:
RAG:
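For concreteness, below is a minimal sketch of such a RAG query pipeline, wired up in the style of the LangChain Question Answering quickstart cited at the end of this README. The file name `Document_Input` comes from this repository, but the model names, chunking parameters, and the example query are illustrative assumptions rather than the repository's exact code.

```python
# Minimal RAG sketch (LangChain v0.1 style); model names, chunk sizes,
# and the query below are illustrative, not the repository's exact code.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load the input document and split it into retrievable chunks.
docs = TextLoader("Document_Input").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# 2. Embed the chunks and index them in a vector store.
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 3. Build a chain that retrieves relevant chunks and passes them to the LLM.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4", temperature=0),
    retriever=vectorstore.as_retriever(),
)

# 4. Ask a UPON Lite-style question (illustrative wording).
result = qa.invoke({"query": "List the candidate domain terms in this document."})
print(result["result"])
```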
The file "Code" contains all the information for the reuse.
The Python_code.py
contains all the core code for creating ontology using Python.
The input document is named Document_Input
.
The file Parameters.yaml
encompass all adjustable parameters in the code.
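As a hint for reuse, here is a minimal sketch of how `Parameters.yaml` can be read from Python with PyYAML; the key names shown are hypothetical placeholders, not the file's actual schema.

```python
# Load adjustable parameters from Parameters.yaml (PyYAML).
# The keys below are hypothetical; check the file for the real schema.
import yaml

with open("Parameters.yaml", "r", encoding="utf-8") as f:
    params = yaml.safe_load(f)

input_path = params.get("input_document", "Document_Input")
queries = params.get("queries", [])
```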
Requirements:
- Python 3.8 or higher
- Visual Studio Code
Before running the code, ensure you have the following files:
- `Parameters.yaml`
- `Document_Input`
- `Python_code.py`
You do not need to edit `Parameters.yaml` or `Document_Input` unless you want to change the parameters or the input, but both files are required by `Python_code.py`. All default parameters are stored in `Parameters.yaml`; if you need to adjust queries or inputs, modify them in this file.
To run the code:
- Open `Python_code.py` in Visual Studio Code.
- Install any missing packages if necessary.
- Replace the default file locations for `Parameters.yaml` and `Document_Input` with the paths to your files.
- Input your API key for OpenAI, Hugging Face, or another compatible platform for your LLM.
- Define your embedding models and LLM in the code (see the sketch after this list).
- Run the code either in its entirety or step by step.
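As an example of the API key and model definition steps, here is a minimal sketch assuming OpenAI as the platform; the model names are illustrative choices, and reading the key from an environment variable avoids hard-coding it in the script.

```python
# Illustrative setup for API key, embeddings, and LLM (OpenAI assumed).
import os
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")  # illustrative choice
llm = ChatOpenAI(model="gpt-4", temperature=0)                 # illustrative choice
```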
If you encounter a 500 server error, it may indicate that the LLM is unable to process your query. In this case, try the following:
- Retry the request (a simple retry sketch is given after this list).
- Modify the query to ensure it is within the LLM's processing capabilities.
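Here is a minimal retry sketch for such transient failures; `run_query` is a hypothetical stand-in for whatever function in `Python_code.py` issues the failing request.

```python
# Simple retry with exponential backoff for transient 500 server errors.
# run_query is a hypothetical stand-in for the actual query call.
import time

def query_with_retry(run_query, query, retries=3, delay=2.0):
    for attempt in range(retries):
        try:
            return run_query(query)
        except Exception:  # e.g., a 500 server error from the LLM API
            if attempt == retries - 1:
                raise
            time.sleep(delay * (2 ** attempt))
```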
For further assistance, please open an issue in this repository.
Reference: LangChain. 2024. Question Answering Quickstart. https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/