RevOnt: Reverse engineering of an ontology via competency question extraction from knowledge graphs

Extracting competency questions from the Wikidata knowledge graph



📔 Table of Contents

  • About the Project

  • Scripts

  • Usage

  • Roadmap

  • Contributing

    🌟 About the Project

    The development of ontologies (formal, explicit specifications of a shared conceptualisation) is addressed by well-known methodologies. As in any engineering process, its foundation is the collection of requirements, which includes the elicitation of competency questions. Competency questions are defined by interacting with domain and application experts or by investigating existing datasets that may be used to populate the ontology, i.e. its knowledge graph. The rise in popularity and accessibility of knowledge graphs provides an opportunity to support this phase with automatic tools. In this work, we explore the possibility of extracting competency questions from a knowledge graph. We describe in detail RevOnt, an approach that extracts and abstracts triples from a knowledge graph, generates questions based on triple verbalisations, and filters the questions to guarantee that they are competency questions. This approach is implemented using the Wikidata knowledge graph as a use case. The implementation results in a set of core competency questions from 20 domains present in the dataset presenting the knowledge graph, and their respective templates mapped to SPARQL query templates. We evaluate the resulting competency questions by calculating the BLEU score against human-annotated references. The results for the abstraction and question generation components of the approach show good to high quality. Meanwhile, the accuracy of the filtration component is above 86%, which is comparable to state-of-the-art classification.


    An overview of the RevOnt framework. The first stage, Verbalisation Abstraction, generates the abstraction of a triple verbalisation. The abstraction is the input to the second stage, Question Generation, which generates three questions per triple and performs a grammar check. Lastly, the third stage, Question Filtration, filters the questions by applying several techniques.
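The overview does not say which techniques the Question Filtration stage applies. As an illustration only, one plausible technique is to drop questions whose sentence embeddings are nearly identical to a question already kept. The sketch below assumes that embeddings are already available as plain lists of floats (for example from a sentence-embedding model); the function names and the `0.9` threshold are hypothetical and are not taken from the RevOnt scripts.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def drop_near_duplicates(
    questions: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, not from the RevOnt scripts
) -> List[str]:
    """Keep a question only if it is not too similar to any question kept so far."""
    if len(questions) != len(embeddings):
        raise ValueError("questions and embeddings must have the same length")
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [questions[i] for i in kept]
```

The greedy pass keeps the first question of each near-duplicate group, so the order of the input decides which variant survives.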

    🧰 Scripts

    ‼️ Prerequisites

    Several packages must be installed to use the language models and the Wikidata query service.

    pip install -U sentence-transformers
    pip install happytransformer
    pip3 install qwikidata
    

    ⚙️ Import

    The functions import several packages and require the WordNet and OMW-1.4 corpora to be downloaded through NLTK, as shown below.

    from qwikidata.sparql import return_sparql_query_results
    from IPython.core.debugger import skip_doctest
    from sentence_transformers import SentenceTransformer
    from transformers import AutoTokenizer, AutoModel, AutoModelForTokenClassification, pipeline
    from happytransformer import HappyTextToText, TTSettings
    from sklearn.metrics.pairwise import cosine_similarity
    import re
    import json
    import time
    import torch
    import torch.nn.functional as F
    import nltk
    from nltk.corpus import wordnet
    nltk.download('wordnet')
    nltk.download('omw-1.4')
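As a rough illustration of how the imported `return_sparql_query_results` function can be used, the sketch below fetches a few statements of one Wikidata item and turns them into (subject, predicate, object) triples. The helper names, the query shape, and the choice of English labels are assumptions made for this example; the RevOnt scripts may query Wikidata differently.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_statement_query(entity_id: str, limit: int = 10) -> str:
    """Build a SPARQL query listing direct statements of one Wikidata item."""
    if not (entity_id.startswith("Q") and entity_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {entity_id!r}")
    return (
        "SELECT ?property ?propertyLabel ?value ?valueLabel WHERE {\n"
        f"  wd:{entity_id} ?claim ?value .\n"
        "  ?property wikibase:directClaim ?claim .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def bindings_to_triples(entity_id: str, response: Dict[str, Any]) -> List[Triple]:
    """Convert the JSON returned by the SPARQL endpoint into labelled triples."""
    triples: List[Triple] = []
    for row in response["results"]["bindings"]:
        predicate = row.get("propertyLabel", {}).get("value")
        obj = row.get("valueLabel", {}).get("value")
        if predicate and obj:  # skip rows with a missing label
            triples.append((entity_id, predicate, obj))
    return triples


def fetch_triples(entity_id: str, limit: int = 10) -> List[Triple]:
    """Query the public Wikidata endpoint (needs network access and qwikidata)."""
    from qwikidata.sparql import return_sparql_query_results

    response = return_sparql_query_results(build_statement_query(entity_id, limit))
    return bindings_to_triples(entity_id, response)
```

Calling `fetch_triples("Q42")` sends a live request to the Wikidata query service, so results depend on the current state of the knowledge graph.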
    

    👀 Usage

    The repository contains a separate script for each component. This separation makes it possible to skip a component or to change the order in which the components are executed. The scripts also allow the use of a language model other than the default one. The language models used in the scripts are state-of-the-art models that have shown good to high results in the first evaluation of the method.
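The idea of skipping or reordering components can be sketched as a list of stage functions applied in sequence. Everything in this sketch is hypothetical: the stage names mirror the three RevOnt stages, but their bodies are trivial placeholders, not the language-model-based logic of the actual scripts.

```python
from typing import Callable, Iterable, List

# A stage takes a list of strings and returns a new list of strings.
Stage = Callable[[List[str]], List[str]]


def run_pipeline(items: List[str], stages: Iterable[Stage]) -> List[str]:
    """Apply the given stages in order; omit or reorder stages by editing the list."""
    for stage in stages:
        items = stage(items)
    return items


def abstract_verbalisations(verbalisations: List[str]) -> List[str]:
    """Placeholder abstraction: only trims whitespace and the final period."""
    return [v.strip().rstrip(".") for v in verbalisations]


def generate_questions(abstractions: List[str]) -> List[str]:
    """Placeholder generation: wraps each abstraction in a fixed question frame."""
    return [f"Is it true that {a}?" for a in abstractions]


def filter_questions(questions: List[str]) -> List[str]:
    """Placeholder filtration: drops case-insensitive exact duplicates."""
    seen = set()
    kept: List[str] = []
    for q in questions:
        key = q.lower()
        if key not in seen:
            seen.add(key)
            kept.append(q)
    return kept


if __name__ == "__main__":
    data = ["an author writes a book. ", "An author writes a book."]
    full = run_pipeline(data, [abstract_verbalisations, generate_questions, filter_questions])
    unfiltered = run_pipeline(data, [abstract_verbalisations, generate_questions])
    print(full)
    print(unfiltered)
```

Because every stage shares the same signature, swapping in a different language model only requires replacing one function in the list.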

    🧭 Roadmap

    • First implementation of the RevOnt method using data from the Wikidata knowledge graph
    • Second implementation of the RevOnt method using data from an AMR graph built from a textual corpus

    👋 Contributing

    Contributions are always welcome!

    🤝 Contact

    Fiorela Ciroku - @ciroku_fiorela - [email protected]

    Project Link: https://github.com/FiorelaCiroku/RevOnt

    💎 Acknowledgements