LINDA

Rule Mining on Distributed Data: A distributed rule mining system based on Spark which mines non monotonic rules.

Over the recent years, the advances in information extraction has resulted in the proliferation of large Knowledge Bases (KBs), which is a machine readable collection of knowledge. The KBs are inevitably bound to be incomplete. Logic rules are being used to predict new facts based on the existing ones. Horn rules are particular rule-like form which gives it useful properties for use in logic programming and are one of the popular form of rules mined. However, Horn rules do not take into account possible exceptions, so that predicting facts via such rules introduces errors. Another major challenge is to perform scalable analysis of large scale KBs to facilitate rule mining. Hence we address these two problems in this work and present two rule mining approaches implemented for a cluster based environment - (1) Exception Enriched Rule Mining Approach, which aims at adding exception (negated body elements) in order to refine rules; (2) Decision Tree Rule Mining, a rule miner which can mine both horn rules and non-monotonic rules. Both these approaches are implemented for a distributed environment. We apply our method to discover rules with exceptions from real-world KBs.

Getting Started

Prerequisites

Spark - The instructions to install Spark can be found here.
Maven - The instructions to install Maven can be found here.
Java
HDFS - The instructions to install HDFS can be found [here] (https://www.linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/)

Installing

Clone the project.
Build the project using

mvn clean package

Start HDFS and spark in the cluster
Submit jobs in spark by going to the spark directory and executing the following

EWS Rule Miner

./bin/spark-submit --class org.aksw.dice.linda.Utils.DatasetParser --master <Master url> <location to jar >/LINDA-0.1.0.jar <dataset name> "<hdfs cluster url>" "<Input dataset exact path>”

./bin/spark-submit --class org.aksw.dice.linda.EWS.EWSRuleMiner --master <Master url> <location to jar >/LINDA-0.1.0.jar <dataset name> "<hdfs cluster url>"

./bin/spark-submit --class org.aksw.dice.linda.EWS.FactGenerator --master <Master url> <location to jar >/LINDA-0.1.0.jar <dataset name> "<hdfs cluster url>"

Decision Tree Rule Miner

Comment out the Horn Rule Miner from Dataset Parser since it is not required.

./bin/spark-submit --class org.aksw.dice.linda.Utils.DatasetParser --master <Master url> <location to jar >/LINDA-0.1.0.jar <dataset name> "<hdfs cluster url>" "<Input dataset exact path>”

Execute the LIBSVM Writer

python libsvmwriter.py

Put the new data files created in the location HDFSClusterURL\DatasetNAME\LIBSVMDATA.

./bin/spark-submit --class org.aksw.dice.linda.classificationRuleMining.DTRuleMiner --master <Master url> <location to jar >/LINDA-0.1.0.jar <dataset name> "<hdfs cluster url>"

./bin/spark-submit --class org.aksw.dice.linda.classificationRuleMining.DTFactGenerator --master <Master url> <location to jar >/LINDA-0.1.0.jar <dataset name> "<hdfs cluster url>"

Built With

Maven - Dependency Management
Spark - Cluster Computing Framework

Author

Kunal Jha

License

Acknowledgments

Tommaso Soru
Gezim Sejdiu
Ivan Ermilov
Dr. Hajira Jabeen
Prof. Dr. Axel-C. Ngonga Ngomo
Prof. Dr. Jens Lehmann

Name		Name	Last commit message	Last commit date
Latest commit History 144 Commits
src/main		src/main
.gitignore		.gitignore
README.md		README.md
libsvmdatasetCreator.py		libsvmdatasetCreator.py
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LINDA

Getting Started

Prerequisites

Installing

Built With

Author

License

Acknowledgments

About

Releases

Packages

Languages

dice-group/LINDA

Folders and files

Latest commit

History

Repository files navigation

LINDA

Getting Started

Prerequisites

Installing

Built With

Author

License

Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages