This repository contains a set of approaches for learning semantic association rules from sensor data and knowledge graphs in IoT environments:
- Aerial: our Autoencoder-based (AE-based) association rule mining (ARM) approach.
- Naive SEMantic Association Rule Learning (Naive SemRL) [2], implemented using MLxtend [8].
- TS-NARM [3], implemented using NiaARM [4] and NiaPy [5]. The original version does not support semantic association rules, so it is adapted here to work with semantics.
The sensor data is stored in a TimescaleDB instance, while the knowledge graph is stored in a Neo4j database instance. The folder named semantic_rule_learning contains the source code for our proposed AE-based ARM approach, as well as all baseline methods. The repository contains the following folders:
- data: contains the semantic part of 3 datasets: LeakDB [1] and L-Town [6] from the water networks domain, and LBNL [7] from the energy domain. It also contains the sensor data for the LeakDB dataset. The sensor data for the other 2 datasets is not included in this folder because it occupies too much space; it can be found by following the references.
- graphdb: contains a Neo4j implementation together with 3 Python scripts that are used to create a knowledge graph per dataset.
- timescaledb: a TimescaleDB implementation together with Python scripts that store the sensor data from each dataset in a separate table.
- semantic_rule_learning: a Python application that creates semantic association rules from given sensor data and an IoT knowledge graph.
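To make the idea of a semantic association rule more concrete, the following is a minimal, dependency-free sketch. It is not the code in semantic_rule_learning. The sensor ids, the item labels, and the `enrich`, `support`, and `confidence` helpers are hypothetical, and the knowledge graph is reduced to a plain dictionary. The sketch assumes that sensor values have already been discretized and that each item is prefixed with the type of component the sensor is attached to.

```python
# Hypothetical discretized sensor readings, one dict per time step.
readings = [
    {"s1": "pressure=high", "s2": "flow=low"},
    {"s1": "pressure=high", "s2": "flow=low"},
    {"s1": "pressure=high", "s2": "flow=high"},
    {"s1": "pressure=low", "s2": "flow=high"},
]

# Hypothetical stand-in for the knowledge graph: the component each sensor is placed on.
semantics = {"s1": "junction", "s2": "pipe"}


def enrich(reading, semantics):
    """Turn one row of sensor readings into a transaction of semantic items."""
    return frozenset(f"{semantics[sensor]}.{value}" for sensor, value in reading.items())


def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


transactions = [enrich(r, semantics) for r in readings]
antecedent = frozenset({"junction.pressure=high"})
consequent = frozenset({"pipe.flow=low"})

rule_support = support(transactions, antecedent | consequent)
rule_confidence = confidence(transactions, antecedent, consequent)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Because the items refer to component types rather than individual sensor ids, a rule of this shape can describe any sensor with the same semantics, which is the motivation for combining sensor data with a knowledge graph.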
- LeakDB [1]: an artificially generated, realistic dataset for leak detection in water distribution networks. It contains sensor data from 96 sensors of various types, as well as semantic information such as the formation of the network, sensor placement, and properties of network components, i.e., pipes and junctions.
- L-Town [6]: another water distribution network dataset with the same characteristics. It contains 118 sensors and associated semantics.
- LBNL [7]: a fault detection and diagnostics dataset that contains sensor data from 29 sensors and semantics for Heating, Ventilation, and Air Conditioning (HVAC) systems, such as the placement of the sensors and the structure of the HVAC system.
- **Create a Neo4j graph database instance.** Run the `start_graphdb.sh` script inside the `graphdb` module to create an instance of the Neo4j graph database. You can set a default username and password by changing the `NEO4J_AUTH` variable, for instance, `NEO4J_AUTH=neo4j/password`.
- **Create knowledge graphs.** The `main.py` file in the `graphdb` module is used to import the semantic properties (metadata) of the two water distribution network datasets into the graph database as a knowledge graph. Navigate to the `graphdb` module on your terminal and install the requirements using the following command: `pip3 install -r requirements.txt`. Update the `file_name` variable inside this Python script to point to a specific metadata file, e.g., `LeakDB_Hanoi_CMH_Scenario-1.inp`. Then rename `.env_example` to `.env` and set the environment variables for the graph database, including its IP address, username, and password. Then run the `main.py` script, e.g., `python3 main.py`. You can go to `http://localhost:7474/browser/` in your browser and run the following query to check whether the knowledge graph was created properly: `match (s) return s`.
- **Import time series data.** Run `start_timescaledb.sh` under the `timescaledb` module to create a TimescaleDB instance. Navigate to the `timescaledb` module on your terminal and install the requirements using the following command: `pip3 install -r requirements.txt`. There are 3 Python scripts under the same module to import the sensor data for each of the datasets. Change the database IP address and credentials in each file, e.g., `import_leakdb_sensor_data.py`. Run, for instance, `import_leakdb_sensor_data.py` to import sensor data into TimescaleDB, within a table named `leakdb`.
- **Learn semantic association rules.** Navigate to the `semantic_rule_learning` module on your terminal and install the requirements using the following command: `pip3 install -r requirements.txt`. Change the name of `.env_example` under the `semantic_rule_learning` module to `.env`. Set the environment variables for the graph and time series databases in `.env`, such as IP addresses, usernames, and passwords. To configure the parameters of each algorithm, go to `main.py` and see the parameters at the beginning of the file. After setting the configurable algorithmic parameters, run the `main.py` script, e.g., `python3 main.py`.

[Figure: example output]
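Several of the steps above depend on a correctly filled `.env` file. The following is a small sketch, using only the Python standard library, of how such a file could be checked before running `main.py`. The variable names (`GRAPHDB_HOST`, `GRAPHDB_USER`, `GRAPHDB_PASSWORD`, `TIMESCALEDB_HOST`) and the `load_env` helper are hypothetical; the actual names are the ones listed in `.env_example`.

```python
import tempfile
from pathlib import Path

# Hypothetical variable names; use the ones from .env_example instead.
REQUIRED = ["GRAPHDB_HOST", "GRAPHDB_USER", "GRAPHDB_PASSWORD", "TIMESCALEDB_HOST"]


def load_env(path):
    """Parse KEY=VALUE lines from a .env file, skipping blank lines and comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a temporary file so the sketch runs on its own.
example = "\n".join([
    "# Graph database",
    "GRAPHDB_HOST=localhost",
    "GRAPHDB_USER=neo4j",
    'GRAPHDB_PASSWORD="password"',
    "",
    "TIMESCALEDB_HOST=localhost",
])

with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(example)
    settings = load_env(env_path)

missing = [name for name in REQUIRED if name not in settings]
print("missing variables:", missing)
```

Checking for missing variables up front gives a clearer error than a failed database connection later in the run.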
The source code of our AE-based ARM approach is given in the `aerial.py` file. The file contains a method called `create_input_vectors()`, which receives a knowledge graph in the form of NetworkX graphs and a transaction set in the form of an array of arrays, where each array contains the items of one transaction. The `generate_rules()` method is called to generate a set of semantic association rules. You can copy this file to your own codebase to use it as described.
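As an illustration of the transaction format described above, the sketch below one-hot encodes an array of arrays into fixed-length vectors of the kind an Autoencoder can consume. This is an assumption about the general technique, not the actual `create_input_vectors()` implementation: the `one_hot_encode` helper and the item labels are hypothetical, and the knowledge graph input is left out to keep the example dependency-free.

```python
def one_hot_encode(transactions):
    """Map each transaction to a 0/1 vector over the sorted item vocabulary."""
    vocabulary = sorted({item for transaction in transactions for item in transaction})
    index = {item: position for position, item in enumerate(vocabulary)}
    vectors = []
    for transaction in transactions:
        vector = [0] * len(vocabulary)
        for item in transaction:
            vector[index[item]] = 1
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical transaction set: an array of arrays, one inner array per transaction.
transactions = [
    ["pressure=high", "flow=low"],
    ["pressure=high", "flow=high"],
    ["pressure=low", "flow=high"],
]

vocabulary, vectors = one_hot_encode(transactions)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each position in a vector corresponds to one item of the vocabulary, so all transactions share the same input dimension regardless of how many items they contain.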
A denoising Autoencoder implementation is given in the autoencoder file. It is a generic denoising Autoencoder implementation that has 3 layers each for the encoder and the decoder. The dimension of each layer is 4 times smaller than that of the previous layer, hence it is an under-complete Autoencoder. You can copy this file to your codebase to use it as part of our AE-based ARM approach, or create your own Autoencoder implementation and call it inside the `train()` method of `aerial.py`.
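The layer sizing described above can be worked through with a short calculation. The sketch below is not taken from the autoencoder file; the `layer_dims` helper is hypothetical, and it assumes that sizes are rounded down with integer division, that no layer shrinks below 1 unit, and that the decoder mirrors the encoder.

```python
def layer_dims(input_dim, num_layers=3, factor=4):
    """Return encoder and decoder layer sizes for an under-complete Autoencoder.

    Assumptions: integer (floor) division, a minimum size of 1,
    and a decoder that mirrors the encoder.
    """
    encoder = [input_dim]
    for _ in range(num_layers):
        encoder.append(max(1, encoder[-1] // factor))
    decoder = encoder[::-1]
    return encoder, decoder


# Example with a hypothetical input dimension of 640 one-hot encoded items.
encoder, decoder = layer_dims(640)
print("encoder:", encoder)  # encoder: [640, 160, 40, 10]
print("decoder:", decoder)  # decoder: [10, 40, 160, 640]
```

With a reduction factor of 4 over 3 layers, the bottleneck is 64 times smaller than the input, so small input dimensions collapse quickly and may call for a smaller factor in a custom implementation.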
- [1] Vrachimis, Stelios G., and Marios S. Kyriakou. "LeakDB: a Benchmark Dataset for Leakage Diagnosis in Water Distribution Networks:(146)." WDSA/CCWI Joint Conference Proceedings. Vol. 1. 2018.
- [2] Erkan Karabulut, Victoria Degeler, and Paul Groth. 2023. Semantic Association Rule Learning from Time Series Data and Knowledge Graphs. In Proceedings of the 2nd International Workshop on Semantic Industrial Information Modelling (SemIIM 2023) co-located with 22nd International Semantic Web Conference (ISWC 2023). 1–7.
- [3] Fister Jr, Iztok, et al. "Time series numerical association rule mining variants in smart agriculture." Journal of Ambient Intelligence and Humanized Computing 14.12 (2023): 16853-16866.
- [4] Stupan, Žiga, and Iztok Fister. "NiaARM: a minimalistic framework for numerical association rule mining." Journal of Open Source Software 7.77 (2022): 4448.
- [5] Vrbančič, Grega, et al. "NiaPy: Python microframework for building nature-inspired algorithms." Journal of Open Source Software 3.23 (2018): 613.
- [6] Vrachimis, S., Eliades, D., Taormina, R., Ostfeld, A., Kapelan, Z., Liu, S., Kyriakou, M., Pavlou, P., Qiu, M., Polycarpou, M.: Dataset of BattLeDIM: Battle of the leakage detection and isolation methods. In: Proc., 2nd Int CCWI/WDSA Joint Conf. Kingston, ON, Canada: Queen’s Univ (2020).
- [7] Granderson, J., Lin, G., Chen, Y., Casillas, A., Im, P., Jung, S., Benne, K., Ling, J., Gorthala, R., Wen, J., Chen, Z., Huang, S., Vrabie, D.: LBNL fault detection and diagnostics datasets (08 2022). https://doi.org/10.25984/1881324, https://data.openei.org/submissions/5763
- [8] Sebastian Raschka. 2018. MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. The Journal of Open Source Software 3, 24 (April 2018). https://doi.org/10.21105/joss.00638
Please send an email to the following address for your feedback and questions: [email protected].