Installation & Dependencies

This script requires Python 3.7, which is a strict requirement of stellargraph (a library the script depends on). To install all dependencies, run: pip install -r requirements.txt

Using the algorithm

usage: rules_generator.py [-h] [--endpoint ENDPOINT] [--input INPUT] [--graph GRAPH] [--lang LANG] [--filename FILENAME] [--conf CONF] [--int INT] [--occurrence OCCURRENCE]

optional arguments:

-h, --help show this help message and exit

--endpoint ENDPOINT The endpoint from which to retrieve the data, identified by a code (issa, covid). You can define new queries for different SPARQL endpoints in queries.json.

--input INPUT If available, path to the file containing the input data

--graph GRAPH The graph within the endpoint from which to get the data, if there is one (valid for issa: agrovoc, geonames, wikidata, dbpedia)

--lang LANG The language of the labels

--filename FILENAME The output file name. If not provided, it will be automatically generated based on the input information.

--conf CONF Minimum confidence of rules. Default is 0.7; rules with confidence below this value are filtered out.

--int INT Minimum interestingness (serendipity, rarity) of rules. Default is 0.3; rules with interestingness below this value are filtered out.

--occurrence OCCURRENCE Keep only terms co-occurring more than this many times. Default is 5.
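
To make the --conf and --occurrence thresholds concrete, here is a minimal sketch of how such filters could act on single-term rules. It is not the code of rules_generator.py: the function name mine_pair_rules and the toy data are hypothetical, and it assumes the standard definition of confidence, conf(A -> B) = count(A and B) / count(A). The interestingness measure behind --int is defined in the paper cited below and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_conf=0.7, min_occurrence=5):
    """Return (antecedent, consequent, confidence) tuples for one-to-one rules.

    Hypothetical helper for illustration only; the real script may differ.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for terms in transactions:
        unique = sorted(set(terms))          # ignore repeated terms in one document
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), n_ab in pair_counts.items():
        # --occurrence: keep only pairs co-occurring MORE than min_occurrence times
        if n_ab <= min_occurrence:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = n_ab / item_counts[antecedent]
            # --conf: rules below the minimum confidence are filtered out
            if confidence >= min_conf:
                rules.append((antecedent, consequent, confidence))
    return sorted(rules)


# Toy data: each inner list is the set of terms annotating one document.
transactions = (
    [["virus", "vaccine"]] * 6
    + [["virus", "mask"]] * 2
    + [["vaccine"]] * 4
)
rules = mine_pair_rules(transactions)
for antecedent, consequent, confidence in rules:
    print(f"{antecedent} -> {consequent} (confidence {confidence:.2f})")
```

In this toy data "virus" and "vaccine" co-occur 6 times, so the pair passes the default occurrence threshold of 5. The rule virus -> vaccine has confidence 6/8 and is kept, while vaccine -> virus has confidence 6/10 and is dropped.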

After the first execution, the algorithm saves the input data to a CSV file. To run the algorithm again using this file as input instead of querying the endpoint, pass its path with --input.
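
The save-then-reuse behaviour described above follows a common cache-or-query pattern, sketched below. This is an illustration under assumptions, not the script's implementation: fetch_from_endpoint, load_input, the column names, and the file name input_data.csv are all hypothetical, and the endpoint query is replaced by a stub.

```python
import csv
import os
import tempfile


def fetch_from_endpoint():
    """Hypothetical stand-in for the SPARQL query; returns a list of rows."""
    return [
        {"article": "a1", "label": "virus"},
        {"article": "a1", "label": "vaccine"},
    ]


def load_input(input_path=None, cache_path="input_data.csv"):
    """Read rows from a saved CSV if one is given, else query and save them."""
    if input_path and os.path.exists(input_path):
        with open(input_path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle)), "file"

    rows = fetch_from_endpoint()
    with open(cache_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows, "endpoint"


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "input_data.csv")
    # First run: nothing saved yet, so the data comes from the (stubbed) endpoint.
    first_rows, first_source = load_input(cache_path=cache)
    # Second run: the saved CSV is passed in and the endpoint is not queried.
    second_rows, second_source = load_input(input_path=cache)
    print(first_source, second_source)
```

Reusing the saved file avoids repeating a potentially slow remote query while experimenting with different thresholds.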

The arguments --endpoint, --graph, and --lang are used to select and customize the query in queries.json that retrieves the input data. To add a new query or a new SPARQL endpoint, modify queries.json accordingly.
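
As a rough illustration of how three such arguments could select and customize a stored query, consider the sketch below. The actual layout of queries.json is not shown in this document, so everything here is an assumption: the JSON structure, the $graph and $lang placeholders, the SPARQL text, the graph URI, and the function build_query are hypothetical.

```python
import json
from string import Template

# Hypothetical stand-in for queries.json: one query template per endpoint code.
QUERIES = json.loads("""
{
  "issa": "SELECT ?doc ?label WHERE { GRAPH <$graph> { ?doc ?p ?label } FILTER(lang(?label) = '$lang') }",
  "covid": "SELECT ?doc ?label WHERE { ?doc ?p ?label FILTER(lang(?label) = '$lang') }"
}
""")


def build_query(endpoint, lang, graph=None):
    """Pick the template for an endpoint code and fill in graph and language."""
    if endpoint not in QUERIES:
        raise KeyError(f"No query defined for endpoint code {endpoint!r}")
    # Placeholders that a template does not use are simply ignored.
    return Template(QUERIES[endpoint]).substitute(graph=graph or "", lang=lang)


# The graph URI below is made up for the example.
issa_query = build_query("issa", lang="en", graph="http://example.org/graph/agrovoc")
covid_query = build_query("covid", lang="en")
print(issa_query)
print(covid_query)
```

Under this assumed layout, supporting a new endpoint would mean adding one more key with its query template.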

Use cases

Mining association rules from the agrovoc graph of the ISSA dataset:

./rules_generator.py --endpoint 'issa' --graph 'agrovoc' --lang 'en'

Mining association rules from the CovidOnTheWeb dataset:

./rules_generator.py --endpoint 'covid' --lang 'en'

License

See the LICENSE file.

Cite this work

When using this algorithm in a publication, please cite this paper:

Lucie Cadorel, Andrea G. B. Tettamanzi. Mining RDF Data of COVID-19 Scientific Literature for Interesting Association Rules. WI-IAT'20 - IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Dec 2020, Melbourne, Australia. hal-03084029
