The data lifting module for ClaimsKG that creates the RDF/LOD dataset instantiation from the ClaimsKG model (illustrated hereafter).
This program requires Python 3.x to run.
To install the dependencies you may use: pip3 install -r requirements.txt
- For usage information you may use
python3 export.py -h
-
The options are the following:
--input [file]
Indicated the location of the zip file generated by the fake new extractor (mandatory)--output [file]
Specifies the output file for the model (default: out.ttl)--format [format]
Specifies the format of the output serialization. You may use any of the supported formats in therdflib
package (xml', 'n3', 'turtle', 'nt', 'pretty-xml', 'trix', 'trig' and 'nquads'; default: turtle)--model-uri
The base URI of the model (by defaulthttp://data.gesis.org/claimskg/public/
)--resolve
Specifies whether to resolve the Wikipedia page identifiers for TagMe annotations to DBPedia URIs. If this option is activated, the resolution is performed through SPARQL queries to the official DBPedia endpoint, which requires you to have an active Internet connection. Additionally, you will need a running instance ofredis-server
as the results of the queries are cached to prevent unnecessary queries from being performed. If resolve is not supplied, all entities will have URIs of the formtagme://wikpediaPageID
.--threshold [float_value]
If--resolve
is present, specifies the cutoff confidence threshold to include a TagMe annotations as a mention. The TagMe documentation recommends a value between 0.1 and 0.3 (default 0.3)--include-body
If--include-body
is supplied, the body of the claim review is included in theschema:ClaimReview
instances through theschema:reviewBody
property.
In the comtext of the claim matching approach, we have produced a annotated dataset for evaluation purposes. The dataset contains 318 matching claims categorized in several types of matches:
- E: Exact match
- E*: Same claim but different claimee
- ST: Same topic, meaning that two claims are about the same occurence or event
The first columnt contains the type of match, the second column, the URI of the first claim, the third column, the URI of the second claim.
The gold file can be downloaded here: https://drive.google.com/open?id=1evf67t_p0LqF5ZvNiL-geDCrEBlppZVF
The URIs correspond to that of the following dataset: http://andon.tchechmedjiev.eu/files/claimskg_20_12_2018.ttl.gz
Latest releases of core data from en.wikipedia.org (ttl file): https://databus.dbpedia.org/dbpedia/collections/latest-core https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2021.03.01/categories_lang=en_skos.ttl.bz2