This library provides advanced support for working with RDF in Prolog.
- Install SWI-Prolog.
- Install the Prolog Library Collection.
- Clone this repo, and add the following line to your $HOME/.config/swi-prolog/init.pl file:
```prolog
user:file_search_path(library, '/your/path/to/prolog_rdf/prolog').
```
You can now load the libraries from this repo in the following way:
```prolog
:- [library(semweb/rdf_term)].
```
This library uses the following extended Prolog types in the documentation headers of predicates:
| Type | Definition |
|---|---|
| rdf_bnode | An atom that starts with _:. |
| rdf_graph | Either a term of type iri or the atom default. |
| rdf_iri | An atom that can be decomposed with uri_components/2 from library(uri). |
| rdf_literal | A compound term of the form literal(type(iri,atom)) or literal(lang(atom,atom)). |
| rdf_name | An RDF name (IRI or literal). |
| rdf_quad | A compound term of the form rdf(rdf_nonliteral,iri,rdf_term,rdf_graph). |
| rdf_term | An RDF term (blank node, IRI, or literal). |
| rdf_triple | A compound term of the form rdf(rdf_nonliteral,iri,rdf_term). |
| rdf_tuple | A term of type rdf_quad or rdf_triple. |
This section enumerates the various modules that are included in this library.
This module contains data cleaning predicates that were previously part of LOD Laundromat. They can be used to clean RDF tuples that are streamed from an RDF source. See module [[semweb/rdf_deref]] for creating streams over RDF sources.
In order to use this module, library prolog_uriparser must be installed.
The parsers in the Semantic Web standard library emit blank node labels that contain characters that are not allowed in standards-compliant output formats (e.g., forward slashes). This is unfortunate, since writing the data in a standards-compliant format then requires maintaining state that ensures each Prolog-internal blank node label is consistently emitted as the same standards-compliant external blank node label. See this GitHub issue for context.
Besides the above considerations, blank nodes pose a scalability problem in general. Since blank node labels are only guaranteed to be unique within the context of an RDF document, combining data from multiple documents requires checking all blank node labels in the documents to be combined. Furthermore, all blank node labels that appear in more than one RDF document must be consistently renamed before the data is combined.
Since Pro-RDF focuses on scalability, it cannot rely on maintaining an internal state that consistently maps internal Prolog blank node labels to external standards-compliant blank node labels. For the same reason, it also cannot rely on full-document inspection and blank node relabeling. Instead, the data cleaning predicates in library(rdf_clean) replace blank nodes with well-known IRIs, in line with the RDF 1.1 standard. This means that every data cleaning predicate must bind a valid well-known IRI to the BNodePrefix argument. It also means that Prolog-internal blank node labels are hashed using the MD5 algorithm to provide the local names for the generated well-known IRIs. Hashing ensures consistent relabeling without maintaining an internal state.
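A minimal sketch of this stateless relabeling follows. It assumes SWI-Prolog's library(md5); the helper name bnode_iri/3 and the prefix https://example.org/.well-known/genid/ are hypothetical, and this is not the implementation used in library(rdf_clean).

```prolog
:- use_module(library(md5)).

%!  bnode_iri(+BNodePrefix:atom, +BNode:atom, -Iri:atom) is det.
%
%   Hypothetical helper: maps a Prolog-internal blank node label to a
%   well-known IRI whose local name is the MD5 hash of that label.
bnode_iri(BNodePrefix, BNode, Iri) :-
    md5_hash(BNode, Hash, []),           % Hash: 32 hexadecimal characters
    atom_concat(BNodePrefix, Hash, Iri).
```

For example, `bnode_iri('https://example.org/.well-known/genid/', '_:b1', Iri)` binds Iri to the prefix followed by the hash of `_:b1`. Because the local name depends only on the blank node label, repeated occurrences of a label are mapped to the same IRI without any lookup table.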
The parsers from the Semantic Web standard library denote the default graph with the atom user. This is translated to the atom default. For named graphs, this library checks whether they are well-formed IRIs.
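The graph rule can be sketched as follows. The predicate name clean_graph/2 is hypothetical, and uri_is_global/1 from SWI-Prolog's library(uri) (which succeeds when the atom has a scheme) is used as a simple stand-in for the fuller IRI checks described below.

```prolog
:- use_module(library(uri)).

%!  clean_graph(+Graph0:atom, -Graph:atom) is semidet.
%
%   Hypothetical sketch: the parser's default graph `user` becomes
%   `default`; any other graph name is kept only if it is an absolute IRI.
clean_graph(user, default) :- !.
clean_graph(Graph, Graph) :-
    atom(Graph),
    uri_is_global(Graph).          % stand-in for the library's IRI checks
```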
IRI cleaning is the most difficult part of syntactic RDF data cleaning. To date, the IRI grammar (RFC 3987) has not been implemented. Since this grammar was published over a decade ago, we must anticipate a future in which the main syntactic component of the Semantic Web cannot be validated.
While there are implementations of the URI grammar (RFC 3986), the one provided by the SWI-Prolog standard library (library(uri)) is incorrect.
For these two reasons, we currently only check the following:
- Whether an IRI can be decomposed into scheme, authority, path, query, and fragment components using the Prolog standard library grammar (uri_components/2).
- Whether the scheme, authority, and path components are non-empty.
- Whether the scheme component conforms to the IRI grammar.
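The three checks above can be sketched as a single predicate. The name iri_check/1 and its helpers are hypothetical; the scheme rule is the RFC 3987/3986 production `ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`, and decomposition uses uri_components/2 from SWI-Prolog's library(uri).

```prolog
:- use_module(library(uri)).

%!  iri_check(+Iri:atom) is semidet.
%
%   Hypothetical sketch of the three checks listed above.
iri_check(Iri) :-
    % 1. Decompose with the standard library grammar.
    uri_components(Iri, uri_components(Scheme, Authority, Path, _Query, _Fragment)),
    % 2. Scheme, authority, and path must be present and non-empty.
    non_empty(Scheme),
    non_empty(Authority),
    non_empty(Path),
    % 3. The scheme must match ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).
    atom_codes(Scheme, [First|Rest]),
    ascii_alpha(First),
    forall(member(C, Rest), scheme_code(C)).

% Absent components are left unbound by uri_components/2.
non_empty(X) :-
    atom(X),
    X \== ''.

ascii_alpha(C) :- between(0'a, 0'z, C), !.
ascii_alpha(C) :- between(0'A, 0'Z, C).

scheme_code(C) :- ascii_alpha(C), !.
scheme_code(C) :- between(0'0, 0'9, C), !.
scheme_code(C) :- memberchk(C, [0'+, 0'-, 0'.]).
```

Note that, in this sketch, requiring a non-empty authority and path rejects IRIs such as urn:isbn:0451450523 (no authority) and http://example.org (empty path).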
For language-tagged strings, cleaning involves downcasing the language tag. While there are implementations of the language tag grammar (RFC 5646), we are not yet using these.
Simple literals, i.e., literals with neither a language tag nor a datatype IRI, are translated to typed literals with datatype IRI xsd:string.
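The rules for language-tagged strings and simple literals can be sketched together. The name clean_literal/2 is hypothetical; the literal shapes literal(lang(Tag, Lex)), literal(type(Datatype, Lex)), and literal(Lex) follow the conventions of the SWI-Prolog Semantic Web libraries.

```prolog
%!  clean_literal(+Literal0:compound, -Literal:compound) is semidet.
%
%   Hypothetical sketch of the two rules above.
clean_literal(literal(lang(Tag0, Lex)), literal(lang(Tag, Lex))) :- !,
    downcase_atom(Tag0, Tag).                   % e.g. 'en-US' becomes 'en-us'
clean_literal(literal(Lex), literal(type(XsdString, Lex))) :-
    atom(Lex), !,                               % simple literal: no tag, no datatype
    XsdString = 'http://www.w3.org/2001/XMLSchema#string'.
% Typed literals are left unchanged in this sketch.
clean_literal(literal(type(Datatype, Lex)), literal(type(Datatype, Lex))).
```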
For typed literals, cleaning involves:
- Cleaning the datatype IRI (see [[IRI cleaning]]).
- Making sure the datatype IRI is not rdf:langString.
- Cleaning the lexical form according to the datatype IRI. This is the most involved step, since there are many different datatype IRIs. Because it is impractical to implement lexical form cleaning for all of them, we focus on the most widely used ones. For this we use rdf_literal_value/3, which is part of library(semweb/rdf_term).
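As an illustration of the last two steps, the following sketch rejects rdf:langString and canonicalizes the lexical form of a single datatype, xsd:boolean, whose lexical space is true, false, 1, and 0 with canonical forms true and false. It does not use rdf_literal_value/3; the names clean_typed_literal/2, clean_lexical_form/3, and boolean_canonical/2 are hypothetical, and datatype IRI cleaning is omitted.

```prolog
%!  clean_typed_literal(+Literal0:compound, -Literal:compound) is semidet.
%
%   Hypothetical sketch: fails for rdf:langString and for invalid
%   xsd:boolean lexical forms; other datatypes pass through unchanged.
clean_typed_literal(literal(type(D, Lex0)), literal(type(D, Lex))) :-
    D \== 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString',
    clean_lexical_form(D, Lex0, Lex).

clean_lexical_form('http://www.w3.org/2001/XMLSchema#boolean', Lex0, Lex) :- !,
    boolean_canonical(Lex0, Lex).        % fails for invalid lexical forms
clean_lexical_form(_, Lex, Lex).         % no cleaning for other datatypes here

% Lexical space of xsd:boolean mapped to its canonical form.
boolean_canonical(true,  true).
boolean_canonical('1',   true).
boolean_canonical(false, false).
boolean_canonical('0',   false).
```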
This module provides the following predicates:
- Cleans quadruple compound terms.
- Cleans triple compound terms.
- Cleans quadruple and/or triple compound terms.
This module implements RDF dereferencing, i.e., the act of obtaining interpreted RDF statements based on a given RDF document, stream, or HTTP(S) URI.
This module provides the following predicates:
- Performs RDF dereferencing on local RDF documents. Uses heuristics to determine the RDF serialization of the file.
- Performs RDF dereferencing on an input stream containing one of the standardized RDF serialization formats.
- Performs RDF dereferencing on a URI, typically an HTTP(S) URI. Uses heuristics to determine the RDF serialization of the reply body.
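For comparison, the following sketch applies the SWI-Prolog standard library's Turtle parser, rdf_read_turtle/3 from library(semweb/turtle), to an in-memory stream. It hard-codes Turtle instead of detecting the serialization, so it only illustrates the shape of the result (a list of rdf/3 terms); turtle_string_triples/2 is a hypothetical name, not one of this module's predicates.

```prolog
:- use_module(library(semweb/turtle)).

%!  turtle_string_triples(+Text:string, -Triples:list(compound)) is det.
%
%   Hypothetical sketch: parses Turtle text into rdf(S,P,O) terms,
%   assuming (rather than guessing) the serialization format.
turtle_string_triples(Text, Triples) :-
    setup_call_cleanup(
        open_string(Text, In),
        rdf_read_turtle(stream(In), Triples, []),
        close(In)).
```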
This module provides primitives for generating GraphViz DOT exports of RDF terms and tuples. It requires library prolog_graphviz to be installed.
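To show what such an export looks like, here is a minimal sketch written with plain format/3 rather than prolog_graphviz, whose API is not described here. The names triples_dot/2 and dot_string/2 are hypothetical, and only double quotes are escaped in node and edge labels.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

%!  triples_dot(+Out:stream, +Triples:list(compound)) is det.
%
%   Hypothetical sketch: writes each rdf(S,P,O) triple as a DOT edge
%   from S to O that is labeled with P.
triples_dot(Out, Triples) :-
    format(Out, "digraph rdf {~n", []),
    forall(member(rdf(S, P, O), Triples),
           ( maplist(dot_string, [S, P, O], [QS, QP, QO]),
             format(Out, "  ~a -> ~a [label=~a];~n", [QS, QO, QP])
           )),
    format(Out, "}~n", []).

%   Render any term as a double-quoted DOT string, escaping double quotes.
dot_string(Term, Quoted) :-
    format(atom(Text), "~w", [Term]),
    atomic_list_concat(Parts, '"', Text),
    atomic_list_concat(Parts, '\\"', Escaped),
    format(atom(Quoted), "\"~a\"", [Escaped]).
```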
This module writes RDF data in a simple and standards-compliant serialization format. It contains the following predicates:
- rdf_write_iri/2
- rdf_write_literal/2
- rdf_write_name/2
- rdf_write_quad/[2,3,5]
- rdf_write_triple/[2,4]
- rdf_write_tuple/2
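A minimal sketch of writing tuples as N-Triples or N-Quads lines follows. It is not this module's rdf_write_tuple/2: the names write_tuple/2, write_nt_term/2, and escape_lex/2 are hypothetical, and terms are assumed to be cleaned already (IRIs, typed literals, or language-tagged literals, with blank nodes replaced by well-known IRIs).

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

%!  write_tuple(+Out:stream, +Tuple:compound) is det.
%
%   Hypothetical sketch: a triple becomes an N-Triples line and a quad an
%   N-Quads line; the graph is omitted when it is `default`.
write_tuple(Out, rdf(S, P, O)) :-
    write_terms(Out, [S, P, O]).
write_tuple(Out, rdf(S, P, O, default)) :- !,
    write_terms(Out, [S, P, O]).
write_tuple(Out, rdf(S, P, O, G)) :-
    write_terms(Out, [S, P, O, G]).

write_terms(Out, Terms) :-
    forall(member(Term, Terms),
           ( write_nt_term(Out, Term), write(Out, ' ') )),
    format(Out, ".~n", []).

write_nt_term(Out, literal(type(Datatype, Lex))) :- !,
    escape_lex(Lex, Escaped),
    format(Out, "\"~a\"^^<~a>", [Escaped, Datatype]).
write_nt_term(Out, literal(lang(Tag, Lex))) :- !,
    escape_lex(Lex, Escaped),
    format(Out, "\"~a\"@~a", [Escaped, Tag]).
write_nt_term(Out, Iri) :-
    format(Out, "<~a>", [Iri]).

%   Escape the characters that may not appear unescaped in an
%   N-Triples string literal.
escape_lex(Lex, Escaped) :-
    atom_codes(Lex, Codes),
    maplist(escape_code, Codes, Parts),
    append(Parts, EscapedCodes),
    atom_codes(Escaped, EscapedCodes).

escape_code(0'",  [0'\\, 0'"]) :- !.
escape_code(0'\\, [0'\\, 0'\\]) :- !.
escape_code(0'\n, [0'\\, 0'n]) :- !.
escape_code(0'\r, [0'\\, 0'r]) :- !.
escape_code(Code, [Code]).
```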
This module peeks at the beginning of a file, stream, or string in order to heuristically guess the RDF serialization format (if any) of that input:
- rdf_guess_file/3
- rdf_guess_stream/3
- rdf_guess_string/2
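The key technique is peeking without consuming input, so that the same stream can afterwards be handed to a parser from its start. The following toy sketch shows this with SWI-Prolog's peek_string/3. The name guess_rdf_format/2, the 1,000-character window, and the two tests (an XML declaration for RDF/XML, an @prefix directive for Turtle-family formats) are hypothetical and far cruder than this module's heuristics.

```prolog
%!  guess_rdf_format(+In:stream, -Format:atom) is det.
%
%   Toy heuristic: inspects up to 1,000 characters of In without
%   removing them from the stream.
guess_rdf_format(In, Format) :-
    peek_string(In, 1000, Head),
    split_string(Head, "", " \t\r\n", [Trimmed]),   % strip surrounding white space
    (   sub_string(Trimmed, 0, _, _, "<?xml")
    ->  Format = rdfxml
    ;   sub_string(Trimmed, _, _, _, "@prefix")
    ->  Format = turtle
    ;   Format = unknown
    ).
```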
This module provides support for the Media Types of the standardized RDF serialization formats:
- Guesses the RDF serialization format based on the file name extension alone.
- Enumerates all standardized RDF Media Types.
- Succeeds if the former argument is an RDF Media Type that syntactically encompasses the latter argument (e.g., TriG > Turtle > N-Triples, N-Quads > N-Triples).
- Gives a standard file name extension for RDF serializations that are not RDFa (RDFa is embedded in HTML or XHTML content).
- Succeeds for RDFa Media Types.
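Two of these operations, guessing by file name extension and syntactic encompassment, can be sketched as follows. All predicate names are hypothetical, the extension table covers only a few common serializations, and the encompassment facts are exactly the examples given above (taken transitively, not reflexively).

```prolog
%   Hypothetical lookup table: common file name extensions of standardized
%   RDF serializations and their registered Media Types.
extension_media_type(jsonld, 'application/ld+json').
extension_media_type(nq,     'application/n-quads').
extension_media_type(nt,     'application/n-triples').
extension_media_type(rdf,    'application/rdf+xml').
extension_media_type(trig,   'application/trig').
extension_media_type(ttl,    'text/turtle').

%!  file_media_type(+File:atom, -MediaType:atom) is semidet.
%   Guess the Media Type from the file name extension alone.
file_media_type(File, MediaType) :-
    file_name_extension(_, Ext0, File),
    downcase_atom(Ext0, Ext),
    extension_media_type(Ext, MediaType).

%   Direct syntactic inclusion, as listed in the text above.
directly_encompasses('application/trig',    'text/turtle').
directly_encompasses('text/turtle',         'application/n-triples').
directly_encompasses('application/n-quads', 'application/n-triples').

%!  encompasses(?Super:atom, ?Sub:atom) is nondet.
%   Transitive closure of directly_encompasses/2.
encompasses(Super, Sub) :-
    directly_encompasses(Super, Sub).
encompasses(Super, Sub) :-
    directly_encompasses(Super, Mid),
    encompasses(Mid, Sub).
```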
This module provides extended support for working with RDF prefix declarations:
- Enumerates the current RDF prefix declarations.
- Succeeds for (alias,local-name) pairs and full IRIs.
- Provides variants of popular Prolog predicates that apply RDF prefix expansion to their arguments.
RDF prefix expansion must be declared explicitly for the arguments of each predicate. In the SWI-Prolog standard libraries, such declarations have only been added for predicates in the Semantic Web libraries, not for predicates in other standard libraries. For example, the following does not check whether P is bound to one of the four RDFS properties, because the prefix notation is not expanded:
```prolog
memberchk(P, [rdfs:domain,rdfs:range,rdfs:subClassOf,rdfs:subPropertyOf])
```
With the SWI-Prolog standard library, the above call must be spelled out using rdf_equal/2 in the following way:
```prolog
(   rdf_equal(P, rdfs:domain)
->  true
;   rdf_equal(P, rdfs:range)
->  true
;   rdf_equal(P, rdfs:subClassOf)
->  true
;   rdf_equal(P, rdfs:subPropertyOf)
->  true
)
```
When library(rdf_prefix) is loaded, the above can be written as follows:
```prolog
rdf_prefix_memberchk(P, [rdfs:domain,rdfs:range,rdfs:subClassOf,rdfs:subPropertyOf])
```
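A minimal sketch of how such a variant can be declared with SWI-Prolog's rdf_meta/1 follows, assuming library(semweb/rdf_db) is available. The name prefixed_memberchk/2 is hypothetical, and this need not match how rdf_prefix_memberchk/2 is implemented in library(rdf_prefix).

```prolog
:- use_module(library(semweb/rdf_db)).

%   Declare the first argument a resource (r) and the second a term (t):
%   alias:local subterms in calls compiled after this declaration are
%   expanded to full IRIs.
:- rdf_meta prefixed_memberchk(r, t).

%!  prefixed_memberchk(?Elem, +List:list) is semidet.
prefixed_memberchk(Elem, List) :-
    memberchk(Elem, List).
```

Because the expansion happens when the calling clause is compiled, a call such as `prefixed_memberchk(P, [rdfs:domain,rdfs:range])` compares P against the full RDFS IRIs at runtime.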
This module provides DCG rules for printing RDF terms and tuples.
This module provides advanced support for composing, decomposing, parsing, and generating RDF terms.