This showcase demonstrates the use of SPARQL Anything for constructing a Knowledge Graph from data encoded in HTML pages.
In what follows, `fx` refers to the following command line:
java -jar sparql-anything-<version>.jar
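One way to make `fx` available in a POSIX shell is a small wrapper function. This is only a sketch: the `SPARQL_ANYTHING_JAR` variable is a hypothetical name introduced here, and it must point at wherever the downloaded jar actually lives.

```shell
# Hypothetical helper, not part of SPARQL Anything itself.
# SPARQL_ANYTHING_JAR is an assumed variable holding the path to the jar.
fx() {
  java -jar "${SPARQL_ANYTHING_JAR:?set SPARQL_ANYTHING_JAR to the path of the SPARQL Anything jar}" "$@"
}
```

With the function defined in the current shell session, the `fx ...` commands in this showcase can be pasted as written.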
This query extracts the list of artists from the Web page and builds an XML result set with ?artistNickname and ?artistUrl. The next query uses this SPARQL result set file to iterate over each of the artists' pages.
Title | Step 1: list artists from the catalogue |
---|---|
Query | queries/imma-artists.sparql |
Input | https://imma.ie/artists/ |
Output | imma-artists.xml |
Type | SELECT |
Options | html.selector=#az-group |
Formats | HTML |
Level | Novice |
Run the example as follows:
fx -q queries/imma-artists.sparql -o imma-artists.xml -f xml
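As a quick sanity check after this step, the number of rows in the result set can be counted. The sketch below assumes the file follows the SPARQL Query Results XML Format with at most one `<result>` opening tag per line, which may not hold for every serializer; `count_results` is a hypothetical helper name.

```shell
# Count the rows of a SPARQL XML result set.
# Assumes at most one <result> opening tag per line (an assumption about the serializer).
# The pattern does not match <results> or </result>.
count_results() {
  grep -c '<result>' "$1"
}

# Usage once Step 1 has produced its output:
# count_results imma-artists.xml
```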
In this step we use a parametrized query that takes an artist's web page and extracts the relevant metadata. The query is repeated for each row of the SPARQL result set file produced in the previous step. The command generates one JSON-LD file per execution, using the artist nickname (one of the values provided by the result set) as the file name. Crucially, the JSON-LD files produced will include the URLs of the related artworks' web pages.
Title | Step 2: iterate over artists' web pages and create a JSON-LD for each one of them |
---|---|
Query | queries/imma-artist.sparql |
Input | imma-artists.xml, ?_artistUrl |
Output | artists/*.jsonld |
Type | CONSTRUCT |
Options | |
Formats | HTML |
Level | Novice |
Run the example as follows:
fx -q queries/imma-artist.sparql -i imma-artists.xml -p "artists/?artistNickname.jsonld" -f json
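To make the iteration concrete: each row of the result set supplies one set of parameter values, much like passing them by hand with the CLI option `-v`. The sketch below only prints the per-artist commands rather than running them. `print_artist_runs` is a hypothetical helper, and the exact equivalence between `-i` and repeated `-v` calls is an assumption rather than something this showcase states.

```shell
# Print (without running) the single-artist command that corresponds to each
# "nickname url" pair read from standard input.
print_artist_runs() {
  while read -r nickname url; do
    printf 'fx -q queries/imma-artist.sparql -v artistNickname=%s -v artistUrl=%s -p "artists/?artistNickname.jsonld" -f json\n' "$nickname" "$url"
  done
}

# Example row: the artist used in the single-artist command of this showcase.
printf '%s\n' 'lambert-gene https://imma.ie/artists/gene-lambert/' | print_artist_runs
```

In the output pattern, `?artistNickname` is replaced by the value bound in the current row, so this row would be written to `artists/lambert-gene.jsonld`.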
Next, we extract the list of artworks' Web pages from the JSON-LD files of the artists. This is easy as we can simply query the JSON-LD files, loading them in an in-memory dataset via the command-line option `-l`.
Title | Step 3: Generate the list of artworks |
---|---|
Query | queries/imma-artworks.sparql |
Input | artists/ |
Output | imma-artworks.xml |
Type | SELECT |
Options | -l |
Formats | |
Level | Novice |
Run the example as follows:
fx -q queries/imma-artworks.sparql -l artists/ -o imma-artworks.xml -f xml
Next, we extract data from the artworks' Web pages and build one JSON-LD file for each of them (create the folder 'artworks' first).
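The output folders can be created up front with `mkdir -p`, which does nothing if a folder already exists. Creating `artists` as well is an assumption here: the text only mentions `artworks`, but Step 2 writes into `artists/` in the same way.

```shell
# -p avoids an error when the folder is already there.
mkdir -p artists artworks
```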
Title | Step 4: iterate over artworks' web pages and create a JSON-LD for each one of them |
---|---|
Query | queries/imma-artwork.sparql |
Input | imma-artworks.xml, ?_artworkUrl |
Output | artworks/*.jsonld |
Type | CONSTRUCT |
Options | |
Formats | |
Level | Novice |
Run the example as follows:
fx -q queries/imma-artwork.sparql -i imma-artworks.xml -p "artworks/?artworkNickname.jsonld" -f json
Finally, we can load the files into our favourite triple store.
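As one possible way to do this, the sketch below posts each JSON-LD file to a store that implements the SPARQL 1.1 Graph Store HTTP Protocol. Everything specific here is an assumption: the `load_jsonld` helper name, the `CURL` override used for dry runs, and the endpoint URL in the usage comment, which is a placeholder modelled on a local Apache Jena Fuseki dataset.

```shell
# Hypothetical helper: POST every .jsonld file in a folder to a Graph Store endpoint.
# $1 = folder with .jsonld files, $2 = Graph Store URL (e.g. ending in ?default).
# POST merges the triples into the target graph; it does not replace existing data.
# Set CURL=echo to print the requests instead of sending them.
load_jsonld() {
  for file in "$1"/*.jsonld; do
    [ -e "$file" ] || continue   # skip when the folder has no .jsonld files
    "${CURL:-curl}" -sS -X POST \
      -H 'Content-Type: application/ld+json' \
      --data-binary "@$file" "$2" || return 1
  done
}

# Usage with a placeholder endpoint (assumed, adjust to your store):
# load_jsonld artists 'http://localhost:3030/imma/data?default'
# load_jsonld artworks 'http://localhost:3030/imma/data?default'
```

Other stores offer their own bulk loaders, which are usually faster than one HTTP request per file for large collections.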
The same queries can also be run for a single artist or artwork. In addition, they showcase the CLI option `-v`, used to pass parameter values.
Extract data from a specific artist Web page:
fx -q queries/imma-artist.sparql -v artistNickname=lambert-gene -v artistUrl=https://imma.ie/artists/gene-lambert/ -p "artists/?artistNickname.jsonld" -f json
Extract data from a specific artwork Web page:
fx -q queries/imma-artwork.sparql -v artworkNickname=naturaleza-desde-la-ventana -v artworkUrl=https://imma.ie/collection/naturaleza-desde-la-ventana/ -p "artworks/?artworkNickname.jsonld" -f json
fx -q queries/imma-artwork.sparql -v artworkNickname=berry-dress -v artworkUrl=https://imma.ie/collection/berry-dress/ -p "artworks/?artworkNickname.jsonld" -f json