JSKOS Concept Occurrences Provider implementation

Subjects API

API to provide information about subject indexing in the K10plus catalog

This API can be used to query how a concept or combination of concepts is used in records of a database. This includes: which concepts a record is indexed with (subjects), which records have been indexed with a concept (records), the number of records indexed with a concept and/or a deep link into a catalog to get these records (occurrences, links), and which concepts are used together with other concepts (co-occurrences).

Install

Requires Node.js v18 or newer.

git clone https://github.com/gbv/subjects-api.git
cd subjects-api
npm ci

You can also run Subjects API via Docker.

Configuration

Optionally create a configuration file .env to change certain config options. Here are the default values:

PORT=3141
BACKEND=SQLite
DATABASE=./subjects.db
SCHEMES=./vocabularies.json
LINKS=./links.json

All vocabularies included in K10Plus Subjects are preconfigured via vocabularies.json.

Then fill the backend database (SQLite by default) with subject indexing data from the K10plus catalog. The script ./bin/import.js can be used to do so (not documented yet).

Backends

SQLite

The application must be started once to create the SQLite database file subjects.db.

K10Plus

Retrieves bibliographic records in K10plus format via unAPI. Only supports the /subjects endpoint.

BACKEND=K10Plus
DATABASE=https://unapi.k10plus.de/

SPARQL

Requires a SPARQL endpoint that supports SPARQL Update and the SPARQL Graph Store Protocol for write access. Only tested with Apache Jena Fuseki.

BACKEND=SPARQL
DATABASE=http://localhost:3030/k10plus
GRAPH=https://uri.gbv.de/graph/kxp-subjects     # optional

The RDF data model uses two properties, dct:subject and skos:inScheme:

?record  <http://purl.org/dc/terms/subject> ?subject .
?subject <http://www.w3.org/2004/02/skos/core#inScheme> ?voc .
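
To illustrate how the two triple patterns above fit together, here is a minimal sketch that builds a SPARQL query listing the subjects of one record. The function names subjectsOfRecordQuery and checkIri and the record URI are hypothetical, chosen for illustration. This is not necessarily the query the SPARQL backend sends.

```javascript
const DCT_SUBJECT = "http://purl.org/dc/terms/subject"
const SKOS_IN_SCHEME = "http://www.w3.org/2004/02/skos/core#inScheme"

// Reject characters that are not allowed inside a SPARQL IRI reference,
// so that a URI cannot break out of the <...> brackets.
function checkIri(value) {
  if (typeof value !== "string" || value === "" || /[<>"{}|\\^`\s]/.test(value)) {
    throw new Error(`Not a valid IRI: ${value}`)
  }
  return value
}

// Build a SELECT query for the subjects (and their vocabularies) of a record.
// graphUri is optional, mirroring the optional GRAPH setting.
function subjectsOfRecordQuery(recordUri, graphUri) {
  const triples = [
    `<${checkIri(recordUri)}> <${DCT_SUBJECT}> ?subject .`,
    `?subject <${SKOS_IN_SCHEME}> ?voc .`,
  ]
  const body = graphUri
    ? [`GRAPH <${checkIri(graphUri)}> {`, ...triples.map(t => "  " + t), "}"]
    : triples
  return ["SELECT ?subject ?voc WHERE {", ...body.map(l => "  " + l), "}"].join("\n")
}

// Placeholder record URI, for illustration only
console.log(subjectsOfRecordQuery(
  "http://example.org/record/1",
  "https://uri.gbv.de/graph/kxp-subjects",
))
```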

Neo4j (experimental)

Default configuration:

BACKEND=Neo4j
DATABASE=neo4j://localhost
DB_NAME=
DB_USER=
DB_PASSWORD=

Usage

npm run start

Some backends allow importing data from a headerless TSV file with three columns for PPN, vocabulary id (VOC), and notation. Regular dumps of K10plus are available from https://doi.org/10.5281/zenodo.7016625.

npm run import -- subjects.tsv

Option --full replaces the existing backend data; otherwise the data is added to the existing subjects data. Option --modified can be used to set the modification date (timestamp of the file by default).
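
The three-column TSV layout can be read with a few lines of JavaScript. The following is a sketch only: parseSubjectLine is a hypothetical helper, the sample values are made up, and how ./bin/import.js itself treats malformed lines is not covered here.

```javascript
// Parse one line of the headerless three-column TSV (PPN, VOC, notation).
// Returns null for lines that do not have exactly three non-empty columns.
function parseSubjectLine(line) {
  const columns = line.replace(/\r?\n$/, "").split("\t")
  if (columns.length !== 3 || columns.some(c => c === "")) {
    return null
  }
  const [ppn, voc, notation] = columns
  return { ppn, voc, notation }
}

// Made-up sample data: one well-formed line and one malformed line
const sample = "0000000001\tbk\t00.00\nbroken line\n"
const rows = sample
  .split("\n")
  .filter(line => line !== "")
  .map(parseSubjectLine)
console.log(rows)
```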

Importing directly into the SQLite backend is also possible, but not recommended:

URL=$(curl -sL "https://zenodo.org/api/records/7016625" | jq -r '.files[]|select(.key|endswith(".tsv.gz"))|.links.self')
curl -sL "$URL" | zcat | sqlite3 subjects.db -cmd ".mode tabs" ".import /dev/stdin subjects"

If vocabularies.json is updated or replaced, it is necessary to add concept APIs to it via npm run add-vocabulary-apis (and commit the changes if necessary).

API

GET /subjects

Returns a (possibly empty) array of JSKOS Concepts a record is indexed with. The special value null can be included as the last array element to indicate that more subjects may exist.

Query parameters:

  • record - URIs of records, separated by |
  • scheme - URIs of concept schemes, separated by |. The default value * can be used to include all concept schemes.

This endpoint returns the same information as the /occurrences endpoint with query parameters record and scheme (parameter member not set), but in a different output format (JSKOS Concepts instead of Concept Occurrences).
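
A client has to join multiple URIs with | and watch for the trailing null. The sketch below shows one way to do both. The helper names buildSubjectsUrl and splitSubjects, the base URL (a local instance on the default port 3141) and the record and concept URIs are assumptions for illustration.

```javascript
// Hypothetical helper: build a /subjects request URL.
// Multiple record or scheme URIs are joined with "|".
function buildSubjectsUrl(base, records, schemes = ["*"]) {
  const url = new URL("/subjects", base)
  url.searchParams.set("record", records.join("|"))
  url.searchParams.set("scheme", schemes.join("|"))
  return url.toString()
}

// Split a /subjects response into the concepts and a flag telling
// whether a trailing null signalled that more subjects may exist.
function splitSubjects(response) {
  const incomplete = response.length > 0 && response[response.length - 1] === null
  const concepts = incomplete ? response.slice(0, -1) : response.slice()
  return { concepts, incomplete }
}

// Placeholder URIs, for illustration only
const subjectsUrl = buildSubjectsUrl("http://localhost:3141", ["http://example.org/record/1"])
console.log(subjectsUrl)
console.log(splitSubjects([{ uri: "http://example.org/concept/a" }, null]))
```

The resulting URL can be passed to fetch (built into Node.js v18 and newer), and the parsed JSON body to splitSubjects.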

GET /records

Returns an array of records with the given subject.

Return format is experimental.

Query parameters:

  • subjects - URI of a concept from supported vocabularies
  • limit - maximum number of records to return (10 by default)
  • format - return format (not supported yet)

GET /occurrences

Returns a (possibly empty) array of JSKOS Concept Occurrences. Depending on query parameters the result consists of:

  • the occurrence of a concept specified via member
  • the occurrence of concepts in a record specified via record
  • the co-occurrences of a concept specified via member in all records, when query parameter scheme is given

Occurrences contain deep links into the K10plus catalog for selected vocabularies.

Query parameters:

  • member - URI of a concept from supported vocabularies
  • record - URI of a record
  • scheme - URI of a target concept scheme (when given, co-occurrences are returned; when value * is given, all supported target schemes are used)
  • threshold - a minimum threshold for co-occurrences to be included
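
The query parameters above can be combined as in the following sketch. buildOccurrencesUrl and sortByCount are hypothetical helpers, and the base URL and concept URI are placeholders. The sort assumes that each returned occurrence carries a numeric count field, as JSKOS Concept Occurrences usually do.

```javascript
// Hypothetical helper: build an /occurrences request URL from the
// query parameters listed above. Unset parameters are left out.
function buildOccurrencesUrl(base, { member, record, scheme, threshold } = {}) {
  const url = new URL("/occurrences", base)
  const params = { member, record, scheme, threshold }
  for (const [name, value] of Object.entries(params)) {
    if (value !== undefined && value !== null) {
      url.searchParams.set(name, String(value))
    }
  }
  return url.toString()
}

// Return a copy sorted by descending count (assumed field);
// occurrences without a count are treated as 0.
function sortByCount(occurrences) {
  return [...occurrences].sort((a, b) => (b.count ?? 0) - (a.count ?? 0))
}

// Co-occurrences of a placeholder concept in all supported target schemes
const occurrencesUrl = buildOccurrencesUrl("http://localhost:3141", {
  member: "http://example.org/concept/a",
  scheme: "*",
  threshold: 5,
})
console.log(occurrencesUrl)
```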

There is a deprecated alias at /api to be removed soon.

GET /occurrences/voc

Alias for GET /voc to support clients that only know the Occurrences API by its base URL /occurrences.

GET /links

Not implemented yet, see #44.

Returns a list of deep links into the database to list all records indexed with a given concept.

Query parameters:

  • subject - URI of a concept

Return format:

JSON Array of objects, each with:

  • url
  • label (name of the database)
  • description (optional)

This endpoint returns the same information as the /occurrences endpoint, with query parameter subject instead of member, but in a different return format and without the number of records.

GET /voc

Returns an array of supported vocabularies as JSKOS Concept Schemes.

There is a deprecated alias at /api/voc to be removed soon and a stable alias at /occurrences/voc.

GET /databases

Returns an array of supported databases. Return format is experimental.

GET /status

Returns information about the service. Return format is experimental.

Maintainers

Contributing

PRs accepted against the dev branch.

Small note: If editing the README, please conform to the standard-readme specification.

License

MIT © 2022 Verbundzentrale des GBV (VZG)