Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.


QLever


QLever (pronounced "Clever") is a SPARQL engine that can efficiently index and query very large knowledge graphs with tens of billions of triples on a single standard PC or server. In particular, QLever is fast for queries that involve large intermediate or final results, which are notoriously hard for engines like Blazegraph or Virtuoso. QLever also supports search in text associated with the knowledge base, as well as SPARQL autocompletion. Here are demos of QLever on a variety of large knowledge graphs, including the complete Wikidata and OpenStreetMap. These demos also feature QLever's context-sensitive autocompletion, which makes SPARQL query construction much easier.

QLever was first described and evaluated in this CIKM'17 paper. QLever's autocompletion functionality is described and evaluated in this paper. If you use QLever in your work, please cite those papers. QLever supports standard SPARQL 1.1 constructs like LIMIT, OFFSET, ORDER BY, GROUP BY, HAVING, COUNT, DISTINCT, SAMPLE, GROUP_CONCAT, FILTER, REGEX, LANG, OPTIONAL, UNION, MINUS, VALUES, and BIND. Property paths and subqueries are also supported. The SERVICE keyword is not yet supported. We aim at full SPARQL 1.1 support, and we are almost there (except for SPARQL Update operations, which are a longer-term project).

Quickstart

If you want to skip the details and just get a running QLever instance to play around with, follow the quickstart guide.

Alternatively, to get started with a real (and really big) dataset, we have prepared a Wikidata Quickstart Guide. It takes you through the entire process of loading the full Wikidata knowledge base into QLever; don't worry, it is pretty simple.

QLever's advanced features are described here.

Overview

The rest of this page is organized into the following sections, taking you through the steps necessary to get a QLever instance up and running, starting from a simple Turtle dump of a knowledge base.

Further documentation is available on the following topics.

Building the QLever Docker Container

We recommend running QLever with Docker. If you absolutely want to run QLever directly on your host, see here.

The installation requires a 64-bit system, Docker version 18.05 or newer, and git.

git clone --recursive https://github.com/ad-freiburg/QLever.git qlever
cd qlever
docker build -t qlever .

This creates a Docker image named "qlever" which contains everything needed to use QLever. If you want to be sure that everything is working as it should before proceeding, you can run the end-to-end tests.
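A sketch of what running the tests inside the freshly built image could look like (the e2e/e2e.sh entrypoint path is an assumption based on the repository layout; check the repository if the test script has moved):

```
docker run -it --rm --name qlever-e2e --entrypoint e2e/e2e.sh qlever
```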

Creating an Index

Obtaining Data

First, make sure that you have your input data ready and accessible on your machine. If you have no input data yet, obtain it from one of our recommended sources, or create your own knowledge base in the standard N-Triples or Turtle format and (optionally) add a text corpus.
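If you just want to try the pipeline, a tiny hand-written Turtle file is enough as input. A minimal sketch (the example.org IRIs are placeholders, not one of the recommended sources):

```
@prefix ex: <http://example.org/> .

ex:alice  ex:knows  ex:bob .
ex:alice  ex:name   "Alice" .
ex:bob    ex:name   "Bob" .
```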

Note that QLever only accepts UTF-8 encoded input files. Then again, you should be using UTF-8 anyway.

Permissions

By default, and when running Docker without user namespaces, the container uses the user ID 1000, which on Linux is almost always the first real user. If the default user does not work, add -u "$(id -u):$(id -g)" to docker run so that QLever executes as the current user.

When running Docker with user namespaces, you may need to make the index folder accessible to the user the QLever process is mapped to on the host (e.g. nobody, see /etc/subuid):

chmod -R o+rw ./index

Building the Index

Then proceed with creating an index.

Important: Ensure that you have enough disk space where your ./index folder resides, or see below for using a separate path.

To build a new index, run a bash shell inside the QLever container as follows:

docker run -it --rm \
           -v "<absolute_path_to_input>:/input" \
           -v "$(pwd)/index:/index" --entrypoint "bash" qlever

If you want to use a separate path, you MUST change the "$(pwd)/index" part in all docker … commands and replace it with the absolute path to your index.

From now on, we are inside the container. Make sure you follow all of the coming instructions for creating an index, and only then proceed to the next section.

If your input knowledge base is in the standard N-Triples or Turtle format, create the index with the following command:

IndexBuilderMain -l -i /index/<prefix> -f /input/knowledge_base.ttl

Here <prefix> is the base name for all index files, and -l externalizes long literals to disk. If you use index as the prefix, you can later skip the -e INDEX_PREFIX=<prefix> flag.

To include a text collection, provide the wordsfile and docsfile (see here for the required format) with the -w and -d flags, respectively.

Then the full command will look like this:

IndexBuilderMain -l -i /index/<prefix> -f /input/knowledge_base.ttl \
  -w /input/wordsfile.tsv -d /input/docsfile.tsv

You can also add a text index to an existing knowledge base index by adding the -A flag and omitting the -f flag.

Running QLever

To run a QLever server container, use the following command:

docker run -it -p 7001:7001 \
  -v "$(pwd)/index:/index" \
  -e INDEX_PREFIX=<prefix> \
  --name qlever \
  qlever

Additional arguments can be added at the end of the command. If you want the container to run in the background and restart automatically, replace -it with -d --restart=unless-stopped.
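Spelled out, that background variant looks like this (same volume, prefix, and port as above; -d detaches the container, and --restart=unless-stopped restarts it after reboots unless it was stopped manually):

```
docker run -d --restart=unless-stopped -p 7001:7001 \
  -v "$(pwd)/index:/index" \
  -e INDEX_PREFIX=<prefix> \
  --name qlever \
  qlever
```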

Executing queries

The quickest way to run queries is to use the minimal web interface, available at the port specified above (7001 in the example). For a more advanced web interface you can use the QLever UI.

Queries can also be executed from the command line using curl:

curl 'http://localhost:7001/?query=SELECT ?x WHERE {?x <rel> ?y}'
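Strictly speaking, the query string should be percent-encoded before being put into the URL. A minimal sketch of what that encoding produces, assuming python3 is available as the encoder (any URL-encoder works):

```shell
# Percent-encode a SPARQL query for use as a URL query parameter.
QUERY='SELECT ?x WHERE { ?x <rel> ?y }'
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$QUERY")
# The resulting request URL (spaces become %20, braces %7B/%7D, etc.):
echo "http://localhost:7001/?query=$ENCODED"
```

Alternatively, curl can do the encoding itself via curl -G 'http://localhost:7001/' --data-urlencode 'query=…'.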
