Add Path Search Feature to Qlever (#1335)
This commit adds a feature that computes all paths between `(source, target)` pairs on a given graph. The source, target, and graph can be configured. The search is triggered by a `SERVICE` request with a special IRI. The details of this API are described in the file `docs/path_search.md`.
The current implementation only supports computing all paths; the number of such paths can be exponential in the number of edges in the graph. The implemented infrastructure makes it simpler to add further features to this service in the future, such as:
* Only return an arbitrary single path between given start and end nodes.
* Only return the shortest path
* Only return the longest path (same complexity as "all paths", but lower memory requirements).
* Return all edges that lie on a path.

Additionally, it can be extended to support the LIMIT/OFFSET clauses and lazy evaluation, so that very large results can be handled efficiently in the future.
JoBuRo authored Oct 21, 2024
1 parent bf36257 commit 2ebca4d
Showing 17 changed files with 2,925 additions and 8 deletions.
290 changes: 290 additions & 0 deletions docs/path_search.md
@@ -0,0 +1,290 @@
# Path Search Feature Documentation for SPARQL Engine

## Overview

The Path Search feature in this SPARQL engine allows users to perform advanced queries
to find paths between sources and targets in a graph. It supports a variety of configurations,
including single or multiple source and target nodes, optional edge properties, and
custom algorithms for path discovery. This feature is accessed using the `SERVICE` keyword
and the service IRI `<https://qlever.cs.uni-freiburg.de/pathSearch/>`.

## Basic Syntax

The general structure of a Path Search query is as follows:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ; # Specify the algorithm
           pathSearch:source <sourceNode> ; # Specify the source node(s)
           pathSearch:target <targetNode> ; # Specify the target node(s)
           pathSearch:pathColumn ?path ; # Bind the path variable
           pathSearch:edgeColumn ?edge ; # Bind the edge variable
           pathSearch:start ?start ; # Bind the edge start variable
           pathSearch:end ?end ; # Bind the edge end variable
    {SELECT * WHERE {
      ?start <predicate> ?end. # Define the edge pattern
    }}
  }
}
```

### Parameters

- **pathSearch:algorithm**: Defines the algorithm used to search paths. Currently, only `pathSearch:allPaths` is supported.
- **pathSearch:source**: Defines the source node(s) of the search.
- **pathSearch:target** (optional): Defines the target node(s) of the search.
- **pathSearch:pathColumn**: Defines the variable for the path.
- **pathSearch:edgeColumn**: Defines the variable for the edge.
- **pathSearch:start**: Defines the variable for the start of the edges.
- **pathSearch:end**: Defines the variable for the end of the edges.
- **pathSearch:edgeProperty** (optional): Specifies properties for the edges in the path.
- **pathSearch:cartesian** (optional): Controls the behaviour of path searches between
  source and target nodes. Expects a boolean. The default is `true`.
  - If set to `true`, the search will compute the paths from each source to **all targets**.
  - If set to `false`, the search will compute the paths from each source to exactly
    **one target**. Sources and targets are paired based on their index (i.e. the paths
    from the first source to the first target are searched, then the second source and
    target, and so on).


### Example 1: Single Source and Target

The simplest case is searching for paths between a single source and a single target:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```
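
To make the shape of the result concrete, here is a minimal Python sketch of what an all-paths search computes. It is not QLever's implementation: the function name `all_paths` is hypothetical, and the restriction to simple paths (no node visited twice) is an assumption of this sketch. Each output row has the form `(path, edge, start, end)`, matching the `?path`, `?edge`, `?start`, and `?end` columns: `path` numbers the paths and `edge` numbers the edges within one path.

```python
from collections import defaultdict


def all_paths(edges, source, target):
    """Enumerate all simple paths from source to target by depth-first search.

    edges is an iterable of (start, end) pairs. The result is a list of rows
    (path_id, edge_index, start, end), one row per edge of each path.
    Assumption of this sketch: a path never visits the same node twice.
    """
    adjacency = defaultdict(list)
    for start, end in edges:
        adjacency[start].append(end)

    rows = []
    path_id = 0

    def dfs(node, visited, current):
        nonlocal path_id
        if node == target and current:
            # A complete path was found: emit one row per edge.
            for edge_index, (start, end) in enumerate(current):
                rows.append((path_id, edge_index, start, end))
            path_id += 1
            return
        for successor in adjacency[node]:
            if successor not in visited:
                current.append((node, successor))
                dfs(successor, visited | {successor}, current)
                current.pop()

    dfs(source, {source}, [])
    return rows


# Hypothetical toy graph with three paths from "a" to "d".
edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"), ("a", "d")]
for row in all_paths(edges, "a", "d"):
    print(row)
```

Because every path is materialized edge by edge, the number of rows can grow exponentially with the size of the graph, which is why the commit message mentions LIMIT/OFFSET and lazy evaluation as future extensions.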

### Example 2: Multiple Sources or Targets

It is possible to specify a set of sources or targets for the path search.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source1> ;
           pathSearch:source <source2> ;
           pathSearch:target <target1> ;
           pathSearch:target <target2> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```

This query will search for all paths between all sources and all targets, i.e.
- (`<source1>`, `<target1>`)
- (`<source1>`, `<target2>`)
- (`<source2>`, `<target1>`)
- (`<source2>`, `<target2>`)

It is possible to specify whether the sources and targets should be combined according
to the cartesian product (as seen above) or matched up pairwise, i.e.
- (`<source1>`, `<target1>`)
- (`<source2>`, `<target2>`)

This can be done with the parameter `pathSearch:cartesian`. This parameter expects a
boolean. If set to `true`, then the cartesian product is used to match the sources with
the targets.
If set to `false`, then the sources and targets are matched pairwise. If left
unspecified, then the default `true` is used.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source1> ;
           pathSearch:source <source2> ;
           pathSearch:target <target1> ;
           pathSearch:target <target2> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
           pathSearch:cartesian false;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```
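
The difference between the two pairing modes can be sketched in a few lines of Python. This is an illustration only: the function name `pair_sources_and_targets` is hypothetical, and requiring equally many sources and targets in pairwise mode is an assumption of this sketch, not documented behaviour.

```python
from itertools import product


def pair_sources_and_targets(sources, targets, cartesian=True):
    """Return the (source, target) pairs for which paths are searched."""
    if cartesian:
        # Every source is combined with every target.
        return list(product(sources, targets))
    # Pairwise mode: the i-th source is matched with the i-th target.
    # Assumption of this sketch: both lists must have the same length.
    if len(sources) != len(targets):
        raise ValueError("pairwise matching needs equally many sources and targets")
    return list(zip(sources, targets))


sources = ["<source1>", "<source2>"]
targets = ["<target1>", "<target2>"]
print(pair_sources_and_targets(sources, targets))                   # four pairs
print(pair_sources_and_targets(sources, targets, cartesian=False))  # two pairs
```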

### Example 3: Edge Properties

You can also include edge properties in the path search to further refine the results:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:edgeProperty ?middle ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate1> ?middle.
        ?middle <predicate2> ?end.
      }
    }
  }
}
```

This is especially useful for [N-ary relations](https://www.w3.org/TR/swbp-n-aryRelations/).
Considering the example above, it is possible to query additional relations of `?middle`:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:edgeProperty ?middle ;
           pathSearch:edgeProperty ?edgeInfo ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate1> ?middle.
        ?middle <predicate2> ?end.
        ?middle <predicate3> ?edgeInfo.
      }
    }
  }
}
```

This makes it possible to query additional properties of the edge between `?start` and `?end` (such as `?edgeInfo` in the example above).


### Example 4: Source or Target as Variables

You can also bind the source and/or target dynamically using variables. The examples
below use `VALUES` clauses, which can be convenient to specify sources and targets.
However, the source/target variables can also be bound using any regular SPARQL construct.

#### Source Variable

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  VALUES ?source {<source>}
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source ?source ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

#### Target Variable

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  VALUES ?target {<target>}
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target ?target ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

## Error Handling

The Path Search feature will throw errors in the following scenarios:

- **Missing Start Parameter**: If the `start` parameter is not specified, an error will be raised.
- **Multiple Start or End Variables**: If multiple `start` or `end` variables are defined, an error is raised.
- **Invalid Non-Variable Start/End**: If the `start` or `end` parameter is not bound to a variable, the query will fail.
- **Unsupported Argument**: Arguments other than those listed (like custom user arguments) will cause an error.
- **Non-IRI Predicate**: Predicates must be IRIs. If not, an error will occur.

### Example: Missing Start Parameter

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <x> ;
           pathSearch:target <z> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:end ?end ; # Missing start
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

This query would fail with a "Missing parameter 'start'" error.
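
The checks listed in this section can be sketched as a small Python validator. This is a hypothetical illustration, not the engine's code: the names `validate_parameters` and `PathSearchConfigError` are invented here, all message texts other than "Missing parameter 'start'" are made up, treating `end` symmetrically to `start` is an assumption, and the non-IRI predicate check is left out.

```python
KNOWN_PARAMETERS = {
    "algorithm", "source", "target", "pathColumn", "edgeColumn",
    "start", "end", "edgeProperty", "cartesian",
}


class PathSearchConfigError(ValueError):
    """Hypothetical error type for an invalid path search configuration."""


def validate_parameters(parameters):
    """Check a list of (name, value) pairs; names omit the pathSearch: prefix.

    Values are plain strings, and a leading '?' marks a SPARQL variable.
    Returns a dict that maps each parameter name to its list of values.
    """
    values_by_name = {}
    for name, value in parameters:
        if name not in KNOWN_PARAMETERS:
            raise PathSearchConfigError(f"Unsupported argument '{name}'")
        values_by_name.setdefault(name, []).append(value)

    # Assumption: `end` is checked in the same way as `start`.
    for name in ("start", "end"):
        values = values_by_name.get(name, [])
        if not values:
            raise PathSearchConfigError(f"Missing parameter '{name}'")
        if len(values) > 1:
            raise PathSearchConfigError(f"Parameter '{name}' given more than once")
        if not values[0].startswith("?"):
            raise PathSearchConfigError(f"Parameter '{name}' must be a variable")
    return values_by_name


# The configuration from the example above, which lacks `start`.
try:
    validate_parameters([
        ("algorithm", "allPaths"), ("source", "<x>"), ("target", "<z>"),
        ("pathColumn", "?path"), ("edgeColumn", "?edge"), ("end", "?end"),
    ])
except PathSearchConfigError as error:
    print(error)
```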

28 changes: 28 additions & 0 deletions e2e/scientists_queries.yaml
@@ -1017,6 +1017,34 @@ queries:
      - contains_row: ["<Character_Occupation>"]
      - contains_row: ["1.87"]

  - query: path_search_all_paths
    type: no-text
    sparql: |
      PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
      SELECT * WHERE {
        SERVICE pathSearch: {
          pathSearch: pathSearch:algorithm pathSearch:allPaths;
                      pathSearch:source <Mary_Ann_Leeper>;
                      pathSearch:target <Literature_Subject>;
                      pathSearch:pathColumn ?path;
                      pathSearch:edgeColumn ?edge;
                      pathSearch:start ?start;
                      pathSearch:end ?end;
          {SELECT * WHERE {
            ?start <is-a> ?end
          }}
        }
      }
    checks:
      - num_rows: 17
      - num_cols: 4
      - selected: ["?path", "?edge", "?start", "?end"]
      - contains_row: ["0", "0", "<Mary_Ann_Leeper>", "<Radiobiochemist>"]
      - contains_row: ["0", "1", "<Radiobiochemist>", "<Chemist>"]
      - contains_row: ["0", "2", "<Chemist>", "<Literature_Subject>"]
      - contains_row: ["4", "0", "<Mary_Ann_Leeper>", "<Biochemist>"]
      - contains_row: ["4", "1", "<Biochemist>", "<Literature_Subject>"]


  - query : property_path_inverse
    type: no-text
2 changes: 1 addition & 1 deletion src/engine/CMakeLists.txt
@@ -13,5 +13,5 @@ add_library(engine
         VariableToColumnMap.cpp ExportQueryExecutionTrees.cpp
         CartesianProductJoin.cpp TextIndexScanForWord.cpp TextIndexScanForEntity.cpp
         TextLimit.cpp LazyGroupBy.cpp GroupByHashMapOptimization.cpp SpatialJoin.cpp
-        CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp)
+        CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp PathSearch.cpp)
 qlever_target_link_libraries(engine util index parser sparqlExpressions http SortPerformanceEstimator Boost::iostreams s2)
3 changes: 2 additions & 1 deletion src/engine/CheckUsePatternTrick.cpp
@@ -69,7 +69,8 @@ bool isVariableContainedInGraphPatternOperation(
     } else if constexpr (std::is_same_v<T, p::Service>) {
       return ad_utility::contains(arg.visibleVariables_, variable);
     } else {
-      static_assert(std::is_same_v<T, p::TransPath>);
+      static_assert(std::is_same_v<T, p::TransPath> ||
+                    std::is_same_v<T, p::PathQuery>);
       // The `TransPath` is set up later in the query planning, when this
       // function should not be called anymore.
       AD_FAIL();