Add Path Search Feature to Qlever #1335

Merged
merged 103 commits into from
Oct 21, 2024
Changes from 102 commits
2c360bf
Added PathSearch class
JoBuRo Apr 18, 2024
b620e53
Added test class for PathSearch
JoBuRo Apr 19, 2024
c8f4c28
Added new sources to CMakeLists
JoBuRo Apr 19, 2024
23d5eb5
Added boilerplate code for override
JoBuRo Apr 19, 2024
f59ad6b
First draft of path search
JoBuRo Apr 23, 2024
fcff8ba
Implemented Path Search using boost
JoBuRo Apr 26, 2024
92826a9
Simplified visitor, added cycle test
JoBuRo Apr 26, 2024
982708e
Added test, fixed cycles
JoBuRo Apr 28, 2024
3c9345e
Added pathfinding for multiple targets
JoBuRo Apr 28, 2024
5def5eb
Added edge properties
JoBuRo Apr 29, 2024
fce274e
Added shortest path search
JoBuRo Apr 30, 2024
9d10dec
Merge branch 'master' into path-search
JoBuRo Apr 30, 2024
e371542
Fixed setTextLimit error after merge
JoBuRo Apr 30, 2024
178fd14
Added PathSearch parsing
JoBuRo Jun 19, 2024
f228eb6
Moved visitors to new file
JoBuRo Jun 19, 2024
3966a69
Fixed a bug where the wrong sub columns were read
JoBuRo Jun 23, 2024
1aa8350
Fixed QueryPlanner PathSearch tests
JoBuRo Jun 23, 2024
536e5fe
Added documentation to PathSearch and visitors
JoBuRo Jun 23, 2024
02380c3
Merge branch 'master' into path-search
JoBuRo Jun 24, 2024
2451fd7
Rename ResultTable to Result in PathSearch
JoBuRo Jun 24, 2024
d700df2
Format fix
JoBuRo Jun 24, 2024
19027e2
Fix the format fix
JoBuRo Jun 24, 2024
ec7bfd0
Added PathSearch e2e tests
JoBuRo Jun 24, 2024
47daa2b
Reworked AllPathsVisitor
JoBuRo Jun 25, 2024
765f1aa
Format fix
JoBuRo Jun 25, 2024
da1eb3a
Sonar Fixes
JoBuRo Jun 26, 2024
0027d7b
Added multisource to PathSearch
JoBuRo Jun 30, 2024
fbb61a1
format fix
JoBuRo Jun 30, 2024
17a8b39
Added mutlisource multitarget tests
JoBuRo Jul 1, 2024
b2611c9
Added createJoinWithPathSearch
JoBuRo Jul 2, 2024
08da81b
Added tests, finished binding logic
JoBuRo Jul 3, 2024
adf85bc
Added runtime info
JoBuRo Jul 3, 2024
48dbab1
Added cancellation checks
JoBuRo Jul 3, 2024
d5b513b
Fixed CacheKey
JoBuRo Jul 4, 2024
49dfaa3
Removed unneeded members
JoBuRo Jul 4, 2024
f6357a7
Format fix
JoBuRo Jul 4, 2024
abdc36a
Simplified handleSearchSide
JoBuRo Jul 5, 2024
1c892a2
New all paths implementation
JoBuRo Jul 8, 2024
ae39abc
Format fix
JoBuRo Jul 8, 2024
411ba0a
Extracted visitPathQuery method
JoBuRo Jul 8, 2024
ed03651
Moved PathSearchConfig creation to PathQuery method
JoBuRo Jul 8, 2024
818e41b
Sonar Fixes
JoBuRo Jul 8, 2024
acf09c8
Sonar fixes
JoBuRo Jul 9, 2024
385f67a
Added PathSearchException
JoBuRo Jul 9, 2024
946bda3
Improved error handling and path query parsing
JoBuRo Jul 9, 2024
aec3e34
Added docstring for PathQuery
JoBuRo Jul 9, 2024
de33fdd
Fixed typo
JoBuRo Jul 9, 2024
e9def11
Added tests for path search exceptions
JoBuRo Jul 9, 2024
6ce1494
Merge branch 'master' into path-search
JoBuRo Jul 9, 2024
eea3625
Improved setVariable lambda in PathQuery
JoBuRo Jul 10, 2024
ae175ac
Removed shortestPaths and boost BGL
JoBuRo Jul 10, 2024
13494b8
Simplified Edge
JoBuRo Jul 10, 2024
1c209fe
Refactored DFS
JoBuRo Jul 11, 2024
f952ed4
Switched path search implementation
JoBuRo Aug 13, 2024
f80694b
Revert "Switched path search implementation"
JoBuRo Sep 7, 2024
c3fd4cd
Fixed iterative PathSearch
JoBuRo Sep 7, 2024
1827153
Added namespace pathSearch
JoBuRo Sep 7, 2024
a6a8957
Use already started timer
JoBuRo Sep 7, 2024
71d891d
Added bound sides to children
JoBuRo Sep 18, 2024
0af23f0
Refactored search side handling
JoBuRo Sep 20, 2024
1b3efa6
fixed create join with two columns at once
JoBuRo Sep 20, 2024
3433bd7
Added option to do non-cartesian path search
JoBuRo Sep 20, 2024
3f5ab2f
Removed unused e2e test
JoBuRo Sep 20, 2024
48d7d2c
Adjusted exception string when parsing
JoBuRo Sep 20, 2024
27d8257
Renamed PathQuery::fromBasicPattern to addBasicPattern
JoBuRo Sep 20, 2024
0bd7622
Made edgeproperty cols possibly undefined
JoBuRo Sep 20, 2024
495bfd8
Added pathSearch identifier to cache key
JoBuRo Sep 20, 2024
7a082b7
Simplified PathSearchConfig test matcher
JoBuRo Sep 20, 2024
54b2be2
Use string_views for parameters
JoBuRo Sep 20, 2024
ee811e4
implment addGraph
JoBuRo Sep 20, 2024
6520019
Improved documentation
JoBuRo Sep 25, 2024
0a833dc
Fixed join on edge property
JoBuRo Sep 25, 2024
f3aa985
Added row check to path search e2e tests
JoBuRo Sep 25, 2024
0f767ba
Merge branch 'master' into path-search
JoBuRo Sep 25, 2024
e34f0ae
spell fix
JoBuRo Sep 25, 2024
cf2eaae
Format fix
JoBuRo Sep 25, 2024
2fcacd5
Added PathSearch tests
JoBuRo Sep 25, 2024
24975e8
Sonar fixes
JoBuRo Sep 27, 2024
7f1823e
Merge branch 'master' into path-search
JoBuRo Oct 1, 2024
949455b
Fixed merge error
JoBuRo Oct 1, 2024
b539bcf
Remove unused functions
JoBuRo Oct 1, 2024
cad6f51
Fix paths to result table
JoBuRo Oct 1, 2024
ede7fd3
Fix source as variable case
JoBuRo Oct 1, 2024
e91adf2
Fixed lifetime issue with empty sources
JoBuRo Oct 2, 2024
dcc5925
Improved error message
JoBuRo Oct 2, 2024
54d40a9
Added allocator at important points
JoBuRo Oct 2, 2024
d26da4d
Format fix
JoBuRo Oct 2, 2024
5c9c8c0
Fix lifetime issue for certain platforms
JoBuRo Oct 2, 2024
73f4ead
Added tests for multi source query planning
JoBuRo Oct 4, 2024
57ffd2a
format fix
JoBuRo Oct 4, 2024
7aaee34
Sonar fixes
JoBuRo Oct 4, 2024
8cb4c04
Added PathSearch documentation
JoBuRo Oct 4, 2024
bf2f3a4
Added PathTests for bound path search
JoBuRo Oct 4, 2024
290a89d
Merge branch 'master' into path-search
JoBuRo Oct 15, 2024
bc4732f
Fixed merge and format
JoBuRo Oct 15, 2024
dd76bdb
Added allocators to visited and path cache
JoBuRo Oct 15, 2024
42be542
Added cancellation checks
JoBuRo Oct 15, 2024
dcdeb03
Fixed test
JoBuRo Oct 15, 2024
79a8bcd
format fix
JoBuRo Oct 15, 2024
c710553
Replaced std::vector<Id> in Edge with row index
JoBuRo Oct 15, 2024
08ef874
Added tests to improve coverage
JoBuRo Oct 16, 2024
aa632a1
Added source/target subtrees to cachekey
JoBuRo Oct 16, 2024
597a94b
Merge branch 'master' into path-search
joka921 Oct 21, 2024
290 changes: 290 additions & 0 deletions docs/path_search.md
@@ -0,0 +1,290 @@
# Path Search Feature Documentation for SPARQL Engine

## Overview

The Path Search feature of this SPARQL engine finds paths between source and target
nodes in a graph. It supports single or multiple source and target nodes, optional edge
properties, and a selectable search algorithm. The feature is accessed using the
`SERVICE` keyword and the service IRI `<https://qlever.cs.uni-freiburg.de/pathSearch/>`.

## Basic Syntax

The general structure of a Path Search query is as follows:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;  # Specify the algorithm
           pathSearch:source <sourceNode> ;            # Specify the source node(s)
           pathSearch:target <targetNode> ;            # Specify the target node(s)
           pathSearch:pathColumn ?path ;               # Bind the path variable
           pathSearch:edgeColumn ?edge ;               # Bind the edge variable
           pathSearch:start ?start ;                   # Bind the edge start variable
           pathSearch:end ?end ;                       # Bind the edge end variable
    {SELECT * WHERE {
      ?start <predicate> ?end.                         # Define the edge pattern
    }}
  }
}
```

### Parameters

- **pathSearch:algorithm**: Defines the algorithm used to search paths. Currently, only `pathSearch:allPaths` is supported.
- **pathSearch:source**: Defines the source node(s) of the search.
- **pathSearch:target** (optional): Defines the target node(s) of the search.
- **pathSearch:pathColumn**: Defines the variable for the path.
- **pathSearch:edgeColumn**: Defines the variable for the edge.
- **pathSearch:start**: Defines the variable for the start of the edges.
- **pathSearch:end**: Defines the variable for the end of the edges.
- **pathSearch:edgeProperty** (optional): Specifies properties for the edges in the path.
- **pathSearch:cartesian** (optional): Controls the behaviour of path searches between
  source and target nodes. Expects a boolean. The default is `true`.
  - If set to `true`, the search will compute the paths from each source to **all targets**.
  - If set to `false`, the search will compute the paths from each source to exactly
    **one target**. Sources and targets are paired based on their index (i.e. the paths
    from the first source to the first target are searched, then the second source and
    target, and so on).
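
To make the `pathSearch:allPaths` algorithm more concrete, the following Python sketch enumerates all paths between one source and one target over a list of `(start, end)` edges. It is only an illustration and not QLever's implementation: the function name `all_paths` is hypothetical, and the sketch assumes that a path never visits the same node twice.

```python
from collections import defaultdict


def all_paths(edges, source, target):
    """Return every path from source to target as a list of (start, end) edges.

    Assumption of this sketch: a path never visits the same node twice.
    """
    adjacency = defaultdict(list)
    for start, end in edges:
        adjacency[start].append(end)

    found = []
    # Iterative depth-first search. Each stack entry holds the current node,
    # the edges taken to reach it, and the set of nodes already on the path.
    stack = [(source, [], {source})]
    while stack:
        node, path, visited = stack.pop()
        if node == target and path:
            found.append(path)
            continue
        for successor in adjacency.get(node, []):
            if successor not in visited:
                stack.append(
                    (successor, path + [(node, successor)], visited | {successor})
                )
    return found


edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"), ("d", "a")]
paths = all_paths(edges, "a", "d")
```

Here `paths` holds two paths, one through `b` and one through `c`. The edge from `d` back to `a` is never followed, because the sketch stops extending a path once it reaches the target.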


### Example 1: Single Source and Target

The simplest case is searching for paths between a single source and a single target:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```
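
The end-to-end test added in this pull request (`e2e/scientists_queries.yaml`) suggests that the result contains one row per edge, where `?path` numbers the path and `?edge` numbers the edge within that path. Under that reading, which is inferred from the test rows rather than stated as a guarantee, a client could regroup the rows into paths as sketched below. The helper name `rows_to_paths` is hypothetical. The sample rows are taken from the test's `contains_row` checks, in the column order `?path`, `?edge`, `?start`, `?end`.

```python
from collections import defaultdict


def rows_to_paths(rows):
    """Group (path, edge, start, end) rows into ordered edge lists per path."""
    grouped = defaultdict(list)
    for path_id, edge_id, start, end in rows:
        grouped[int(path_id)].append((int(edge_id), start, end))
    # Sort the edges of each path by their edge index, then drop the index.
    return {
        path_id: [(start, end) for _, start, end in sorted(edges)]
        for path_id, edges in grouped.items()
    }


# Rows deliberately shuffled to show that the grouping does not rely on order.
rows = [
    ("0", "2", "<Chemist>", "<Literature_Subject>"),
    ("0", "0", "<Mary_Ann_Leeper>", "<Radiobiochemist>"),
    ("4", "1", "<Biochemist>", "<Literature_Subject>"),
    ("0", "1", "<Radiobiochemist>", "<Chemist>"),
    ("4", "0", "<Mary_Ann_Leeper>", "<Biochemist>"),
]
paths = rows_to_paths(rows)
```

With these rows, `paths[0]` is the three-edge path via `<Radiobiochemist>` and `<Chemist>`, and `paths[4]` is the two-edge path via `<Biochemist>`.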

### Example 2: Multiple Sources or Targets

It is possible to specify a set of sources or targets for the path search.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source1> ;
           pathSearch:source <source2> ;
           pathSearch:target <target1> ;
           pathSearch:target <target2> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```

This query will search for all paths between all sources and all targets, i.e.
- (`<source1>`, `<target1>`)
- (`<source1>`, `<target2>`)
- (`<source2>`, `<target1>`)
- (`<source2>`, `<target2>`)

It is possible to specify whether the sources and targets should be combined according
to the cartesian product (as seen above) or matched up pairwise, i.e.
- (`<source1>`, `<target1>`)
- (`<source2>`, `<target2>`)

This can be done with the parameter `pathSearch:cartesian`, which expects a boolean.
If set to `true`, the cartesian product is used to match the sources with the targets.
If set to `false`, the sources and targets are matched pairwise. If left unspecified,
the default `true` is used.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source1> ;
           pathSearch:source <source2> ;
           pathSearch:target <target1> ;
           pathSearch:target <target2> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
           pathSearch:cartesian false ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```
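
The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the pairing rule described above, not code from QLever. The function name `source_target_pairs` is hypothetical, and rejecting lists of unequal length in pairwise mode is an assumption of the sketch, since this document does not say how that case is handled.

```python
from itertools import product


def source_target_pairs(sources, targets, cartesian=True):
    """Return the (source, target) pairs that a path search would cover."""
    if cartesian:
        # Every source is combined with every target.
        return list(product(sources, targets))
    # Pairwise mode: the i-th source is matched with the i-th target.
    # Assumption of this sketch: both lists must have the same length.
    if len(sources) != len(targets):
        raise ValueError("pairwise matching needs equally many sources and targets")
    return list(zip(sources, targets))


sources = ["<source1>", "<source2>"]
targets = ["<target1>", "<target2>"]
all_pairs = source_target_pairs(sources, targets)
matched_pairs = source_target_pairs(sources, targets, cartesian=False)
```

For two sources and two targets, `all_pairs` contains the four combinations listed earlier, while `matched_pairs` contains only the two index-aligned pairs.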

### Example 3: Edge Properties

You can also include edge properties in the path search to further refine the results:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:edgeProperty ?middle ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate1> ?middle.
        ?middle <predicate2> ?end.
      }
    }
  }
}
```

This is especially useful for [N-ary relations](https://www.w3.org/TR/swbp-n-aryRelations/).
Considering the example above, it is possible to query additional relations of `?middle`:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:edgeProperty ?middle ;
           pathSearch:edgeProperty ?edgeInfo ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate1> ?middle.
        ?middle <predicate2> ?end.
        ?middle <predicate3> ?edgeInfo.
      }
    }
  }
}
```

This makes it possible to query additional properties of the edge between `?start` and `?end` (such as `?edgeInfo` in the example above).


### Example 4: Source or Target as Variables

You can also bind the source and/or target dynamically using variables. The examples
below use `VALUES` clauses, which can be convenient to specify sources and targets.
However, the source/target variables can also be bound using any regular SPARQL construct.

#### Source Variable

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  VALUES ?source {<source>}
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source ?source ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

#### Target Variable

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  VALUES ?target {<target>}
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target ?target ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

## Error Handling

The Path Search feature will throw errors in the following scenarios:

- **Missing Start Parameter**: If the `start` parameter is not specified, an error will be raised.
- **Multiple Start or End Variables**: If multiple `start` or `end` variables are defined, an error is raised.
- **Invalid Non-Variable Start/End**: If the `start` or `end` parameter is not bound to a variable, the query will fail.
- **Unsupported Argument**: Arguments other than those listed (like custom user arguments) will cause an error.
- **Non-IRI Predicate**: Predicates must be IRIs. If not, an error will occur.

### Example: Missing Start Parameter

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <x> ;
           pathSearch:target <z> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:end ?end ;  # Missing start
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

This query would fail with a "Missing parameter 'start'" error.
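
As a rough illustration of the parameter rules listed above, the following Python sketch checks a list of `(parameter, value)` pairs. It does not mirror QLever's parser. The function name `check_parameters` is hypothetical, and all error messages except "Missing parameter 'start'" are invented for the example. The check for non-IRI predicates is left out.

```python
ALLOWED_PARAMETERS = {
    "algorithm", "source", "target", "pathColumn", "edgeColumn",
    "start", "end", "edgeProperty", "cartesian",
}


def check_parameters(params):
    """Raise ValueError for the first rule violated by the given parameters."""
    names = [name for name, _ in params]
    for name in names:
        if name not in ALLOWED_PARAMETERS:
            # Hypothetical message text.
            raise ValueError(f"Unsupported argument '{name}'")
    for name in ("start", "end"):
        values = [value for key, value in params if key == name]
        if len(values) > 1:
            # Hypothetical message text.
            raise ValueError(f"Parameter '{name}' may only be given once")
        if values and not values[0].startswith("?"):
            # Hypothetical message text.
            raise ValueError(f"Parameter '{name}' must be a variable")
    if "start" not in names:
        raise ValueError("Missing parameter 'start'")


# The parameters of the failing query above, with `start` left out.
try:
    check_parameters([
        ("algorithm", "allPaths"), ("source", "<x>"), ("target", "<z>"),
        ("pathColumn", "?path"), ("edgeColumn", "?edge"), ("end", "?end"),
    ])
    message = None
except ValueError as error:
    message = str(error)
```

After running the example, `message` holds the same text as the error quoted above.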

28 changes: 28 additions & 0 deletions e2e/scientists_queries.yaml
@@ -1017,6 +1017,34 @@ queries:
       - contains_row: ["<Character_Occupation>"]
       - contains_row: ["1.87"]

+  - query: path_search_all_paths
+    type: no-text
+    sparql: |
+      PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
+      SELECT * WHERE {
+        SERVICE pathSearch: {
+          pathSearch: pathSearch:algorithm pathSearch:allPaths;
+                      pathSearch:source <Mary_Ann_Leeper>;
+                      pathSearch:target <Literature_Subject>;
+                      pathSearch:pathColumn ?path;
+                      pathSearch:edgeColumn ?edge;
+                      pathSearch:start ?start;
+                      pathSearch:end ?end;
+          {SELECT * WHERE {
+            ?start <is-a> ?end
+          }}
+        }
+      }
+    checks:
+      - num_rows: 17
+      - num_cols: 4
+      - selected: ["?path", "?edge", "?start", "?end"]
+      - contains_row: ["0", "0", "<Mary_Ann_Leeper>", "<Radiobiochemist>"]
+      - contains_row: ["0", "1", "<Radiobiochemist>", "<Chemist>"]
+      - contains_row: ["0", "2", "<Chemist>", "<Literature_Subject>"]
+      - contains_row: ["4", "0", "<Mary_Ann_Leeper>", "<Biochemist>"]
+      - contains_row: ["4", "1", "<Biochemist>", "<Literature_Subject>"]
+

   - query : property_path_inverse
     type: no-text
3 changes: 2 additions & 1 deletion src/engine/CMakeLists.txt
@@ -13,5 +13,6 @@ add_library(engine
         VariableToColumnMap.cpp ExportQueryExecutionTrees.cpp
         CartesianProductJoin.cpp TextIndexScanForWord.cpp TextIndexScanForEntity.cpp
         TextLimit.cpp LazyGroupBy.cpp GroupByHashMapOptimization.cpp SpatialJoin.cpp
-        CountConnectedSubgraphs.cpp)
+        CountConnectedSubgraphs.cpp PathSearch.cpp)
+
 qlever_target_link_libraries(engine util index parser sparqlExpressions http SortPerformanceEstimator Boost::iostreams s2)
3 changes: 2 additions & 1 deletion src/engine/CheckUsePatternTrick.cpp
@@ -69,7 +69,8 @@
       } else if constexpr (std::is_same_v<T, p::Service>) {
         return ad_utility::contains(arg.visibleVariables_, variable);
       } else {
-        static_assert(std::is_same_v<T, p::TransPath>);
+        static_assert(std::is_same_v<T, p::TransPath> ||
+                      std::is_same_v<T, p::PathQuery>);
         // The `TransPath` is set up later in the query planning, when this
         // function should not be called anymore.
         AD_FAIL();

Codecov / codecov/patch warning on src/engine/CheckUsePatternTrick.cpp#L72-L73: added lines #L72 - L73 were not covered by tests.