Rewrite documentation #233

Merged

41 commits, merged on Jun 10, 2024

Commits

- 7998505: Add streaming support for handling large queries (bigerl, Jun 30, 2023)
- 85a2cc3: Add Language Processor and SparqlJsonResultCountingParser (bigerl, Jul 3, 2023)
- 1e95fc8: Completed ResponseBodyProcessor and integrated it into SPARQLProtocol… (bigerl, Jul 3, 2023)
- bb8275b: Worker integration and removal of a lot of code (bigerl, Jul 10, 2023)
- 84ed059: store results from ResponseBodyProcessor (nck-mlcnv, Aug 30, 2023)
- 1ead2f3: add getCurrentIndex method for QueryHandler (nck-mlcnv, Sep 21, 2023)
- d2ce4d4: implement skip method for BigByteArrayInputStream (nck-mlcnv, Oct 6, 2023)
- fe1663e: remove comments, unused code and disable some tests (nck-mlcnv, Oct 6, 2023)
- 55b1d28: add RequestFactoryTest (nck-mlcnv, Oct 30, 2023)
- 09abd78: remove unused imports and class (nck-mlcnv, Oct 31, 2023)
- 3462e53: Fix regional time issues in CSVStorageTest (nck-mlcnv, Nov 3, 2023)
- cd3f97e: Add debug print (nck-mlcnv, Nov 3, 2023)
- e47e359: Second attempt on fixing the datetime issue (nck-mlcnv, Nov 3, 2023)
- e7998fc: Update example-suite.yml (nck-mlcnv, Nov 7, 2023)
- 3053eeb: Add new configuration.md draft (nck-mlcnv, Nov 7, 2023)
- 408517a: Further describe configuration (nck-mlcnv, Feb 27, 2024)
- 664c361: Add documentation for metrics (nck-mlcnv, Mar 5, 2024)
- e57ccc1: Update documentation (nck-mlcnv, Mar 12, 2024)
- 741cf06: Add storage configuration description (nck-mlcnv, Mar 12, 2024)
- 0cb0441: Additional query configuration example (nck-mlcnv, Mar 12, 2024)
- f1487b5: Update documentation (nck-mlcnv, Mar 18, 2024)
- 7cf78b7: Remove old file (nck-mlcnv, Mar 19, 2024)
- 1280b1a: Update README.md (nck-mlcnv, Mar 19, 2024)
- cda8a7c: Remove unused files (nck-mlcnv, Mar 19, 2024)
- f9ff2be: Update documentation (nck-mlcnv, Mar 19, 2024)
- 2ca16bd: Finalize language_processor.md (nck-mlcnv, Mar 19, 2024)
- fd1ad16: Update documentation (nck-mlcnv, Mar 19, 2024)
- 0ad6a70: Fix rebase (nck-mlcnv, Mar 20, 2024)
- d3078b2: Further changes to doc (nck-mlcnv, Mar 20, 2024)
- df82cf2: Fix owl file (nck-mlcnv, Mar 20, 2024)
- a291472: Apply suggestions from code review (nck-mlcnv, Mar 27, 2024)
- e1c765e: Fix example-suite.yml (nck-mlcnv, Mar 27, 2024)
- 952804a: Update sectioning (nck-mlcnv, Mar 27, 2024)
- 7a3ccc7: Update sectioning and fix README.md (nck-mlcnv, Mar 27, 2024)
- 2ddda97: Fix iguana.owl (nck-mlcnv, Mar 27, 2024)
- e9c2076: Update example-suite.yml (nck-mlcnv, Mar 27, 2024)
- baad770: Update example-suite.yml (nck-mlcnv, Mar 27, 2024)
- 4f77f53: Update metrics.md (nck-mlcnv, Mar 28, 2024)
- 4da0019: Add a duration section in configuration overview.md (nck-mlcnv, Apr 2, 2024)
- 74d085f: Add example in documentation (nck-mlcnv, May 8, 2024)
- efb1df5: Add symlink (nck-mlcnv, May 8, 2024)
94 changes: 37 additions & 57 deletions README.md
@@ -1,82 +1,62 @@
# IGUANA

[![ci](https://github.com/dice-group/IGUANA/actions/workflows/ci.yml/badge.svg)](https://github.com/dice-group/IGUANA/actions/workflows/ci.yml)

<p align="center">
<img src="https://github.com/dice-group/IGUANA/raw/develop/images/IGUANA_logo.png" alt="IGUANA Logo" width="200">
</p>
Iguana is an integrated suite for benchmarking the read/write performance of HTTP endpoints and CLI Applications.

It provides an environment which ...

* is highly configurable
* provides a realistic scenario benchmark
* works on every dataset
* works on SPARQL HTTP endpoints
* works on HTTP Get & Post endpoints
* works on CLI applications
* and is easily extendable

For further information visit:
- [iguana-benchmark.eu](http://iguana-benchmark.eu)
- [Documentation](http://iguana-benchmark.eu/docs/3.3/)

### Available metrics
# IGUANA
Iguana is a benchmarking framework for testing the read performance of HTTP endpoints.
It is primarily designed for benchmarking triplestores via the SPARQL protocol.
Iguana stresstests endpoints by simulating users that send sets of queries independently of one another.

Per run metrics:
* Query Mixes Per Hour (QMPH)
* Number of Queries Per Hour (NoQPH)
* Number of Queries (NoQ)
* Average Queries Per Second (AvgQPS)
* Penalized Average Queries Per Second (PAvgQPS)
Benchmarks are configured with a YAML file, which makes them easy to repeat and adjust.
Results are stored as RDF files and can also be exported as CSV files.

Per query metrics:
* Queries Per Second (QPS)
* Penalized Queries Per Second (PQPS)
* Number of successful and failed queries
* result size
* queries per second
* sum of execution times
## Features
- Benchmarking of (SPARQL) HTTP endpoints
- Reusable configuration
- Calculation of various metrics for better comparisons
- Processing of HTTP responses (e.g., results counting)

## Setup Iguana
## Setup

### Prerequisites
You need to have `Java 17` or higher installed.
On Ubuntu it can be installed by executing the following command:

In order to run Iguana, you need to have `Java 17`, or greater, installed on your system.
```bash
sudo apt install openjdk-17-jre
```

### Download
Download the newest release of Iguana [here](https://github.com/dice-group/IGUANA/releases/latest), or run on a unix shell:

```sh
wget https://github.com/dice-group/IGUANA/releases/download/v4.0.0/iguana-4.0.0.zip
unzip iguana-4.0.0.zip
```
The latest release can be downloaded at https://github.com/dice-group/IGUANA/releases/latest.
The zip file contains three files:

The zip file contains the following files:

* `iguana-X.Y.Z.jar`
* `start-iguana.sh`
* `iguana-4.0.0.jar`
* `example-suite.yml`
* `start-iguana.sh`

### Create a Configuration

You can use the provided example configuration and modify it to your needs.
For further information please visit our [configuration](http://iguana-benchmark.eu/docs/3.2/usage/configuration/) and [Stresstest](http://iguana-benchmark.eu/docs/3.0/usage/stresstest/) wiki pages.

For a detailed, step-by-step instruction through a benchmarking example, please visit our [tutorial](http://iguana-benchmark.eu/docs/3.2/usage/tutorial/).

### Execute the Benchmark
### Configuration
The `example-suite.yml` file contains an extensive configuration for a benchmark suite.
It can be used as a starting point for your own benchmark suite.
For a detailed explanation of the configuration, see the [configuration](./configuration/overview.md) documentation.

Start Iguana with a benchmark suite (e.g. the example-suite.yml) either by using the start script:
## Usage
Start Iguana with a benchmark suite (e.g., the `example-suite.yml`) either by using the start script:

```sh
```bash
./start-iguana.sh example-suite.yml
```

or by directly executing the jar-file:

```sh
java -jar iguana-x-y-z.jar example-suite.yml
```bash
java -jar iguana-4.0.0.jar example-suite.yml
```

If you're using the script, you can pass JVM arguments by setting the environment variable `IGUANA_JVM`.
For example, to let Iguana use 4 GB of RAM, set `IGUANA_JVM` as follows:

```bash
export IGUANA_JVM=-Xmx4g
```

# How to Cite
88 changes: 88 additions & 0 deletions docs_new/README.md
@@ -0,0 +1,88 @@
<p align="center">
<img src="https://github.com/dice-group/IGUANA/raw/develop/images/IGUANA_logo.png" alt="IGUANA Logo" width="200">
</p>

# IGUANA
Iguana is a benchmarking framework for testing the read performance of HTTP endpoints.
It is primarily designed for benchmarking triplestores via the SPARQL protocol.
Iguana stresstests endpoints by simulating users that send sets of queries independently of one another.

Benchmarks are configured with a YAML file, which makes them easy to repeat and adjust.
Results are stored as RDF files and can also be exported as CSV files.

## Features
- Benchmarking of (SPARQL) HTTP endpoints
- Reusable configuration
- Calculation of various metrics for better comparisons
- Processing of HTTP responses (e.g., results counting)

## Setup

### Prerequisites
You need to have `Java 17` or higher installed.
On Ubuntu it can be installed by executing the following command:

```bash
sudo apt install openjdk-17-jre
```

### Download
The latest release can be downloaded at https://github.com/dice-group/IGUANA/releases/latest.
The zip file contains three files:

* `iguana-4.0.0.jar`
* `example-suite.yml`
* `start-iguana.sh`

### Configuration
The `example-suite.yml` file contains an extensive configuration for a benchmark suite.
It can be used as a starting point for your own benchmark suite.
For a detailed explanation of the configuration, see the [configuration](./configuration/overview.md) documentation.
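
To give a first impression of a suite's overall shape, here is a minimal sketch. Apart from the `metrics` section (documented in [metrics](./configuration/metrics.md)), the key names and values below are illustrative assumptions, not the authoritative schema; treat `example-suite.yml` and the [configuration](./configuration/overview.md) documentation as the reference.

```yaml
# Illustrative sketch of a benchmark suite. Key names other than `metrics`
# are assumptions for illustration; see example-suite.yml for the real schema.
connections:
  - name: "my-triplestore"
    endpoint: "http://localhost:3030/sparql"
tasks:
  - type: stresstest
    workers:
      - type: "SPARQLProtocolWorker"
        number: 2                 # two simulated users
        queries:
          path: "./queries.txt"   # one query per line (assumed default format)
metrics:
  - type: "QPS"
storages:
  - type: "csv file"              # assumed storage type name
    directory: "./results"
```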

## Usage
Start Iguana with a benchmark suite (e.g., the `example-suite.yml`) either by using the start script:

```bash
./start-iguana.sh example-suite.yml
```

or by directly executing the jar-file:

```bash
java -jar iguana-4.0.0.jar example-suite.yml
```

If you're using the script, you can pass JVM arguments by setting the environment variable `IGUANA_JVM`.
For example, to let Iguana use 4 GB of RAM, set `IGUANA_JVM` as follows:

```bash
export IGUANA_JVM=-Xmx4g
```

# How to Cite

```bibtex
@InProceedings{10.1007/978-3-319-68204-4_5,
author="Conrads, Lixi
and Lehmann, Jens
and Saleem, Muhammad
and Morsey, Mohamed
and Ngonga Ngomo, Axel-Cyrille",
editor="d'Amato, Claudia
and Fernandez, Miriam
and Tamma, Valentina
and Lecue, Freddy
and Cudr{\'e}-Mauroux, Philippe
and Sequeda, Juan
and Lange, Christoph
and Heflin, Jeff",
title="Iguana: A Generic Framework for Benchmarking the Read-Write Performance of Triple Stores",
booktitle="The Semantic Web -- ISWC 2017",
year="2017",
publisher="Springer International Publishing",
address="Cham",
pages="48--65",
abstract="The performance of triples stores is crucial for applications driven by RDF. Several benchmarks have been proposed that assess the performance of triple stores. However, no integrated benchmark-independent execution framework for these benchmarks has yet been provided. We propose a novel SPARQL benchmark execution framework called Iguana. Our framework complements benchmarks by providing an execution environment which can measure the performance of triple stores during data loading, data updates as well as under different loads and parallel requests. Moreover, it allows a uniform comparison of results on different benchmarks. We execute the FEASIBLE and DBPSB benchmarks using the Iguana framework and measure the performance of popular triple stores under updates and parallel user requests. We compare our results (See https://doi.org/10.6084/m9.figshare.c.3767501.v1) with state-of-the-art benchmarking results and show that our benchmark execution framework can unveil new insights pertaining to the performance of triple stores.",
isbn="978-3-319-68204-4"
}
```
15 changes: 15 additions & 0 deletions docs_new/configuration/language_processor.md
@@ -0,0 +1,15 @@
# Language Processor

Language processors process the response bodies of the HTTP requests that the workers execute.
They extract relevant information from the responses and store it in the results.

Language processors are defined by the content type of the response body they process.
They cannot be configured directly in the configuration file, but are used by the response body processors.

Currently, only the `SaxSparqlJsonResultCountingParser` language processor is supported; it handles the `application/sparql-results+json` content type.
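
In practice, enabling a language processor therefore means registering a response body processor for the matching content type. Below is a minimal sketch of what this could look like; the `responseBodyProcessors` key and its fields are assumptions for illustration, not the authoritative schema:

```yaml
# Sketch only: key names are assumptions; see the configuration overview
# for the actual response body processor schema.
responseBodyProcessors:
  - contentType: "application/sparql-results+json"  # selects the matching language processor
```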

## SaxSparqlJsonResultCountingParser

The `SaxSparqlJsonResultCountingParser` is a language processor that extracts basic information from the responses of SPARQL endpoints in the `application/sparql-results+json` format.
It counts the number of results, the number of variables, and the number of bindings in the response of a `SELECT` or `ASK` query.
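
For intuition, here is a minimal `SELECT` response in the standard `application/sparql-results+json` format (the URIs and values are placeholders). For a response like this, the counts would be 2 variables (`s` and `label`), 2 results (solutions), and 3 bound values, since the second solution leaves `label` unbound; the exact counting semantics and the names under which these counts appear in the result files may differ.

```json
{
  "head": { "vars": ["s", "label"] },
  "results": {
    "bindings": [
      { "s": { "type": "uri", "value": "http://example.org/a" },
        "label": { "type": "literal", "value": "A" } },
      { "s": { "type": "uri", "value": "http://example.org/b" } }
    ]
  }
}
```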
84 changes: 84 additions & 0 deletions docs_new/configuration/metrics.md
@@ -0,0 +1,84 @@
# Metrics

Metrics are used to measure and compare the performance of the system during the stresstest.
They are divided into task metrics, worker metrics, and query metrics.

Task metrics are calculated from every query execution across the whole task.
Worker metrics are calculated from every query execution of a single worker.
Query metrics are calculated for each individual query, once per worker and once across all workers.

For a detailed description of how results for tasks, workers and queries are reported in the RDF result file, please refer to the section [RDF results](rdf_results.md).

## Configuration

The metrics are configured in the `metrics` section of the configuration file.
To enable a metric, add an entry to the `metrics` list with the `type` of the metric.
Some metrics (`PQPS`, `PAvgQPS`) require the configuration of a `penalty` value,
which is the time in milliseconds that a failed query will be penalized with.

```yaml
metrics:
- type: "QPS"
- type: "AvgQPS"
- type: "PQPS"
penalty: 180000 # in milliseconds
```

If the `metrics` section is not present in the configuration file, the following **default** configuration is used:
```yaml
metrics:
- type: "AES"
- type: "EachQuery"
- type: "QPS"
- type: "AvgQPS"
- type: "NoQ"
- type: "NoQPH"
- type: "QMPH"
```

## Available metrics

| Name | Configuration type | Additional parameters | Scope | Description |
|--------------------------------------|--------------------|-----------------------------|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Queries per second                   | `QPS`              |                             | query        | The number of successfully executed queries per second. It is calculated by dividing the number of successfully executed queries by the total time (in seconds) it took to execute them. |
| Average queries per second           | `AvgQPS`           |                             | task, worker | The average number of queries successfully executed per second. It is calculated by dividing the sum of the QPS values of every query of the task or worker by the number of queries. |
| Number of queries | `NoQ` | | task, worker | The number of successfully executed queries. This metric is calculated for each worker and for the whole task. |
| Number of queries per hour           | `NoQPH`            |                             | task, worker | The number of successfully executed queries per hour. It is calculated by dividing the number of successfully executed queries by the total time (in hours) it took to execute them. The metric value for the task is the sum of the metric values of its workers. |
| Query mixes per hour                 | `QMPH`             |                             | task, worker | The number of query mixes executed per hour. A query mix is the set of queries executed by a worker, or by the whole task. This metric is calculated for each worker and for the whole task. It is calculated by dividing the number of successfully executed queries by the number of queries inside the query mix and by the total time (in hours) it took to execute them. |
| Penalized queries per second         | `PQPS`             | `penalty` (in milliseconds) | query        | The number of queries executed per second, penalized by the number of failed queries. It is calculated by dividing the number of successful and failed query executions by the total time (in seconds) it took to execute them. If a query execution fails, its execution time is replaced by the given `penalty` value. |
| Penalized average queries per second | `PAvgQPS` | `penalty` (in milliseconds) | task, worker | The average number of queries executed per second, penalized by the number of failed queries. It is calculated by dividing the sum of the PQPS of each query the task or worker has executed by the number of queries. |
| Aggregated execution statistics | `AES` | | task, worker | _see below_ |
| Each execution statistic | `EachQuery` | | query | _see below_ |
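
As a worked example of the `QPS` and `PQPS` arithmetic (the numbers are invented for illustration): suppose a query is executed 10 times, 8 executions succeed and take 4 seconds in total, 2 executions fail, and `penalty` is set to 10000 milliseconds (10 seconds). Then:

$$
\mathrm{QPS} = \frac{8}{4\,\mathrm{s}} = 2, \qquad
\mathrm{PQPS} = \frac{8 + 2}{4\,\mathrm{s} + 2 \cdot 10\,\mathrm{s}} = \frac{10}{24\,\mathrm{s}} \approx 0.42
$$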

## Other metrics

### Aggregated Execution Statistics (AES)
For each query that belongs to a worker or a task, this metric collects a set of statistics
that are aggregated over all of the query's executions.

| Name | Description |
|---------------------|--------------------------------------------------------------|
| `succeeded` | The number of successful executions. |
| `failed` | The number of failed executions. |
| `resultSize` | The size of the HTTP response. (only stores the last result) |
| `timeOuts`          | The number of executions that resulted in a timeout.         |
| `wrongCodes` | The number of HTTP status codes received that were not 200. |
| `unknownExceptions` | The number of unknown exceptions during execution. |
| `totalTime` | The total time it took to execute the queries. |

The `resultSize` is the size of the HTTP response in bytes; it is an exception to the aggregation, since only the value from the last execution is stored.

### Each Execution Statistic (EachQuery)
This metric collects statistics for each execution of a query.

| Name | Description |
|----------------|-----------------------------------------------------------------------------------------------------------|
| `run` | The number of the execution. |
| `startTime`    | The time stamp at which the execution started.                                                              |
| `time` | The time it took to execute the query. |
| `success` | If the execution was successful. |
| `code` | Numerical value of the end state of the execution. (success=0, timeout=110, http_error=111, exception=1) |
| `resultSize` | The size of the HTTP response. |
| `exception` | The exception that occurred during execution. (if any occurred) |
| `httpCode` | The HTTP status code received. (if any was received) |
| `responseBody` | The hash of the HTTP response body. (only if `parseResults` inside the stresstest has been set to `true`) |