Better documentation

netwerk-digitaal-erfgoed · Nov 30, 2023 · f77b277
1 parent b24d408
commit f77b277
Showing 1 changed file with 26 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -4,8 +4,20 @@ LDWorkbench is a Linked Data Transformation tool designed to use only SPARQL as
 
 This project is currently in a Proof of Concept phase, feel free to watch our progress, but please do not use this project in a production setup.
 
+## How an LD Workbench pipeline works
+
+A *pipeline* is the set of instructions that are run to transform Linked Data. It consists of *stages* with *iterators* and *generators*.
+
+The idea of this project is to use SPARQL `select` to create an iterator of IRIs (defined by the *binding* `$this`) from an endpoint or local RDF file. This makes it possible to go over huge datasets by paginating results using the SPARQL `offset` and `limit` parameters. Each yielded value of `$this` is then used as input for a SPARQL `construct` query that is [pre-bound](https://www.w3.org/TR/shacl/#pre-binding) with `$this`. The generator creates RDF statements that become part of the end result of the workbench pipeline.
+
+Each pipeline consists of 1 or more *stages*, where a *stage* is the combination of 1 iterator and 1 generator (more than 1 generator will be implemented later).
+
+A workbench pipeline is defined by a configuration file, stored in [YAML](https://yaml.org). The configuration is validated using a [JSON Schema](https://json-schema.org). The schema [is part of this repository](https://github.com/netwerk-digitaal-erfgoed/ld-workbench/blob/main/static/ld-workbench.schema.json). The easiest way to work with YAML files and JSON Schemas is to use Microsoft's [Visual Studio Code](https://code.visualstudio.com). If you follow the installation instructions and use the `--init` script, your workbench project will contain the correct settings to work with YAML files and JSON Schemas without any extra settings.
+
+A pipeline must have a `name`, 1 or more `stages` and optionally a `description`. If you have multiple pipelines, each pipeline must have a unique name. See the [example configuration file](https://github.com/netwerk-digitaal-erfgoed/ld-workbench/blob/main/static/example/config.yml) for a boilerplate configuration file. A visualisation of the schema that gives more insight into required and optional properties can be [found here](https://json-schema.app/view/%23?url=https%3A%2F%2Fraw.githubusercontent.com%2Fnetwerk-digitaal-erfgoed%2Fld-workbench%2Fmain%2Fstatic%2Fld-workbench.schema.json).
+
 ## Install & Usage
-The quickest way to get started with LDWorkbench is follow these instruction:
+The quickest way to get started with LDWorkbench is to follow these instructions:
 
 ```bash
 mkdir ldworkbench
@@ -20,8 +32,20 @@ Your workbench is now ready for use. An example workbench is provided, run it wi
 npx ldworkbench
 ```
 
-### Configuring a workbench project
+### Configuring a workbench pipeline
+To keep your workbench workspace clean, we recommend creating a folder for each pipeline that contains the configuration and the SPARQL select and construct queries. The application looks for YAML configurations of pipelines in the folder `pipelines/configurations` by default, so it is best to save your configurations there.
+
+An example pipeline folder and file structure might look like this:
 
+```
+your-working-dir
+|-- pipelines
+|   |-- configurations
+|   |   |-- my-pipeline
+|   |   |   |-- configuration.yaml
+|   |   |   |-- select.rq
+|   |   |   |-- construct.rq
+```
 
 ## Development
 For local development, these scripts should get you going: