diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 1bae6359..f07556b3 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -42,22 +42,13 @@ cache:
   script:
     - 'mvn $MAVEN_CLI_OPTS test'
 
-# Build merge requests using JDK8
-build:jdk8:
-  <<: *build
-  image: maven:3-jdk-8
-
-# Test merge requests using JDK8
-test:jdk8:
-  <<: *test
-  image: maven:3-jdk-8
 
 # Build merge requests using JDK12
-build:jdk12:
+build:jdk11:
   <<: *build
-  image: maven:3-jdk-12
+  image: maven:3-jdk-11
 
 # Test merge requests using JDK12
-test:jdk12:
+test:jdk11:
   <<: *test
-  image: maven:3-jdk-12
+  image: maven:3-jdk-11
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c968b923..09c8a16d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,10 +6,30 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 ## Unreleased
-
-* Function mapping.
+
+* Support for stream partitioning in windows
 * Joins of data streams
 
+## [2.1.0] - 2020-03-18
+
+### Added
+* Support for functions on a per-record basis using the [Function Ontology](https://fno.io/).
+
+### Changed
+* Updated Flink from version 1.10.0 to 1.11.3
+* Updated Kafka from version 2.2.2 to 2.4.1 (more versions supported using the universal connector)
+* Updated VTD-XML from version 2.11 to 2.13.4
+
+### Removed
+* Drop support for Java 8, only Java 11 supported.
+* TCP PUSH support disabled: this code relies on development version of Apache Bahir.
+
+### Fixed
+* Cyclic reference of parent triples maps leads to a stack overflow error (GitHub [issue #19](https://github.com/RMLio/RMLStreamer/issues/19), Internal [issue #108](https://gitlab.ilabt.imec.be/rml/proc/rml-streamer/-/issues/108))
+* In some cases not all triples maps were applied when joins (static-static and static-streams) are involved (fixed together with issue above).
+* Writing to file when input streams are involved is now possible (See GitHub [issue #8](https://github.com/RMLio/RMLStreamer/issues/8), internal [issue #107](https://gitlab.ilabt.imec.be/rml/proc/rml-streamer/-/issues/107)).
+* XML/XPath handling was erroneous (See GitHub [issue #24](https://github.com/RMLio/RMLStreamer/issues/24), internal [issue #124](https://github.com/RMLio/RMLStreamer/issues/24)).
+
 ## [2.0.0] - 2020-06-08
 
 ### Changed
@@ -102,4 +122,5 @@ can be set with the program argument `--baseIRI`.
 [1.2.1]: https://github.com/RMLio/RMLStreamer/compare/v1.2.0...v1.2.1
 [1.2.2]: https://github.com/RMLio/RMLStreamer/compare/v1.2.1...v1.2.2
 [1.2.3]: https://github.com/RMLio/RMLStreamer/compare/v1.2.2...v1.2.3
-[2.0.0]: https://github.com/RMLio/RMLStreamer/compare/v1.2.3...v2.0.0
\ No newline at end of file
+[2.0.0]: https://github.com/RMLio/RMLStreamer/compare/v1.2.3...v2.0.0
+[2.1.0]: https://github.com/RMLio/RMLStreamer/compare/v2.0.0...v2.1.0
diff --git a/README.md b/README.md
index aa6842f4..2f7cde90 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,8 @@ The RMLStreamer generates [RDF](https://www.w3.org/2001/sw/wiki/RDF) from files
 using [RML](http://rml.io/). The difference with other RML implementations is that it can handle
 *big* input files and *continuous data streams*, like sensor data.
 
+Documentation regarding the use of (custom) functions can be found [here](documentation/README_Functions.md).
+
 ### Quick start
 
 If you want to get the RMLStreamer up and running within 5 minutes using Docker, check out [docker/README.md](docker/README.md)
@@ -13,15 +15,15 @@ If you want to deploy it yourself, read on.
 
 ### Installing Flink
 RMLStreamer runs its jobs on Flink clusters.
-More information on how to install Flink and getting started can be found [here](https://ci.apache.org/projects/flink/flink-docs-release-1.10/getting-started/tutorials/local_setup.html).
+More information on how to install Flink and getting started can be found [here](https://ci.apache.org/projects/flink/flink-docs-release-1.11/try-flink/local_installation.html).
 At least a local cluster must be running in order to start executing RML Mappings with RMLStreamer.
-Please note that this version works with Flink 1.10.0 with Scala 2.11 support, which can be downloaded [here](https://www.apache.org/dyn/closer.lua/flink/flink-1.10.0/).
+Please note that this version works with Flink 1.11.3 with Scala 2.11 support, which can be downloaded [here](https://archive.apache.org/dist/flink/flink-1.11.3/flink-1.11.3-bin-scala_2.11.tgz).
 
 ### Building RMLStreamer
 
 In order to build a jar file that can be deployed on a Flink cluster, you need:
-- a Java JDK 8 or higher
-- Apache Maven 3 or higher
+- a Java JDK >= 11 and <= 13 (We develop and test on JDK 11)
+- Apache Maven 3 or higher
 
 Clone or download and then build the code in this repository:
 
@@ -44,7 +46,7 @@ The resulting `RMLStreamer-.jar`, found in the `target` folder, can be
 ### Executing RML Mappings
 
 Here we give examples for running RMLStreamer from the command line. We use `FLINK_BIN` to denote the Flink CLI tool,
-usually found in the `bin` directory of the Flink installation. E.g. `/home/myuser/flink-1.10.0/bin/flink`.
+usually found in the `bin` directory of the Flink installation. E.g. `/home/myuser/flink-1.11.3/bin/flink`.
 For Windows a `flink.bat` script is provided.
 
 The general usage is:
@@ -74,7 +76,7 @@ $FLINK_BIN run toKafka --broker-list --top
 #### Complete RMLStreamer usage:
 
 ```
-Usage: RMLStreamer [toFile|toKafka|toTCPSocket] [options]
+Usage: RMLStreamer [toFile|toKafka|toTCPSocket|noOutput] [options]
 
   -j, --job-name
                            The name to assign to the job on the Flink cluster. Put some semantics in here ;)
@@ -85,13 +87,14 @@ Usage: RMLStreamer [toFile|toKafka|toTCPSocket] [options]
   -m, --mapping-file
                            REQUIRED. The path to an RML mapping file. The path must be accessible on the Flink cluster.
   --json-ld                Write the output as JSON-LD instead of N-Quads. An object contains all RDF generated from one input record. Note: this is slower than using the default N-Quads format.
-  --bulk                   Write all triples generated from one input record at once.
+  --bulk                   Write all triples generated from one input record at once, instead of writing triples the moment they are generated.
   --checkpoint-interval