
# FADI User guide

This page documents how to use the FADI big data framework through a sample use case: monitoring the CETIC office building.

## FADI sample use case - building monitoring

In this simple example, we will ingest temperature measurements from sensors, store them and display them in a simple dashboard.

## 1. Install FADI

To install the FADI framework on your workstation or on a cloud, see the installation instructions.

The components needed for this use case are the following:

- Apache Nifi as an integration tool to ingest the sensor data from the data source (a CSV file in this case) and store it in the database
- PostgreSQL as both the data warehouse and the data lake
- Grafana as a dashboard tool to display graphs from the data ingested and stored in the data lake

Those components are configured in the sample config file. Once the platform is ready, you can start working with it. The following instructions assume that you have deployed FADI on your workstation inside Minikube.

## 2. Ingest and store measurements

"An easy to use, powerful, and reliable system to process and distribute data."

Apache Nifi provides ingestion mechanisms (to connect to e.g. a database, a REST API, CSV/JSON/Avro files on an FTP server, ...): in this case, we want to read the temperature sensor data from our HVAC system and store it in a database.

Temperature measurements from the last 5 days (see the HVAC sample temperatures CSV extract) are ingested:

```csv
measure_ts,temperature
2019-06-23 14:05:03.503,22.5
2019-06-23 14:05:33.504,22.5
2019-06-23 14:06:03.504,22.5
2019-06-23 14:06:33.504,22.5
2019-06-23 14:07:03.504,22.5
2019-06-23 14:07:33.503,22.5
2019-06-23 14:08:03.504,22.5
2019-06-23 14:08:33.504,22.5
2019-06-23 14:09:03.503,22.5
2019-06-23 14:09:33.503,22.5
2019-06-23 14:10:03.503,22.5
2019-06-23 14:10:33.504,22.5
2019-06-23 14:11:03.503,22.5
2019-06-23 14:11:33.503,22.5
2019-06-23 14:12:03.503,22.5
2019-06-23 14:12:33.504,22.5
2019-06-23 14:13:03.504,22.5
2019-06-23 14:13:33.504,22.5
2019-06-23 14:14:03.504,22.5
(...)
```
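
To get a feel for the data before building the flow, you can load the same file directly. A minimal sketch, assuming a local Python environment with pandas installed (the URL is the one used by the InvokeHTTP processor below):

```python
# Preview the sample measurements before wiring the ingestion flow in Nifi.
import pandas as pd

URL = "https://raw.githubusercontent.com/cetic/fadi/master/examples/basic/sample_data.csv"

df = pd.read_csv(URL, parse_dates=["measure_ts"])
print(df.head())
print(len(df), "measurements between", df.measure_ts.min(), "and", df.measure_ts.max())
```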

First, set up the data lake by creating a table in the PostgreSQL database.

Head to the pgAdmin interface (http://pgadmin.fadi.minikube) and execute the table creation script (the default credentials are [email protected]/admin).

Get the database password:

```bash
export POSTGRES_PASSWORD=$(kubectl get secret --namespace fadi fadi-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
echo $POSTGRES_PASSWORD
```
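
If you prefer the command line to pgAdmin, the same table can be created from a small script. A minimal sketch, assuming psycopg2 is installed and the database is reachable on localhost (e.g. through kubectl port-forward); the table name example_basic matches the one used in the Superset section below, and the column types are an assumption based on the CSV extract:

```python
# Create the measurements table in the data lake (alternative to pgAdmin).
# Assumptions: `kubectl port-forward svc/fadi-postgresql 5432:5432 -n fadi`
# is running, and POSTGRES_PASSWORD is exported as shown above.
import os
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgres",
    user="postgres",
    password=os.environ["POSTGRES_PASSWORD"],
)
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS example_basic (
            measure_ts  TIMESTAMP,  -- assumed type, based on the CSV extract
            temperature NUMERIC     -- assumed type
        )
    """)
conn.close()
```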

Then head to the Nifi web interface (http://nifi.fadi.minikube if you are using the local installation with Minikube).

Nifi web interface

Now we need to tell Nifi to read the CSV file and store the measurements in the data lake:

Nifi Ingest CSV and store in PostgreSQL

- InvokeHTTP processor:
  - Settings > Automatically Terminate Relationships: all except Response
  - Properties > Remote URL: https://raw.githubusercontent.com/cetic/fadi/master/examples/basic/sample_data.csv
- PutDatabaseRecord processor:
  - Settings > Automatically Terminate Relationships: all
  - Properties > Record Reader: CSV Reader
  - Properties > Database Connection Pooling Service > DBCPConnectionPool:
    - Database Connection URL: jdbc:postgresql://fadi-postgresql:5432/postgres?stringtype=unspecified
    - Database Driver Class Name: org.postgresql.Driver
    - Database Driver Location(s): /opt/nifi/postgresql-42.2.6.jar
    - Database User: postgres
    - Password: set to the PostgreSQL password obtained above
See also the Nifi template that corresponds to this example.

For more information on how to use Apache Nifi, see the official Nifi user guide and these Awesome Nifi resources.

## 3. Display dashboards and configure alerts

Once the measurements are stored in the database, we will want to display the results in a dashboard.

"Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture."

Grafana provides a dashboard and alerting interface.

Head to the Grafana interface at http://grafana.fadi.minikube (the default credentials are admin/password1):

Grafana web interface

First, we will define the PostgreSQL data source:

Grafana datasource

- Host: fadi-postgresql:5432
- Database: postgres
- User: postgres
- Password: set to the PostgreSQL password obtained above
- SSL Mode: disable

Then we will configure a simple dashboard that shows the temperatures over the last week:

Grafana dashboard
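
The dashboard panel is driven by a plain SQL query on the measurements table. A hedged sketch of what such a query might look like, prototyped here with pandas (in Grafana itself you would paste only the SQL into the panel's query editor):

```python
# Prototype the Grafana panel query: temperatures over the last week.
# Note: with the static 2019 sample data you may need to widen the window.
import os
import pandas as pd
import psycopg2

QUERY = """
    SELECT measure_ts AS "time", temperature
    FROM example_basic
    WHERE measure_ts > now() - interval '7 days'
    ORDER BY measure_ts
"""

conn = psycopg2.connect(
    host="localhost", port=5432, dbname="postgres",
    user="postgres", password=os.environ["POSTGRES_PASSWORD"],
)
print(pd.read_sql(QUERY, conn).head())
conn.close()
```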

And finally we will configure some alerts using very simple rules:

Grafana alert

For more information on how to use Grafana, see the official Grafana user guide.

## 4. Explore

"BI tool with a simple interface, feature-rich when it comes to views, that allows the user to create and share dashboards. This tool is simple and doesn’t require programming, and allows the user to explore, filter and organise data."

Apache Superset provides some interesting features to explore your data and build basic dashboards.

Head to the Superset interface at http://superset.fadi.minikube (the default credentials are admin/password1):

First, we will define the data source:

Superset datasource

- SQLAlchemy URI: postgresql://postgres:<your_password>@fadi-postgresql:5432/postgres

Superset table

- Sources > Tables > "+"
  - Database: example_basic
  - Edit the measure_ts column:
    - Select Sources > Tables from the top-level menu.
    - Click on the "Edit" icon for the example_basic table.
    - Click on the "List Columns" tab.
    - Scroll down to the measure_ts column.
    - Click on the "Edit" icon for that column.
    - In the "Expression" box, enter measure_ts::timestamptz.

Then we will explore our data and build a simple dashboard with the data that is inside the database:

Superset dashboard

For more information on how to use Superset, see the official Superset user guide.

## 5. Process

"Apache Spark™ is a unified analytics engine for large-scale data processing."

Jupyter notebooks provide an easy interface to the Spark processing engine that runs on your cluster.

In this simple use case, we will try to access the data that is stored in the data lake.

Head to the Jupyter notebook interface at http://jupyterhub.fadi.minikube (the default credentials are admin/password1):

Jupyter web interface

Do some data exploration in the notebook by loading the sample code:

Jupyter exploration

Do some Spark processing in the notebook by loading the sample code:

Jupyter processing
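
For reference, here is a minimal sketch of what reading the data lake from a notebook might look like with PySpark, reusing the JDBC settings from the Nifi step above. It assumes the notebook's Spark session has the PostgreSQL JDBC driver on its classpath (e.g. the postgresql-42.2.6.jar mentioned earlier) and that the database password is available in the environment:

```python
# Load the measurements table into a Spark DataFrame over JDBC,
# then compute a simple aggregation (average temperature per day).
import os
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fadi-basic-example").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://fadi-postgresql:5432/postgres")
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "example_basic")
    .option("user", "postgres")
    .option("password", os.environ["POSTGRES_PASSWORD"])
    .load()
)

df.printSchema()
df.groupBy(F.to_date("measure_ts").alias("day")) \
  .agg(F.avg("temperature").alias("avg_temperature")) \
  .orderBy("day") \
  .show()
```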

For more information on how to use Jupyter, see the official Jupyter documentation.

## 6. Summary

In this use case, we have demonstrated a simple configuration for FADI, where we use various services to ingest, store, analyse, and explore data, and to provide dashboards and alerts.

You can find the various resources for this sample use case (Nifi flowfile, Grafana dashboards, ...) in the examples folder.