This page provides documentation on how to use the FADI big data framework through a sample use case: monitoring the CETIC office building.
- 1. Install FADI
- 2. Ingest and store measurements
- 3. Display dashboards and configure alerts
- 4. Explore
- 5. Process
- 6. Summary
In this simple example, we will ingest temperature measurements from sensors, store them, and display them in a simple dashboard.
To install the FADI framework on your workstation or on a cloud, see the installation instructions.
The components needed for this use case are the following:
- Apache Nifi as an integration tool to ingest the sensor data from the data source (a CSV file in this case) and store it in the database
- PostgreSQL as both the data warehouse and the data lake
- Grafana as a dashboard tool to display graphs from the data ingested and stored in the data lake
Those components are configured in the following sample config file. Once the platform is ready, you can start working with it. The following instructions assume that you have deployed FADI on your workstation inside Minikube.
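For orientation, here is a minimal sketch of what such a config (Helm values) file could look like; the exact key names depend on the FADI chart version, so treat them as assumptions rather than the actual file:

```yaml
# Hypothetical excerpt of a FADI Helm values file: enable only the
# components needed for this use case (key names are assumptions and
# may differ in your FADI chart version).
nifi:
  enabled: true
postgresql:
  enabled: true
grafana:
  enabled: true
superset:
  enabled: true
jupyterhub:
  enabled: true
```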
"An easy to use, powerful, and reliable system to process and distribute data."
Apache Nifi provides ingestion mechanisms (e.g. to connect to a database, a REST API, or CSV/JSON/Avro files on an FTP server): in this case we want to read the temperature sensor data from our HVAC system and store it in a database.
Temperature measurements from the last 5 days (see the HVAC sample temperatures CSV extract) are ingested:
```
measure_ts,temperature
2019-06-23 14:05:03.503,22.5
2019-06-23 14:05:33.504,22.5
2019-06-23 14:06:03.504,22.5
2019-06-23 14:06:33.504,22.5
2019-06-23 14:07:03.504,22.5
2019-06-23 14:07:33.503,22.5
2019-06-23 14:08:03.504,22.5
2019-06-23 14:08:33.504,22.5
2019-06-23 14:09:03.503,22.5
2019-06-23 14:09:33.503,22.5
2019-06-23 14:10:03.503,22.5
2019-06-23 14:10:33.504,22.5
2019-06-23 14:11:03.503,22.5
2019-06-23 14:11:33.503,22.5
2019-06-23 14:12:03.503,22.5
2019-06-23 14:12:33.504,22.5
2019-06-23 14:13:03.504,22.5
2019-06-23 14:13:33.504,22.5
2019-06-23 14:14:03.504,22.5
(...)
```
First, set up the data lake by creating a table in the PostgreSQL database.
Head to the pgAdmin interface (http://pgadmin.fadi.minikube) and execute the table creation script (the default credentials are `[email protected]`/`admin`):
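If you just want to follow along, a minimal table definition consistent with the sample CSV (and with the `example_basic` table name used later in Superset) could look like this; treat it as a sketch rather than the exact script from the repository:

```sql
-- Sketch of the table creation script: one row per sensor measurement,
-- with columns matching the measure_ts,temperature header of the CSV.
CREATE TABLE IF NOT EXISTS example_basic (
    measure_ts  timestamp NOT NULL,
    temperature numeric
);
```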
Get the database password:
```
export POSTGRES_PASSWORD=$(kubectl get secret --namespace fadi fadi-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
echo $POSTGRES_PASSWORD
```
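Keep this password at hand: it is needed again below when configuring the Nifi connection pool and the Grafana and Superset data sources.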
Then head to the Nifi web interface (http://nifi.fadi.minikube if you are using the local installation with Minikube).
Now we need to tell Nifi to read the CSV file and store the measurements in the data lake:
- InvokeHTTP processor:
  - Settings > Automatically Terminate Relationships: all except `Response`
  - Properties > Remote URL: `https://raw.githubusercontent.com/cetic/fadi/master/examples/basic/sample_data.csv`
- PutDatabaseRecord processor:
  - Settings > Automatically Terminate Relationships: all
  - Properties > Record Reader: `CSV Reader`
  - Properties > Database Connection Pooling Service > `DBCPConnectionPool`:
    - Database Connection URL: `jdbc:postgresql://fadi-postgresql:5432/postgres?stringtype=unspecified`
    - Database Driver Class Name: `org.postgresql.Driver`
    - Database Driver Location(s): `/opt/nifi/postgresql-42.2.6.jar`
    - Database User: `postgres`
    - Password: set to the PostgreSQL password obtained above
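Once the flow is started, you can verify from pgAdmin that measurements are landing in the data lake; for example (assuming the `example_basic` table created above):

```sql
-- Count the ingested rows and inspect the most recent measurements.
SELECT count(*) FROM example_basic;
SELECT * FROM example_basic ORDER BY measure_ts DESC LIMIT 10;
```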
See also the Nifi template that corresponds to this example.
For more information on how to use Apache Nifi, see the official Nifi user guide and the Awesome Nifi resources list.
Once the measurements are stored in the database, we will want to display the results in a dashboard.
"Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture."
Grafana provides a dashboard and alerting interface.
Head to the Grafana interface at http://grafana.fadi.minikube (the default credentials are `admin`/`password1`):
First we will define the PostgreSQL data source:
- host: fadi-postgresql:5432
- database: postgres
- user: postgres
- password: set to the postgresql password obtained above
- disable ssl
Then we will configure a simple dashboard that shows the temperatures over the last week:
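For the graph panel, a query along these lines can be used (a sketch; `$__timeFilter` is Grafana's PostgreSQL macro that restricts rows to the dashboard's selected time range):

```sql
-- Time series panel query: return each measurement within the selected range.
SELECT
  measure_ts AS "time",
  temperature
FROM example_basic
WHERE $__timeFilter(measure_ts)
ORDER BY measure_ts;
```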
And finally we will configure some alerts using very simple rules:
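With Grafana's classic alerting, a simple rule on the panel above might read as follows (the 25 °C threshold is an arbitrary example, not a value from this use case):

```
WHEN avg () OF query (A, 5m, now) IS ABOVE 25
```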
For more information on how to use Grafana, see the official Grafana user guide
"BI tool with a simple interface, feature-rich when it comes to views, that allows the user to create and share dashboards. This tool is simple and doesn’t require programming, and allows the user to explore, filter and organise data."
Apache Superset provides some interesting features to explore your data and build basic dashboards.
Head to the Superset interface at http://superset.fadi.minikube (the default credentials are `admin`/`password1`):
First we will define the datasource:
- SQLAlchemy URI: `postgresql://postgres:<your_password>@fadi-postgresql:5432/postgres`
- Sources > Tables > "+"
  - Database: `example_basic`
- Edit the `measure_ts` column so that Superset treats it as a timestamp:
  - Select Sources --> Tables from the top-level menu.
  - Click on the "Edit" icon for the example_basic table.
  - Click on the "List Columns" tab.
  - Scroll down to the "measure_ts" column.
  - Click on the "Edit" icon for the "measure_ts" column.
  - In the "Expression" box, enter `measure_ts ::timestamptz` (the cast tells Superset to interpret the column as a timestamp, so it can serve as the time axis in charts).
Then we will explore our data and build a simple dashboard with the data that is inside the database:
For more information on how to use Superset, see the official Superset user guide
"Apache Spark™ is a unified analytics engine for large-scale data processing."
Jupyter notebooks provide an easy interface to the Spark processing engine that runs on your cluster.
In this simple use case, we will try to access the data that is stored in the data lake.
Head to the Jupyter notebook interface at http://jupyterhub.fadi.minikube (the default credentials are `admin`/`password1`):
Do some data exploration in the notebook by loading the sample code:
Do some Spark processing in the notebook by loading the sample code:
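As an illustration of the kind of processing the notebook can do, here is a minimal PySpark sketch (an assumption, not the repository's sample code) that reads the measurements table over JDBC and computes daily statistics; the connection details mirror the Nifi configuration above, and the password placeholder is yours to fill in:

```python
# Minimal PySpark sketch: read the measurements from the data lake
# over JDBC and compute daily temperature statistics.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fadi-basic-example").getOrCreate()

# Connection details mirror the Nifi DBCPConnectionPool configuration above.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://fadi-postgresql:5432/postgres")
      .option("dbtable", "example_basic")
      .option("user", "postgres")
      .option("password", "<your_password>")  # the password obtained above
      .option("driver", "org.postgresql.Driver")
      .load())

# Daily minimum / maximum / average temperature.
(df.groupBy(F.to_date("measure_ts").alias("day"))
   .agg(F.min("temperature").alias("min_temp"),
        F.max("temperature").alias("max_temp"),
        F.avg("temperature").alias("avg_temp"))
   .orderBy("day")
   .show())
```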
For more information on how to use Jupyter notebooks, see the official Jupyter documentation
In this use case, we have demonstrated a simple configuration for FADI, where various services are combined to ingest, store, analyse and explore data, and to provide dashboards and alerts.
You can find the various resources for this sample use case (Nifi flowfile, Grafana dashboards, ...) in the examples folder.