Merge pull request #133 from rtdip/develop
v0.2.0
GBBBAS authored Apr 3, 2023
2 parents d4b8d1c + 8997747 commit 2b63384
Showing 52 changed files with 1,160 additions and 399 deletions.
6 changes: 4 additions & 2 deletions PYPI-README.md
@@ -2,9 +2,11 @@

# What is the RTDIP SDK?

​​**Real Time Data Ingestion Platform (RTDIP) SDK** is a software development kit built to easily access some of RTDIP's transformation functions.
**Real Time Data Ingestion Platform (RTDIP) SDK** is a Python software development kit built to provide users, data scientists and developers with the ability to interact with components of the Real Time Data Ingestion Platform, including:

The RTDIP SDK will give the end user the power to use some of the convenience methods for frequency conversions and resampling of Pi data all through a self-service platform. RTDIP is offering a flexible product with the ability to authenticate and connect to Databricks SQL Warehouses given the end users preferences. RTDIP have taken the initiative to cut out the middle man and instead wrap these commonly requested methods in a simple python module so that you can instead focus on the data.
- Building, Executing and Deploying Ingestion Pipelines
- Execution of queries on RTDIP data
- Authentication to securely interact with environments and data

See [RTDIP Documentation](https://www.rtdip.io/) for more information on how to use the SDK.

6 changes: 4 additions & 2 deletions README.md
@@ -10,9 +10,11 @@ This repository contains Real Time Data Ingestion Platform SDK functions and doc

## What is RTDIP SDK?

​​**Real Time Data Ingestion Platform (RTDIP) SDK** is a software development kit built to easily access some of RTDIP's transformation functions.
**Real Time Data Ingestion Platform (RTDIP) SDK** is a Python software development kit built to provide users, data scientists and developers with the ability to interact with components of the Real Time Data Ingestion Platform, including:

The RTDIP SDK will give the end user the power to use some of the convenience methods for frequency conversions and resampling of Pi data all through a self-service platform. RTDIP is offering a flexible product with the ability to authenticate and connect to Databricks SQL Warehouses given the end users preferences. RTDIP have taken the initiative to cut out the middle man and instead wrap these commonly requested methods in a simple python module so that you can instead focus on the data.
- Building, Executing and Deploying Ingestion Pipelines
- Execution of queries on RTDIP data
- Authentication to securely interact with environments and data

See [RTDIP Documentation](https://www.rtdip.io/) for more information on how to use the SDK.

2 changes: 1 addition & 1 deletion docs/blog/rtdip_ingestion_pipelines.md
@@ -188,7 +188,7 @@ Edge components are designed to provide a lightweight, low latency, low resource

|Edge Type|Azure IoT Edge|AWS Greengrass|Target|
|---------|--------------|--------------|------|
| OPC Publisher|:heavy_check_mark:||Q3-Q4 2023|
| OPC CloudPublisher|:heavy_check_mark:||Q3-Q4 2023|
| Greengrass OPC UA||:heavy_check_mark:|Q4 2023|

## Conclusion
31 changes: 30 additions & 1 deletion docs/getting-started/about-rtdip.md
@@ -1,6 +1,12 @@
# About RTDIP

![timeseries](images/rtdip-horizontal-color.png){width=100%}
## RTDIP and LF Energy

<p align="center"><img src=https://raw.githubusercontent.com/rtdip/core/develop/docs/getting-started/images/rtdip-horizontal-color.png alt="rtdip" width=50% height=50%/></p>

By providing frameworks and reference implementations, [LF Energy](https://www.lfenergy.org/) minimizes pain points such as cybersecurity, interoperability, control, automation, virtualization, and the orchestration of supply and demand.

[RTDIP](https://www.lfenergy.org/projects/real-time-data-ingestion-platform-rtdip/) is an LF Energy project and forms part of an overall open source energy ecosystem. To find out more about projects in LF Energy, please click [here.](https://www.lfenergy.org/projects/)

## What is Real Time Data Ingestion Platform

@@ -9,3 +15,26 @@ Organizations need data for day-to-day operations and to support advanced Data S
Real time data enables organizations to detect and respond to changes in their systems thus improving the efficiency of their operations. This data needs to be available in scalable and secure data platforms.

**Real Time Data Ingestion Platform (RTDIP)** is the solution of choice, leveraging **PaaS** (Platform as a Service) services along with some custom components to provide Data Ingestion, Data Transformation, and Data Sharing as a platform. RTDIP can interface with several data sources to ingest many different data types, including time series, alarms, video, photos and audio, provided by sources such as Historians, OPC Servers and REST APIs, as well as data sent from hardware such as IoT Sensors, Robots and Drones.



## Why Real Time Data Ingestion Platform?

**Real Time Data Ingestion Platform (RTDIP)** enables users to consume **Real Time Data** at scale, including historical and real time streaming data. **RTDIP** has proven to be capable of ingesting data from over 3 million sensors in a production environment across every geographical location in the world.

The Real Time Data Ingestion Platform can be run in a customer's own environment, allowing them to accelerate their cloud deployments while leveraging a proven design to scale their time series data needs.

RTDIP also provides a number of popular integration options, including:

1. ODBC
1. JDBC
1. REST API
1. Python SDK

These options allow users to integrate with a wide variety of applications and tools, including:

1. Data Visualization Tools such as ***Power BI, Seeq, Tableau, and Grafana***
1. Data Science Tools such as ***Jupyter Notebooks, R Studio, and Python***
1. Data Engineering Tools such as ***Apache Spark, Apache Kafka, and Apache Airflow***

RTDIP is architected to leverage Open Source technologies [Apache Spark](https://spark.apache.org/) and [Delta](https://delta.io/). This allows users to leverage the power of Open Source technologies to build their own custom applications and tools in whichever environment they prefer, whether that is in the cloud or on-premise on their own managed Spark Clusters.
184 changes: 150 additions & 34 deletions docs/getting-started/installation.md
@@ -1,71 +1,187 @@
# Getting started with the RTDIP SDK
# Getting started with RTDIP

<p align="center"><img src=https://raw.githubusercontent.com/rtdip/core/develop/docs/getting-started/images/rtdip-horizontal-color.png alt="rtdip" width=50% height=50%/></p>

RTDIP provides functionality to process and query real time data. The RTDIP SDK is central to building pipelines and querying data, so getting started with it is key to unlocking the capability of RTDIP.

This article provides a guide on how to install the RTDIP SDK. Get started by ensuring you have all the prerequisites before following the simple installation steps.

* [Prerequisites](#prerequisites)

* [Installing](#installing-the-rtdip-sdk)
* [Installation](#installing-the-rtdip-sdk)

## Prerequisites

### Python

There are a few things to note before using the RTDIP SDK. The following prerequisites will need to be installed on your local machine.

* Python version 3.8 >=, < 4.0 should be installed. Check which python version you have with the following command:
Python version >= 3.8 and < 3.11 should be installed. Check which Python version you have with the following command:

python --version
python --version

Find the latest python version [here](https://www.python.org/downloads/) and ensure your python path is set up correctly on your machine.
Find the latest python version [here](https://www.python.org/downloads/) and ensure your python path is set up correctly on your machine.
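
The version bounds above can also be checked from inside the interpreter. The following is a minimal sketch and not part of the RTDIP SDK: the helper name `is_supported_python` is hypothetical, and the bounds are copied from the prerequisite stated above.

```python
import sys

# Bounds taken from the prerequisite above: 3.8 <= version < 3.11.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 11)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version falls in the supported range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Comparing only the major and minor components keeps the check independent of patch releases.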

* Ensure your pip python version matches the python version on your machine. Check which version of pip you have installed with the following command:

### Python Package Installers

Installing the RTDIP SDK can be done using a package installer, such as [Pip](https://pip.pypa.io/en/stable/getting-started/), [Conda](https://docs.conda.io/en/latest/) or [Micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html).

=== "Pip"
Ensure your pip python version matches the python version on your machine. Check which version of pip you have installed with the following command:

pip --version

There are two ways to ensure you have the correct versions installed. Either upgrade your Python and pip install or create an environment.

**Option 1**: To upgrade your version of pip use the following command:

python -m pip install --upgrade pip

**OR**
=== "Conda"

**Option 2**: To create an environment, you will need to create a **environment.yml** file with the following:
Check which version of Conda is installed with the following command:

conda --version

name: rtdip-sdk
channels:
- conda-forge
- defaults
dependencies:
- python==3.11
- pip==22.0.2
- pip:
- rtdip-sdk
If necessary, upgrade Conda as follows:
conda update conda

Run the following command:
=== "Micromamba"

conda env create -f environment.yml
Check which version of Micromamba is installed with the following command:
micromamba --version

To update an environment previously created:
If necessary, upgrade Micromamba as follows:
micromamba self-update

conda env update -f environment.yml
### ODBC
To use pyodbc or turbodbc python libraries, ensure that the required ODBC driver is installed as per these [instructions](https://docs.microsoft.com/en-us/azure/databricks/integrations/bi/jdbc-odbc-bi#download-the-odbc-driver).

* To use pyodbc or turbodbc python libraries, ensure that the required ODBC driver is installed as per these [instructions](https://docs.microsoft.com/en-us/azure/databricks/integrations/bi/jdbc-odbc-bi#download-the-odbc-driver)
#### Pyodbc
If you plan to use pyodbc, Microsoft Visual C++ 14.0 or greater is required. Get it from [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).

* If you plan to use pyodbc, Microsoft Visual C++ 14.0 or greater is required. Get it from [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
#### Turbodbc
To use the turbodbc Python library, follow the [Turbodbc Getting Started](https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html) section and ensure that [Boost](https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html) is installed correctly.

* To use turbodbc python library, ensure that [Boost](https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html) is installed correctly.
### Java
If you are planning to use the RTDIP Pipelines in your own environment that leverages [pyspark](https://spark.apache.org/docs/latest/api/python/getting_started/install.html) for a component, Java 8 or later is a [prerequisite](https://spark.apache.org/docs/latest/api/python/getting_started/install.html#dependencies). See below for suggestions to install Java in your development environment.

=== "Conda"
A fairly simple option is to use the conda **openjdk** package to install Java into your python virtual environment. An example of a conda **environment.yml** file to achieve this is below.

```yaml
name: rtdip-sdk
channels:
  - conda-forge
  - defaults
dependencies:
  - python==3.10
  - pip==23.0.1
  - openjdk==11.0.15
  - pip:
    - rtdip-sdk
```

!!! note "PyPI"
The **openjdk** package is not available from PyPI.

=== "Java"
Follow the official Java JDK installation documentation [here.](https://docs.oracle.com/en/java/javase/11/install/overview-jdk-installation.html)

- [Windows](https://docs.oracle.com/en/java/javase/11/install/installation-jdk-microsoft-windows-platforms.html)
- [Mac OS](https://docs.oracle.com/en/java/javase/11/install/installation-jdk-macos.html)
- [Linux](https://docs.oracle.com/en/java/javase/11/install/installation-jdk-linux-platforms.html)

!!! note "Windows"
Windows requires an additional installation of a file called **winutils.exe**. Please see this [repo](https://github.com/steveloughran/winutils) for more information.
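
Whichever route is used to install Java, it can help to confirm that a `java` executable is discoverable before installing the pyspark extras. The following is a minimal sketch using only the Python standard library. `find_java` is a hypothetical helper, not an RTDIP SDK function, and looking in `JAVA_HOME` before the system `PATH` is an assumption about how the environment is configured.

```python
import os
import shutil


def find_java():
    """Return the path of a `java` executable, or None if none is found.

    Hypothetical helper: checks $JAVA_HOME/bin first (if JAVA_HOME is set),
    then falls back to searching the system PATH.
    """
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = shutil.which("java", path=os.path.join(java_home, "bin"))
        if candidate:
            return candidate
    return shutil.which("java")


if __name__ == "__main__":
    location = find_java()
    if location is None:
        print("No java executable found; install Java before using pyspark.")
    else:
        print(f"Found java at {location}")
```

This only confirms that an executable exists. It does not check that the installed version is Java 8 or later.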

## Installing the RTDIP SDK

RTDIP SDK is a PyPi package that can be found [here](https://pypi.org/project/rtdip-sdk/). On this page you can find the **project description**, **release history**, **statistics**, **project links** and **maintainers**.

1\. To install the latest released version of RTDIP SDK from PyPi use the following command:
Features of the SDK can be installed using different extras statements when installing the **rtdip-sdk** package:

=== "Queries"
When installing the package only for querying data, simply specify the following in your preferred Python package installer:

rtdip-sdk

=== "Pipelines"
RTDIP SDK can be installed to include the packages required to build, execute and deploy pipelines. Specify the following extra **[pipelines]** when installing RTDIP SDK so that the required python packages are included during installation.

rtdip-sdk[pipelines]

=== "Pipelines + Pyspark"
RTDIP SDK can also execute pyspark functions as a part of the pipelines functionality. Specify the following extra **[pipelines,pyspark]** when installing RTDIP SDK so that the required pyspark python packages are included during installation.

rtdip-sdk[pipelines,pyspark]

pip install rtdip-sdk
!!! note "Java"
Ensure that Java is installed prior to installing the rtdip-sdk with the **[pipelines,pyspark]** extras. See [here](#java) for more information.
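
The three install targets above differ only in the extras appended to the package name. As a minimal sketch, the requirement string can be assembled as follows. `requirement_for` and `KNOWN_EXTRAS` are hypothetical names used for illustration, and the list of extras contains only those named in this section.

```python
PACKAGE_NAME = "rtdip-sdk"
# Only the extras named in this section; the package may define others.
KNOWN_EXTRAS = ("pipelines", "pyspark")


def requirement_for(*extras):
    """Build a requirement string such as 'rtdip-sdk[pipelines,pyspark]'.

    Hypothetical helper: raises ValueError for extras not listed above.
    """
    unknown = [extra for extra in extras if extra not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"Unknown extras: {', '.join(unknown)}")
    if not extras:
        return PACKAGE_NAME
    return f"{PACKAGE_NAME}[{','.join(extras)}]"


if __name__ == "__main__":
    print(requirement_for())
    print(requirement_for("pipelines"))
    print(requirement_for("pipelines", "pyspark"))
```

When passing a requirement with extras to pip on the command line, quoting it (for example `pip install "rtdip-sdk[pipelines,pyspark]"`) avoids the square brackets being interpreted as glob characters by shells such as zsh.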

If you have previously installed the RTDIP SDK and would like the latest version, see below.
The following examples show how to install the RTDIP SDK package with Pip, Conda or Micromamba. Adjust the package name with any of the extras described in the section above as required.

=== "Pip"

To install the latest released version of RTDIP SDK from PyPi use the following command:

pip install rtdip-sdk

If you have previously installed the RTDIP SDK and would like the latest version, see below:

pip install rtdip-sdk --upgrade

=== "Conda"

To create an environment, you will need to create an **environment.yml** file with the following:

```yaml
name: rtdip-sdk
channels:
  - conda-forge
  - defaults
dependencies:
  - python==3.10
  - pip==23.0.1
  - pip:
    - rtdip-sdk
```

Run the following command:

conda env create -f environment.yml

To update an environment previously created:

conda env update -f environment.yml

=== "Micromamba"

To create an environment, you will need to create an **environment.yml** file with the following:

```yaml
name: rtdip-sdk
channels:
  - conda-forge
  - defaults
dependencies:
  - python==3.10
  - pip==23.0.1
  - pip:
    - rtdip-sdk
```

Run the following command:

micromamba create -f environment.yml

To update an environment previously created:

pip install rtdip-sdk --upgrade
micromamba update -f environment.yml
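
After installing with any of the package installers above, the installed version can be confirmed from Python. The following is a minimal sketch that relies on the standard library `importlib.metadata` module (available from Python 3.8). `installed_version` is a hypothetical helper, not part of the RTDIP SDK.

```python
from importlib import metadata


def installed_version(distribution="rtdip-sdk"):
    """Return the installed version string of a distribution, or None.

    Hypothetical helper: the default name is the PyPI package name used
    in the install commands above.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("rtdip-sdk is not installed in this environment.")
    else:
        print(f"rtdip-sdk {version} is installed.")
```

Running this inside the Conda or Micromamba environment created above checks that environment rather than the system Python.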

2\. Once the installation is complete you can learn how to use the SDK [here.](../sdk/rtdip-sdk-usage.md)
## Next steps
Once the installation is complete you can learn how to use the SDK [here.](../sdk/overview.md)

!!! note "Note"
</b>If you are having trouble installing the SDK, ensure you have installed all of the prerequisites. If problems persist please see [Troubleshooting](../sdk/troubleshooting.md) for more information. Please also reach out to the RTDIP team via Issues, we are always looking to improve the SDK and value your input.<br />
If you are having trouble installing the SDK, ensure you have installed all of the prerequisites. If problems persist, please see [Troubleshooting](../sdk/queries/databricks/troubleshooting.md) for more information. Please also reach out to the RTDIP team via Issues; we are always looking to improve the SDK and value your input.
24 changes: 0 additions & 24 deletions docs/getting-started/why-rtdip.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/index.md
@@ -76,7 +76,7 @@ hide:
<div class="row">
<div class="col">
<a
href="getting-started/about-rtdip/"
href="getting-started/installation/"
title="Getting Started"
class="md-button"
>
2 changes: 1 addition & 1 deletion docs/integration/power-bi.md
@@ -5,7 +5,7 @@
Microsoft Power BI is a business analytics service that provides interactive visualizations with self-service business intelligence capabilities
that enable end users to create reports and dashboards by themselves without having to depend on information technology staff or database administrators.

<center>![Power BI Databricks](images/databricks_powerbi.png){width=100%}</center>
<center>![Power BI Databricks](images/databricks_powerbi.png){width=50%}</center>

When you use Azure Databricks as a data source with Power BI, you can bring the advantages of Azure Databricks performance and technology beyond data scientists and data engineers to all business users.
