Contoso is a company that keeps track of pets and runs real-time analytics on top of the walk data it collects.
Each owner (`Owner`) has multiple dogs (`Dog`). Each dog is taken for a walk, three times a day, by a pet sitter (`PetSitter`). When the walk is done, a message (`DogWalk`) is emitted, which is processed by the real-time streaming platform.
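Based on the fields referenced later in the Stream Analytics query (`Id`, `DogId`, `PetSitterId`, `WalkDurationInMinutes`), a `DogWalk` event presumably looks similar to the following; the concrete values and types below are illustrative assumptions, not the exact schema:

```json
{
    "Id": "5f2b9c1e-3d41-4a7a-9e1c-0c8d2f6a1b23",
    "DogId": 42,
    "PetSitterId": 7,
    "WalkDurationInMinutes": 30
}
```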
The overall architecture is the following:
1. A console application named `EventGenerator` emits events to Event Hubs.
2. These events are processed by a Stream Analytics job.
   - Cosmos DB can be used to ensure exactly-once message processing.
3. The `Dog`, `Owner` and `PetSitter` reference data are read from blob storage. Note that it is a best practice to store the reference data with a date pattern in their path in order to support potential updates.
4. The console application emits events with unknown `DogId` and `PetSitterId`; these records are routed to a blob storage output.
5. The successfully enriched data are stored in a different blob storage.
6. The data are streamed into a Power BI streaming dataset, which feeds a streaming dashboard.
7. Stream Analytics can optionally output to another Event Hub, which can then trigger additional processing, either through a coded approach in Azure Functions or a designer approach hosted in Logic Apps.
There are two approaches to deploy the necessary resources.
After deploying, upload the reference data to the corresponding storage container.
Open a PowerShell prompt, navigate to the Deployment folder, and log in to Azure:

```powershell
Connect-AzAccount -Subscription $SubscriptionId -Tenant $TenantId
```
You can then deploy the resources and upload the reference data by running the following command, providing the resource group name where you want to deploy the resources and the demo name, which will act as a prefix for the resource names.

NOTE: DemoName must comply with storage account naming rules, so it can contain lowercase letters and numbers only.
```powershell
.\deploy.ps1

Supply values for the following parameters:
ResourceGroupName:
DemoName:
```
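Assuming the script declares its parameters in the usual way, you can presumably also pass the values inline for a non-interactive run (the resource group and demo name below are placeholders):

```powershell
# Hypothetical one-line invocation; parameter names assumed from the prompts above
.\deploy.ps1 -ResourceGroupName rg-dogwalks-demo -DemoName dogwalks01
```

Note that `dogwalks01` complies with the lowercase-letters-and-numbers constraint mentioned above.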
NOTE: To execute unsigned PowerShell scripts on your host, you may have to disable signature checking:

```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
```
NOTE: This script also generates the `secrets.json` file needed in the EventGenerator project folder.
Normally you would want to automate the deployment of the Azure Stream Analytics job, which is feasible with the following steps:
- Install Stream Analytics tools for Visual Studio.
- Create a Visual Studio Stream Analytics project.
- Use MSBuild to build the project and generate an ARM template.
- Deploy the template using `New-AzResourceGroupDeployment`.
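The steps above could be scripted roughly as follows. This is a sketch, not a tested pipeline: the project name, output paths, and resource group are assumptions, based on Stream Analytics Visual Studio projects typically emitting a `*.JobTemplate.json` template under the build output's `Deploy` folder:

```powershell
# Build the Stream Analytics project to produce the ARM template
# (project and paths are illustrative)
msbuild .\DogWalks.asaproj /p:Configuration=Release

# Deploy the generated template and its parameter file with the Az module
New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-dogwalks-demo" `
    -TemplateFile ".\bin\Release\Deploy\DogWalks.JobTemplate.json" `
    -TemplateParameterFile ".\bin\Release\Deploy\DogWalks.JobTemplate.parameters.json"
```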
For this demo, you will configure the job manually, which will give you the opportunity to familiarize yourself with Azure Stream Analytics.
In the portal, open the Stream Analytics job and navigate to **Outputs** (1). Select the `output-pbi` output (2) and click **Renew authorization** (3) to authorize the Azure Stream Analytics job to push data to the streaming dataset named `demo-streaming-dataset`. After authorizing, you will be able to configure the target group workspace (4) where you want to deploy the streaming dataset. Click **Save** (5) to persist your changes.
In the portal, open the Stream Analytics job and navigate to **Query** (1). Paste the following query in the query editor (2) and save the query (3).
```sql
WITH JoinedData AS (
    SELECT Input.Id, Input.WalkDurationInMinutes, Input.PetSitterId,
        Dogs.Name AS PetName, Dogs.Height, Dogs.Weight, Dogs.Length, Dogs.OwnerId,
        UDF.FormatNumber(Dogs.OwnerId, 5) AS PartitionKey,
        PetSitters.FirstName + ' ' + PetSitters.LastName AS PetSitterName, PetSitters.BirthDay AS PetSitterBirthday, PetSitters.Rating, PetSitters.AverageWalkTimeInMinutes,
        Owners.FirstName + ' ' + Owners.LastName AS OwnerName, Owners.BirthDay AS OwnerBirthday
    FROM [input-event-hub] Input TIMESTAMP BY EventEnqueuedUtcTime
    LEFT OUTER JOIN [ref-data-dogs] Dogs ON Input.DogId = Dogs.Id
    LEFT OUTER JOIN [ref-data-petsitters] PetSitters ON Input.PetSitterId = PetSitters.Id
    LEFT OUTER JOIN [ref-data-owners] Owners ON Dogs.OwnerId = Owners.Id
),
FullyEnrichedData AS (
    -- Keep only the records that are fully matched to the reference data
    SELECT * FROM JoinedData WHERE OwnerId IS NOT NULL AND PetSitterName IS NOT NULL AND OwnerName IS NOT NULL
),
PartiallyEnrichedData AS (
    SELECT * FROM JoinedData WHERE OwnerId IS NULL OR PetSitterName IS NULL OR OwnerName IS NULL
)
-- Store in the permanent store and output to Power BI
SELECT * INTO [output-permanent-store-enriched] FROM FullyEnrichedData;
SELECT * INTO [output-pbi] FROM FullyEnrichedData;
-- Store the incomplete records in the missing store in order to investigate later
SELECT * INTO [output-permanent-store-missing] FROM PartiallyEnrichedData
```
For common query patterns in Azure Stream Analytics, refer to this page.
With all infrastructure deployed and the Stream Analytics job configured, it's time to run the demo.
In the portal, open the Stream Analytics job and click **Start**, ensure that **Now** is selected as the job output start time, and click the **Start** button.
Create a file named `secrets.json` in the EventGenerator project folder. The file should have the following format:
```json
{
    "EventHubName": "name",
    "ConnectionString": "Endpoint=sb://name.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=blablabla"
}
```
where you can retrieve the connection string following these instructions.
NOTE: If you used deployment method b via PowerShell, the `secrets.json` file is automatically generated for you.
Build and run the `EventGenerator` console application.
In the storage account, you should be able to see two folders in the `dog-walks` container:

- `MissingRefs` contains the records where the owner, the dog, or the pet sitter was not found in the reference data.
- `Owners` contains the enriched records partitioned by the owner id.
All records are stored with the date in their path (`{yyyy}/{mm}/{dd}`) so that records can be retrieved by date faster.
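For example, given the `Owners` partitioning and the zero-padded `PartitionKey` produced by `UDF.FormatNumber(Dogs.OwnerId, 5)` in the query, an enriched record for owner 42 walked on 17 May 2021 would presumably land under a path similar to the following (the trailing blob name is generated by the job and is not shown here):

```
dog-walks/Owners/00042/2021/05/17/...
```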
In Power BI, you should be able to see that a streaming dataset has been created.
You can create a dashboard following the instructions of this article.
- Data generated using https://github.com/bchavez/Bogus.
- Diagram designed using Book of Architecture resources.