Initial commit for AMBA for AVS #139

Merged
merged 1 commit into from
Mar 4, 2024
41 changes: 41 additions & 0 deletions docs/content/patterns/specialized/avs/FAQ.md
@@ -0,0 +1,41 @@
---
title: Frequently Asked Questions
geekdocCollapseSection: true
weight: 80
---

> ## Do I need to use the thresholds defined as default values in the metric rule alerts?
>
> The default thresholds are provided as a starting point. We've based them on what we've seen in the field and on what Microsoft's documentation recommends, and you should expect to adjust them over time.
> Observe how each alert behaves: if it is too chatty, raise the threshold; if it does not fire when there is a problem, lower it (or the reverse, depending on which metric or log error is used as the monitoring source). Once you have settled on an appropriate value, and if you feel it is fit for more general consumption, we would love to hear about it.
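
One way to approach this tuning, sketched in Python under stated assumptions: replay historical metric samples against a few candidate thresholds and count how many evaluation windows would have breached. The sample values and the helper name `count_breaches` are hypothetical. The window of six samples approximates a 30-minute window over 5-minute data with average aggregation and a strict greater-than comparison; real Azure Monitor evaluation may differ in detail.

```python
from statistics import mean


def count_breaches(samples, threshold, window=6):
    """Count sliding windows whose average is strictly greater than threshold."""
    breaches = 0
    for start in range(len(samples) - window + 1):
        if mean(samples[start:start + window]) > threshold:
            breaches += 1
    return breaches


# Hypothetical CPU utilization samples (percent), one per 5 minutes.
cpu_samples = [62, 71, 78, 84, 88, 91, 86, 79, 74, 70, 66, 64]

# Compare how often each candidate threshold would have been breached.
for candidate in (70, 80, 90):
    print(candidate, count_breaches(cpu_samples, candidate))
```

A threshold that breaches in almost every window is probably too chatty for that workload, while one that never breaches during a known incident is probably too lenient.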

> ## Do I need to use these metrics or can they be replaced with ones more suited to my environment?
>
> The metric rules we've created are based on recommendations from Microsoft documentation and field experience. How you use Azure resources may differ, so tailor the alerts to suit your needs. The main goal of this project is to give you a way to do Azure Monitor alerts at scale, so feel free to create new rules with your own thresholds. We'd love to hear about your new rules too, so please share them back.

> ## How much does it cost to run the ALZ Baseline solution?
>
> This depends on several factors: how many of the alert rules you choose to deploy into your environment, how many subscriptions inherit the baseline policies, and how many resources in each subscription match the policy rules and therefore trigger an alert rule and action group deployment.
>
> The solution consists of alert rules. Each alert rule costs ~$0.10/month<sup>1</sup>.
>
> - Alert rules are charged based on evaluations.
> - Assuming the alert rule has data to evaluate throughout the month, it costs ~$0.10<sup>1</sup>.
> - If the rule only evaluates during part of the month (e.g. because the monitored resource was down and didn't send telemetry), you pay a prorated amount for the time the rule was performing evaluations.
> - Dynamic Threshold doubles the cost of the alert rule (~$0.20/month in total<sup>1</sup>).
> - Our solution configures an email address as part of the Action groups deployment (one per subscription), and emails are charged at ~$2/month per 1,000 emails<sup>1</sup>.
>
> **Whilst it is not anticipated that the solution will incur significant costs, it is recommended that you assess costs as part of a deployment to a non-production environment to make sure you are clear on the costs incurred for your deployment.**
>
> For costs related to your deployment, please visit [Pricing - Azure Monitor](https://azure.microsoft.com/en-us/pricing/details/monitor/) and work with your local Microsoft account team to define a rough order of magnitude (RoM) estimate.
>
> <sup>1</sup> Depending on the region you deploy to, there may be a small difference in the associated cost. The costs provided here are based on prices captured as of April 2023.
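
The arithmetic behind these figures can be sketched as follows. This is a rough estimate only: the function name `estimate_monthly_cost` and its parameters are hypothetical, the prices are the approximate April 2023 values quoted above, and regional differences or free allowances are not modelled.

```python
def estimate_monthly_cost(static_rules, dynamic_rules=0, emails=0,
                          rule_price=0.10, email_price_per_1000=2.00,
                          active_fraction=1.0):
    """Rough monthly cost in USD for alert rules plus email notifications.

    active_fraction is the share of the month the rules had data to evaluate;
    a dynamic-threshold rule is counted at twice the static rule price.
    """
    rules = (static_rules + 2 * dynamic_rules) * rule_price * active_fraction
    notifications = emails / 1000 * email_price_per_1000
    return round(rules + notifications, 2)


# Four static metric alert rules evaluating all month, with 50 emails sent.
print(estimate_monthly_cost(static_rules=4, emails=50))
```

Treat the result as an order-of-magnitude check before confirming real prices on the Azure Monitor pricing page.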

> ## Can I use AMBA without a GitHub repository?
>
> <p>Yes, as long as the ARM templates are publicly accessible. Several linked templates in this solution must be publicly accessible. This is because, when the top-level ARM template is submitted to Azure Resource Manager, the linked templates are not automatically uploaded and therefore need to be pulled in by Azure at deploy time. This means they must be referenced using a URL that can be accessed from Azure (e.g. via a public GitHub repository).</p>
> <p>An alternative is to use Template specs. Instead of maintaining your linked templates at an accessible endpoint, you can create a template spec that packages the main template and its linked templates into a single entity you can deploy. The template spec is a resource in your Azure subscription. It makes it easy to securely share the template with users in your organization. You use Azure role-based access control (Azure RBAC) to grant access to the template spec. This feature is currently in preview.</p>
>
> References:
> - [Template specs](https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/linked-templates?tabs=azure-powershell#template-specs)
> - [ARM Private deployment](https://github.com/Azure/ARM-private-deployment)
7 changes: 7 additions & 0 deletions docs/content/patterns/specialized/avs/Known-Issues.md
@@ -0,0 +1,7 @@
---
title: Known Issues
geekdocCollapseSection: true
weight: 100
---

## None at this time
75 changes: 75 additions & 0 deletions docs/content/patterns/specialized/avs/_index.md
@@ -0,0 +1,75 @@
---
title: Azure VMware Solution
geekdocCollapseSection: true
---

## Overview

It is crucial to monitor resource utilization so that you can take timely action. This solution helps set up Azure Monitor alerts for an Azure VMware Solution private cloud. Action owners receive email notifications when a utilization metric exceeds its configured threshold.

**Current Version:**
v1.0.0 (Mar 4, 2024)

## Alerts Table

The table below shows the alerts configured by the deployment.

| Name                                   | Threshold(s) (Severity) | Signal Type         | Frequency       | # Alert Rules |
|----------------------------------------|-------------------------|---------------------|-----------------|---------------|
| CPU Usage per Cluster                  | 80 (2)                  | EffectiveCpuAverage | Every 5 minutes | 1             |
| Memory Usage per Cluster               | 80 (2)                  | UsageAverage        | Every 5 minutes | 1             |
| Storage Usage per Datastore            | 70 (2)                  | DiskUsedPercentage  | Every 5 minutes | 1             |
| Storage Usage per Datastore (Critical) | 75 (0)                  | DiskUsedPercentage  | Every 5 minutes | 1             |
| Service Health Alerts                  | N/A                     | ServiceHealth       | N/A             | 1             |
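
As an illustration of how the metric rows in this table behave, the following Python sketch lists which alerts would fire for a given metric value. The helper name `fired_alerts` is hypothetical, and the strict greater-than comparison is an assumption taken from the `GreaterThan` operator in this pattern's ARM template. Note that the two storage rules are independent, so a datastore above 75 triggers both.

```python
# (name, metric, threshold, severity) copied from the table above.
ALERTS = [
    ("CPU Usage per Cluster", "EffectiveCpuAverage", 80, 2),
    ("Memory Usage per Cluster", "UsageAverage", 80, 2),
    ("Storage Usage per Datastore", "DiskUsedPercentage", 70, 2),
    ("Storage Usage per Datastore (Critical)", "DiskUsedPercentage", 75, 0),
]


def fired_alerts(metric, value):
    """Return (name, severity) for every rule on `metric` that `value` exceeds."""
    return [
        (name, severity)
        for name, rule_metric, threshold, severity in ALERTS
        if rule_metric == metric and value > threshold
    ]


# A datastore at 72 trips only the warning rule; at 76 it trips both rules.
print(fired_alerts("DiskUsedPercentage", 72))
print(fired_alerts("DiskUsedPercentage", 76))
```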

## 📣Feedback 📣

Once you've had an opportunity to deploy the solution we'd love to hear from you! Click [here](https://aka.ms/alz/monitor/feedback) to leave your feedback.

If you have encountered a problem, please file a [GitHub Issue](https://github.com/Azure/azure-monitor-baseline-alerts/issues) in our repo.

## Deployment Guide

We have a [Deployment Guide](./deploy/deploy.md) available for guidance on how to consume the contents of this repo.

## Known Issues

Please see the [Known Issues](Known-Issues).

## Frequently Asked Questions

Please see the [Frequently Asked Questions](../avs/FAQ.md).

## Contributing

This project welcomes contributions and suggestions.
Most contributions require you to agree to a Contributor License Agreement (CLA)
declaring that you have the right to, and actually do, grant us the rights to use your contribution.
For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com).

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment).
Simply follow the instructions provided by the bot.
You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [[email protected]](mailto:[email protected]) with any additional questions or comments.

{{< hint type=note >}}
Details on contributing to this repo can be found [here](../../../contributing)
{{< /hint >}}

## Telemetry

When you deploy the IP located in this repo, Microsoft can identify the installation of said IP with the deployed Azure resources. Microsoft can correlate these resources used to support the software. Microsoft collects this information to provide the best experiences with their products and to operate their business. The telemetry is collected through customer usage attribution. The data is collected and governed by [Microsoft's privacy policies](https://www.microsoft.com/trustcenter).

If you don't wish to send usage data to Microsoft, or need to understand more about its use, details can be found [here](./Telemetry).

## Trademarks

This project may contain trademarks or logos for projects, products, or services.
Authorized use of Microsoft trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.
11 changes: 11 additions & 0 deletions docs/content/patterns/specialized/avs/deploy/deploy.md
@@ -0,0 +1,11 @@
---
title: Deploying Azure VMware Solution Alerts
geekdocCollapseSection: true
weight: 50
---

## Deployment Guide

Follow the deployment guide available below.

[Configure AVS Utilization Alerts](https://github.com/Azure/Enterprise-Scale-for-AVS/tree/well-architected/BrownField/Monitoring/AVS-Utilization-Alerts)
219 changes: 219 additions & 0 deletions patterns/avs/avsArm.json
@@ -0,0 +1,219 @@
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "metadata": {
    "_generator": {
      "name": "bicep",
      "version": "0.14.46.61228",
      "templateHash": "11509858991434574809"
    }
  },
  "parameters": {
    "ActionGroupName": {
      "type": "string",
      "defaultValue": "AVSAlerts",
      "metadata": {
        "description": "Name of the action group to be created"
      }
    },
    "AlertPrefix": {
      "type": "string",
      "defaultValue": "AVSAlert",
      "metadata": {
        "description": "Prefix to use for alert creation"
      }
    },
    "ActionGroupEmails": {
      "type": "array",
      "defaultValue": [],
      "metadata": {
        "description": "Email addresses to be added to the action group. Use the format [\"[email protected]\",\"[email protected]\"]."
      }
    },
    "PrivateCloudResourceId": {
      "type": "string",
      "metadata": {
        "description": "The existing Private Cloud full resource id"
      }
    }
  },
  "variables": {
    "varCuaid": "6f7b68e9-1179-4853-9dfe-1a4f793b9893",
    "Alerts": [
      {
        "Name": "CPU",
        "Description": "CPU Usage per Cluster",
        "Metric": "EffectiveCpuAverage",
        "SplitDimension": "clustername",
        "Threshold": 80,
        "Severity": 2
      },
      {
        "Name": "Memory",
        "Description": "Memory Usage per Cluster",
        "Metric": "UsageAverage",
        "SplitDimension": "clustername",
        "Threshold": 80,
        "Severity": 2
      },
      {
        "Name": "Storage",
        "Description": "Storage Usage per Datastore",
        "Metric": "DiskUsedPercentage",
        "SplitDimension": "dsname",
        "Threshold": 70,
        "Severity": 2
      },
      {
        "Name": "StorageCritical",
        "Description": "Storage Usage per Datastore",
        "Metric": "DiskUsedPercentage",
        "SplitDimension": "dsname",
        "Threshold": 75,
        "Severity": 0
      }
    ]
  },
  "resources": [
    {
      "type": "microsoft.insights/actionGroups",
      "apiVersion": "2019-06-01",
      "name": "[parameters('ActionGroupName')]",
      "location": "Global",
      "properties": {
        "copy": [
          {
            "name": "emailReceivers",
            "count": "[length(parameters('ActionGroupEmails'))]",
            "input": {
              "emailAddress": "[parameters('ActionGroupEmails')[copyIndex('emailReceivers')]]",
              "name": "[split(parameters('ActionGroupEmails')[copyIndex('emailReceivers')], '@')[0]]",
              "useCommonAlertSchema": false
            }
          }
        ],
        "enabled": true,
        "groupShortName": "[substring(format('avs{0}', uniqueString(parameters('ActionGroupName'))), 0, 12)]"
      }
    },
    {
      "type": "Microsoft.Insights/activityLogAlerts",
      "apiVersion": "2020-10-01",
      "name": "[format('{0}-ServiceHealth', parameters('AlertPrefix'))]",
      "location": "Global",
      "properties": {
        "description": "Service Health Alerts",
        "condition": {
          "allOf": [
            {
              "field": "category",
              "equals": "ServiceHealth"
            },
            {
              "field": "properties.impactedServices[*].ServiceName",
              "containsAny": [
                "Azure VMware Solution"
              ]
            },
            {
              "field": "properties.impactedServices[*].ImpactedRegions[*].RegionName",
              "containsAny": [
                "[reference(parameters('PrivateCloudResourceId'), '2021-06-01', 'Full').location]",
                "Global"
              ]
            }
          ]
        },
        "scopes": [
          "[subscription().id]"
        ],
        "enabled": true,
        "actions": {
          "actionGroups": [
            {
              "actionGroupId": "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]"
            }
          ]
        }
      },
      "dependsOn": [
        "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]"
      ]
    },
    {
      "copy": {
        "name": "MetricAlert",
        "count": "[length(variables('Alerts'))]"
      },
      "type": "Microsoft.Insights/metricAlerts",
      "apiVersion": "2018-03-01",
      "name": "[format('{0}-{1}', parameters('AlertPrefix'), variables('Alerts')[copyIndex()].Name)]",
      "location": "Global",
      "properties": {
        "description": "[variables('Alerts')[copyIndex()].Description]",
        "criteria": {
          "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
          "allOf": [
            {
              "name": "Metric1",
              "operator": "GreaterThan",
              "threshold": "[variables('Alerts')[copyIndex()].Threshold]",
              "timeAggregation": "Average",
              "criterionType": "StaticThresholdCriterion",
              "metricName": "[variables('Alerts')[copyIndex()].Metric]",
              "dimensions": [
                {
                  "name": "[variables('Alerts')[copyIndex()].SplitDimension]",
                  "operator": "Include",
                  "values": [
                    "*"
                  ]
                }
              ]
            }
          ]
        },
        "scopes": [
          "[parameters('PrivateCloudResourceId')]"
        ],
        "severity": "[variables('Alerts')[copyIndex()].Severity]",
        "evaluationFrequency": "PT5M",
        "windowSize": "PT30M",
        "autoMitigate": true,
        "enabled": true,
        "actions": [
          {
            "actionGroupId": "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]"
          }
        ]
      },
      "dependsOn": [
        "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]"
      ]
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "[format('pid-{0}-{1}', variables('varCuaid'), uniqueString(resourceGroup().location))]",
      "properties": {
        "expressionEvaluationOptions": {
          "scope": "inner"
        },
        "mode": "Incremental",
        "parameters": {},
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": {
              "name": "bicep",
              "version": "0.14.46.61228",
              "templateHash": "8359988288953583068"
            }
          },
          "resources": []
        }
      }
    }
  ]
}
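
To make the two `copy` loops in this template easier to follow, here is a minimal Python sketch of what they expand to. It mirrors only the `format('{0}-{1}', ...)` naming of the metric alerts and the `split(..., '@')[0]` naming of the email receivers. The function names and the `example.com` addresses are hypothetical, and `groupShortName` is left out because it depends on ARM's `uniqueString` function.

```python
# Alert names taken from the "Alerts" variable in the template.
ALERT_NAMES = ["CPU", "Memory", "Storage", "StorageCritical"]


def metric_alert_names(alert_prefix="AVSAlert"):
    """One metric alert resource per entry, named '<AlertPrefix>-<Name>'."""
    return [f"{alert_prefix}-{name}" for name in ALERT_NAMES]


def email_receivers(emails):
    """One receiver per address; the receiver name is the part before '@'."""
    return [
        {
            "emailAddress": address,
            "name": address.split("@")[0],
            "useCommonAlertSchema": False,
        }
        for address in emails
    ]


print(metric_alert_names())
print(email_receivers(["ops@example.com", "oncall@example.com"]))
```

Because the receiver name is derived from the local part of the address, two addresses that share a local part on different domains would produce the same receiver name, which is worth checking before deployment.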
14 changes: 14 additions & 0 deletions patterns/avs/avsArm.param.json
@@ -0,0 +1,14 @@
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "ActionGroupEmails": {
      "value": [
        "[email protected]"
      ]
    },
    "PrivateCloudResourceId": {
      "value": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/ExampleRG/providers/Microsoft.AVS/privateClouds/ExamplePrivateCloud"
    }
  }
}
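
A parameter file like this can also be generated rather than edited by hand. The sketch below is one possible approach, not part of the PR: `build_parameters` and the regular expression are hypothetical helpers that check the resource ID has the shape shown in the example value before emitting the JSON, and the `example.com` address is a placeholder.

```python
import json
import re

# Shape of an AVS private cloud resource ID, as in the example parameter value.
# Azure resource IDs are case-insensitive, hence re.IGNORECASE.
RESOURCE_ID = re.compile(
    r"^/subscriptions/[0-9a-f-]{36}/resourceGroups/[^/]+"
    r"/providers/Microsoft\.AVS/privateClouds/[^/]+$",
    re.IGNORECASE,
)


def build_parameters(private_cloud_resource_id, emails):
    """Return the parameter file content as a dict, validating the resource ID."""
    if not RESOURCE_ID.match(private_cloud_resource_id):
        raise ValueError("not an AVS private cloud resource ID")
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "ActionGroupEmails": {"value": list(emails)},
            "PrivateCloudResourceId": {"value": private_cloud_resource_id},
        },
    }


example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/ExampleRG/providers/Microsoft.AVS"
    "/privateClouds/ExamplePrivateCloud"
)
print(json.dumps(build_parameters(example_id, ["ops@example.com"]), indent=2))
```

Validating the ID up front turns a malformed value into an immediate local error instead of a failed deployment.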