diff --git a/docs/content/patterns/specialized/avs/FAQ.md b/docs/content/patterns/specialized/avs/FAQ.md
new file mode 100644
index 000000000..1808582ae
--- /dev/null
+++ b/docs/content/patterns/specialized/avs/FAQ.md
@@ -0,0 +1,41 @@
+---
+title: Frequently Asked Questions
+geekdocCollapseSection: true
+weight: 80
+---
+
+> ## Do I need to use the thresholds defined as default values in the metric rule alerts?
+>
+> The default thresholds are provided as a starting point; we've based them on what we've seen in the field and on what Microsoft's documentation recommends. You will need to adjust the thresholds at some point.
+> Observe the alerts in your environment: if an alert is too chatty, adjust the threshold up; if it doesn't fire when there is a problem, adjust the threshold down a bit (or vice versa, depending on which metric or log error is used as the monitoring source). Once you have decided upon an appropriate value, and if you feel it's fit for more general consumption, we would love to hear about it.
+
+> ## Do I need to use these metrics or can they be replaced with ones more suited to my environment?
+>
+> The metric rules we've created are based on recommendations from Microsoft documentation and field experience. How you use Azure resources may differ, so tailor the alerts to suit your needs. The main goal of this project is to give you a way to manage Azure Monitor alerts at scale; you are free to create new rules with your own thresholds. We'd love to hear about your new rules too, so feel free to share them back.
+
+> ## How much does it cost to run the ALZ Baseline solution?
+>
+> This depends on several factors: how many of the alert rules you choose to deploy into your environment, how many subscriptions inherit the baseline policies, and how many resources in each subscription match the policy rules and so trigger an alert rule and action group deployment.
+>
+> The solution consists of alert rules. Each alert rule costs ~0.1$/month<sup>1</sup>.
+>
+> - Alert rules are charged based on evaluations.
+> - Assuming the alert rule had data to evaluate all throughout the month, it'll cost ~0.1$<sup>1</sup>.
+> - If the rule was only evaluating during parts of the month (e.g. because the monitored resource was down and didn't send telemetry), the customer would pay for the prorated amount of time the rule was performing evaluations.
+> - Dynamic Threshold doubles the cost of the alert rule (~0.2$/month in total<sup>1</sup>).
+> - Our solution configures an email address as part of the Action groups deployment (one per subscription) and these are charged at ~2$/month per 1,000 emails<sup>1</sup>.
+>
+> **Whilst it is not anticipated that the solution will incur significant costs, it is recommended that you assess costs as part of a deployment to a non-production environment to make sure you are clear on the costs incurred for your deployment.**
+>
+> For costings related to your deployment, please visit [Pricing - Azure Monitor](https://azure.microsoft.com/en-us/pricing/details/monitor/) and work with your local Microsoft account team to define a rough order of magnitude (RoM) costing.
+>
+> <sup>1</sup> Depending on the region you deploy to, there may be a small difference in the associated cost; the costs provided here are based on prices captured as of April 2023.
+
+> ## Can I use AMBA without a GitHub repository?
+>
+> Yes, as long as the ARM templates are publicly accessible. This solution contains several linked templates that must be publicly accessible. This is because when the top-level ARM template is submitted to Azure Resource Manager, the linked templates are not automatically uploaded and therefore need to be pulled in by Azure at deploy time. This means they must be referenced using a URL that can be accessed from Azure (e.g. via a public GitHub repository).
+> An alternative is to use template specs. Instead of maintaining your linked templates at an accessible endpoint, you can create a template spec that packages the main template and its linked templates into a single entity you can deploy. The template spec is a resource in your Azure subscription, which makes it easy to share the template securely with users in your organization. You use Azure role-based access control (Azure RBAC) to grant access to the template spec. This feature is currently in preview.
+>
+> References:
+> - [Template specs](https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/linked-templates?tabs=azure-powershell#template-specs)
+> - [ARM Private deployment](https://github.com/Azure/ARM-private-deployment)
diff --git a/docs/content/patterns/specialized/avs/Known-Issues.md b/docs/content/patterns/specialized/avs/Known-Issues.md
new file mode 100644
index 000000000..f92774493
--- /dev/null
+++ b/docs/content/patterns/specialized/avs/Known-Issues.md
@@ -0,0 +1,7 @@
+---
+title: Known Issues
+geekdocCollapseSection: true
+weight: 100
+---
+
+## None at this time
diff --git a/docs/content/patterns/specialized/avs/_index.md b/docs/content/patterns/specialized/avs/_index.md
new file mode 100644
index 000000000..565f19b79
--- /dev/null
+++ b/docs/content/patterns/specialized/avs/_index.md
@@ -0,0 +1,75 @@
+---
+title: Azure VMware Solution
+geekdocCollapseSection: true
+---
+
+## Overview
+
+Monitoring resource utilization is crucial for taking timely action. This solution helps you set up Azure Monitor alerts for an Azure VMware Solution private cloud. Action owners receive email notifications when a utilization metric exceeds its configured threshold.
+
+**Current Version:**
+v1.0.0 (Mar 4, 2024)
+
+## Alerts Table
+
+The table below shows the alerts configured by the deployment.
+
+| Name                                   | Threshold(s) (Severity) | Signal Type         | Frequency       | # Alert Rules |
+|----------------------------------------|-------------------------|---------------------|-----------------|---------------|
+| CPU Usage per Cluster                  | 80 (2)                  | EffectiveCpuAverage | Every 5 minutes | 1             |
+| Memory Usage per Cluster               | 80 (2)                  | UsageAverage        | Every 5 minutes | 1             |
+| Storage Usage per Datastore            | 70 (2)                  | DiskUsedPercentage  | Every 5 minutes | 1             |
+| Storage Usage per Datastore (Critical) | 75 (0)                  | DiskUsedPercentage  | Every 5 minutes | 1             |
+| Service Health Alerts                  | N/A                     | ServiceHealth       | N/A             | 1             |
+
+## 📣Feedback 📣
+
+Once you've had an opportunity to deploy the solution, we'd love to hear from you! Click [here](https://aka.ms/alz/monitor/feedback) to leave your feedback.
+
+If you have encountered a problem, please file a [GitHub Issue](https://github.com/Azure/azure-monitor-baseline-alerts/issues) in our GitHub repo.
+
+## Deployment Guide
+
+We have a [Deployment Guide](./deploy/deploy.md) available for guidance on how to consume the contents of this repo.
+
+## Known Issues
+
+Please see the [Known Issues](Known-Issues).
+
+## Frequently Asked Questions
+
+Please see the [Frequently Asked Questions](../avs/FAQ.md).
+
+## Contributing
+
+This project welcomes contributions and suggestions.
+Most contributions require you to agree to a Contributor License Agreement (CLA)
+declaring that you have the right to, and actually do, grant us the rights to use your contribution.
+For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com).
+
+When you submit a pull request, a CLA bot will automatically determine whether you need to provide
+a CLA and decorate the PR appropriately (e.g., status check, comment).
+Simply follow the instructions provided by the bot.
+You will only need to do this once across all repos using our CLA.
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
+For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
+contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
+
+{{< hint type=note >}}
+Details on contributing to this repo can be found [here](../../../contributing)
+{{< /hint >}}
+
+## Telemetry
+
+When you deploy the IP located in this repo, Microsoft can identify the installation of said IP with the deployed Azure resources. Microsoft can correlate these resources used to support the software. Microsoft collects this information to provide the best experiences with their products and to operate their business. The telemetry is collected through customer usage attribution. The data is collected and governed by [Microsoft's privacy policies](https://www.microsoft.com/trustcenter).
+
+If you don't wish to send usage data to Microsoft, or need to understand more about its use, details can be found [here](./Telemetry).
+
+## Trademarks
+
+This project may contain trademarks or logos for projects, products, or services.
+Authorized use of Microsoft trademarks or logos is subject to and must follow
+[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/legal/intellectualproperty/trademarks/usage/general).
+Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
+Any use of third-party trademarks or logos are subject to those third-party's policies.
diff --git a/docs/content/patterns/specialized/avs/deploy/deploy.md b/docs/content/patterns/specialized/avs/deploy/deploy.md
new file mode 100644
index 000000000..50ad874ff
--- /dev/null
+++ b/docs/content/patterns/specialized/avs/deploy/deploy.md
@@ -0,0 +1,11 @@
+---
+title: Deploying Azure VMware Solution Alerts
+geekdocCollapseSection: true
+weight: 50
+---
+
+## Deployment Guide
+
+Follow the deployment guide available below.
+
+[Configure AVS Utilization Alerts](https://github.com/Azure/Enterprise-Scale-for-AVS/tree/well-architected/BrownField/Monitoring/AVS-Utilization-Alerts)
diff --git a/patterns/avs/avsArm.json b/patterns/avs/avsArm.json
new file mode 100644
index 000000000..f2884346a
--- /dev/null
+++ b/patterns/avs/avsArm.json
@@ -0,0 +1,219 @@
+{
+  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
+  "contentVersion": "1.0.0.0",
+  "metadata": {
+    "_generator": {
+      "name": "bicep",
+      "version": "0.14.46.61228",
+      "templateHash": "11509858991434574809"
+    }
+  },
+  "parameters": {
+    "ActionGroupName": {
+      "type": "string",
+      "defaultValue": "AVSAlerts",
+      "metadata": {
+        "description": "Name of the action group to be created"
+      }
+    },
+    "AlertPrefix": {
+      "type": "string",
+      "defaultValue": "AVSAlert",
+      "metadata": {
+        "description": "Prefix to use for alert creation"
+      }
+    },
+    "ActionGroupEmails": {
+      "type": "array",
+      "defaultValue": [],
+      "metadata": {
+        "description": "Email addresses to be added to the action group. Use the format [\"name1@domain.com\",\"name2@domain.com\"]."
+ } + }, + "PrivateCloudResourceId": { + "type": "string", + "metadata": { + "description": "The existing Private Cloud full resource id" + } + } + }, + "variables": { + "varCuaid": "6f7b68e9-1179-4853-9dfe-1a4f793b9893", + "Alerts": [ + { + "Name": "CPU", + "Description": "CPU Usage per Cluster", + "Metric": "EffectiveCpuAverage", + "SplitDimension": "clustername", + "Threshold": 80, + "Severity": 2 + }, + { + "Name": "Memory", + "Description": "Memory Usage per Cluster", + "Metric": "UsageAverage", + "SplitDimension": "clustername", + "Threshold": 80, + "Severity": 2 + }, + { + "Name": "Storage", + "Description": "Storage Usage per Datastore", + "Metric": "DiskUsedPercentage", + "SplitDimension": "dsname", + "Threshold": 70, + "Severity": 2 + }, + { + "Name": "StorageCritical", + "Description": "Storage Usage per Datastore", + "Metric": "DiskUsedPercentage", + "SplitDimension": "dsname", + "Threshold": 75, + "Severity": 0 + } + ] + }, + "resources": [ + { + "type": "microsoft.insights/actionGroups", + "apiVersion": "2019-06-01", + "name": "[parameters('ActionGroupName')]", + "location": "Global", + "properties": { + "copy": [ + { + "name": "emailReceivers", + "count": "[length(parameters('ActionGroupEmails'))]", + "input": { + "emailAddress": "[parameters('ActionGroupEmails')[copyIndex('emailReceivers')]]", + "name": "[split(parameters('ActionGroupEmails')[copyIndex('emailReceivers')], '@')[0]]", + "useCommonAlertSchema": false + } + } + ], + "enabled": true, + "groupShortName": "[substring(format('avs{0}', uniqueString(parameters('ActionGroupName'))), 0, 12)]" + } + }, + { + "type": "Microsoft.Insights/activityLogAlerts", + "apiVersion": "2020-10-01", + "name": "[format('{0}-ServiceHealth', parameters('AlertPrefix'))]", + "location": "Global", + "properties": { + "description": "Service Health Alerts", + "condition": { + "allOf": [ + { + "field": "category", + "equals": "ServiceHealth" + }, + { + "field": "properties.impactedServices[*].ServiceName", + 
"containsAny": [ + "Azure VMware Solution" + ] + }, + { + "field": "properties.impactedServices[*].ImpactedRegions[*].RegionName", + "containsAny": [ + "[reference(parameters('PrivateCloudResourceId'), '2021-06-01', 'Full').location]", + "Global" + ] + } + ] + }, + "scopes": [ + "[subscription().id]" + ], + "enabled": true, + "actions": { + "actionGroups": [ + { + "actionGroupId": "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]" + } + ] + } + }, + "dependsOn": [ + "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]" + ] + }, + { + "copy": { + "name": "MetricAlert", + "count": "[length(variables('Alerts'))]" + }, + "type": "Microsoft.Insights/metricAlerts", + "apiVersion": "2018-03-01", + "name": "[format('{0}-{1}', parameters('AlertPrefix'), variables('Alerts')[copyIndex()].Name)]", + "location": "Global", + "properties": { + "description": "[variables('Alerts')[copyIndex()].Description]", + "criteria": { + "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria", + "allOf": [ + { + "name": "Metric1", + "operator": "GreaterThan", + "threshold": "[variables('Alerts')[copyIndex()].Threshold]", + "timeAggregation": "Average", + "criterionType": "StaticThresholdCriterion", + "metricName": "[variables('Alerts')[copyIndex()].Metric]", + "dimensions": [ + { + "name": "[variables('Alerts')[copyIndex()].SplitDimension]", + "operator": "Include", + "values": [ + "*" + ] + } + ] + } + ] + }, + "scopes": [ + "[parameters('PrivateCloudResourceId')]" + ], + "severity": "[variables('Alerts')[copyIndex()].Severity]", + "evaluationFrequency": "PT5M", + "windowSize": "PT30M", + "autoMitigate": true, + "enabled": true, + "actions": [ + { + "actionGroupId": "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]" + } + ] + }, + "dependsOn": [ + "[resourceId('microsoft.insights/actionGroups', parameters('ActionGroupName'))]" + ] + }, + { + "type": "Microsoft.Resources/deployments", + 
"apiVersion": "2020-10-01", + "name": "[format('pid-{0}-{1}', variables('varCuaid'), uniqueString(resourceGroup().location))]", + "properties": { + "expressionEvaluationOptions": { + "scope": "inner" + }, + "mode": "Incremental", + "parameters": {}, + "template": { + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": { + "name": "bicep", + "version": "0.14.46.61228", + "templateHash": "8359988288953583068" + } + }, + "resources": [] + } + } + } + ] + } \ No newline at end of file diff --git a/patterns/avs/avsArm.param.json b/patterns/avs/avsArm.param.json new file mode 100644 index 000000000..005502644 --- /dev/null +++ b/patterns/avs/avsArm.param.json @@ -0,0 +1,14 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "ActionGroupEmails": { + "value": [ + "example@microsoft.com" + ] + }, + "PrivateCloudResourceId": { + "value": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/ExampleRG/providers/Microsoft.AVS/privateClouds/ExamplePrivateCloud" + } + } +} \ No newline at end of file