diff --git a/docs/content/contributing/_index.md b/docs/content/contributing/_index.md index cb3f2ba78..b12cd8edc 100644 --- a/docs/content/contributing/_index.md +++ b/docs/content/contributing/_index.md @@ -34,17 +34,17 @@ The example folder structure below highlights all of the key assets that define └── Deploy-VM-DataDiskReadLatency-Alert.json ``` -**patterns:** *This folder contains assets for pattern/scenario specific guidance that leverages the baseline alerts in this repo. This contribute does not cover contributions to the patterns/services section. There will be specific guides within each pattern/service section.* +**patterns:** *This folder contains assets for pattern/scenario specific guidance that leverages the baseline alerts in this repo. This guide does not cover contributions to the patterns/scenarios section. There will be specific guides within each pattern/scenarios section.* -**services:** *This folder contains the baseline alert definitions, guidance, and example deployment scripts. It is grouped by resource category (e.g. Compute), and then by resource type (e.g. virtualMachines).* +**services:** *This folder contains the baseline alert definitions, guidance, and example deployment scripts. It is grouped by resource provider (e.g. Compute), and then by resource type (e.g. virtualMachines).* {{< hint type=note >}} -You may need to add new resource category and/or resource type folders as you define new baseline alerts. These folders are case-sensitive and follow the naming conventions defined by the [Azure Resource Reference](https://learn.microsoft.com/azure/templates/) documentation. For example: Alert guidance for Microsoft.Compute/virtualMachines would go under 'services/Compute/virtualMachines' +You may need to add new resource provider and/or resource type folders as you define new baseline alerts. 
These folders are case-sensitive and follow the naming conventions defined by the [Azure Resource Reference](https://learn.microsoft.com/azure/templates/) documentation. For example: Alert guidance for Microsoft.Compute/virtualMachines would go under 'services/Compute/virtualMachines' {{< /hint >}} -**_index.md:** *These files control the menu structure and the content layout for GitHub Pages site. There are only two versions of these files, one for the resource categories, which just controls the friendly name in the menu and title. The other version is at the resource type level and it controls the layout of the GitHub Pages site. As you create new folders, just copy the respective versions and change the title in the metadata section at the top of the file.* +**_index.md:** *These files control the menu structure and the content layout for GitHub Pages site. There are only two versions of these files, one for the resource providers, which just controls the friendly name in the menu and title. The other version is at the resource type level and it controls the layout of the GitHub Pages site. As you create new folders, just copy the respective versions and change the title in the metadata section at the top of the file.* -**alerts.yaml:** *This YAML-based file contains the detailed definition and guidance for the baseline alerts within each resource category/type folder. Below is the general structure of the file.* +**alerts.yaml:** *This YAML-based file contains the detailed definition and guidance for the baseline alerts within each resource provider/type folder. Below is the general structure of the file.* ```yaml - name: @@ -96,7 +96,7 @@ Please note the following settings in the alert definition: ## Auto-Generated Alert Rules -A script was run to automatically generate alert rules based on top usage and settings trends. These rules have been added to their respective *alerts.yaml* files and have two tags associated with them: *auto-generated* and *agc-xxxx*. 
The *agc-xxxx* tag indicates the number of results found for that alert rule in the query used to analyze the top trends. This number should be used to evaluate the importance of including that alert rule as guidance in the repo. Once an auto-generated alert rule has been verified and updated with reference documentation, the *visible* property should be set to *true*. This will make the alert rule visible on the site. Resource categories and types that do not have visible alerts are currently hidden from the table of contents. To make those resource categories and types visible, edit their respective *_index.md* files and remove the *geekdocHidden: true* metadata from the top of the file. +A script was run to automatically generate alert rules based on top usage and settings trends. These rules have been added to their respective *alerts.yaml* files and have two tags associated with them: *auto-generated* and *agc-xxxx*. The *agc-xxxx* tag indicates the number of results found for that alert rule in the query used to analyze the top trends. This number should be used to evaluate the importance of including that alert rule as guidance in the repo. Once an auto-generated alert rule has been verified and updated with reference documentation, the *visible* property should be set to *true*. This will make the alert rule visible on the site. Resource providers and types that do not have visible alerts are currently hidden from the table of contents. To make those resource providers and types visible, edit their respective *_index.md* files and remove the *geekdocHidden: true* metadata from the top of the file. ## Context/Background @@ -196,4 +196,4 @@ Once you have committed changes to your fork of the AMBA repo, you create a pull 1. Sometimes the local version of the website may show some inconsistencies that don't reflect the content you have created. 
- - If this happens, kill the Hugo local web server by pressing CTRL+C and then restart the Hugo web server by running `hugo server -D` from the root of the repo. +- If this happens, kill the Hugo local web server by pressing CTRL+C and then restart the Hugo web server by running `hugo server -D` from the root of the repo. diff --git a/docs/content/visualizations/Azure Workbooks/_index.md b/docs/content/visualizations/Azure Workbooks/_index.md index 7c9b9b0dc..0fc444e37 100644 --- a/docs/content/visualizations/Azure Workbooks/_index.md +++ b/docs/content/visualizations/Azure Workbooks/_index.md @@ -4,6 +4,50 @@ geekdocCollapseSection: true --- ## Overview + [Azure Workbooks](https://learn.microsoft.com/azure/azure-monitor/visualize/workbooks-overview) provide a flexible canvas for data analysis and the creation of rich visual reports. You can use workbooks to tap into multiple data sources from across Azure and combine them into unified interactive experiences. -## Under Construction +Listed below are some examples of workbooks that you can use to visualize alerts and key metrics from Azure resources. These workbook templates can be saved to your workbook gallery in Azure. + +You can also find information below on [how to save workbook templates](#import-workbook-templates-quick-start-guide). + +## Azure Monitor Community + +The Azure Monitor Team utilizes [this](https://github.com/microsoft/AzureMonitorCommunity/tree/master/Azure%20Services) GitHub repo to share workbooks for various Azure services. Below are some workbooks to highlight alert management and ExpressRoute/network monitoring. + +## [Alert Management Workbook](https://github.com/microsoft/AzureMonitorCommunity/blob/master/Azure%20Services/Azure%20Monitor/Workbooks/Alerts%20Management.workbook) + +A summary of alerts by your filtered subscription. 
This workbook contains visualizations of alerts triggered by type, severity, and the top 5 noisiest objects. ![alert management](../../img/alert-management-wb.png) + +## [ExpressRoute Monitoring Workbook](https://github.com/microsoft/AzureMonitorCommunity/blob/master/Azure%20Services/Azure%20Monitor/Workbooks/Azure%20Network%20Monitoring.workbook) + +This workbook addresses the common challenge of effectively visualizing the health and availability of ExpressRoute components. This is an interactive workbook that provides comprehensive monitoring and troubleshooting for ExpressRoute, including the monitoring of key metrics such as: ExpressRoute Circuit Status, BGP availability, total throughput, and more. + +For full details see: + [Monitoring ExpressRoute: A Workbook Solution](https://techcommunity.microsoft.com/t5/azure-observability-blog/monitoring-expressroute-a-workbook-solution/ba-p/4038130). + + ![image3](https://techcommunity.microsoft.com/t5/image/serverpage/image-id/545394i89157D8B217AA777/image-dimensions/2000?v=v2&px=-1) + ![image4](https://techcommunity.microsoft.com/t5/image/serverpage/image-id/545405i13A8ECBF9B370BB4/image-dimensions/2000?v=v2&px=-1) + ![image5](https://techcommunity.microsoft.com/t5/image/serverpage/image-id/545407i490AE5C9D99AECEE/image-dimensions/2000?v=v2&px=-1) + +## Import Workbook Templates: quick start guide + +Want to see these workbooks live in your Azure environment? Follow these steps to add gallery templates to your saved workbooks. + +1. Copy the raw file: + - In the examples above, the titles of the workbooks are hyperlinks to the raw files. From there you can explore other workbooks in the GitHub repo. + ![image6](../../img/copy-raw-file.png) + +2. Open Azure Monitor, and navigate to Workbooks: + - Once here, click "new". + + ![image7](../../img/new-workbook.png) + +3. Open the advanced editor (): + - Paste the raw code, which was copied in step one, in the gallery template. + - Once finished, click apply. 
+ ![image10](../../img/gallery-template.png) + +4. View your workbook and save it to your gallery: + + ![image11](../../img/save-workbook.png) diff --git a/docs/content/welcome/_index.md b/docs/content/welcome/_index.md index 7c7217bab..970a5314c 100644 --- a/docs/content/welcome/_index.md +++ b/docs/content/welcome/_index.md @@ -7,7 +7,7 @@ weight: 0 Welcome to the Azure Monitor Baseline Alerts (AMBA) site! The purpose of this site is to provide best practice guidance around key alerts metrics and their thresholds. This sites is broken down into two main sections: -1. **Services:** This section provides guidance for individual Azure services. For each service, there is a list of key alert metrics and the recommended thresholds. +1. **Azure Resources:** This section provides guidance for individual Azure resources. For each resource, there is a list of key alert metrics and the recommended thresholds. 2. **Patterns / Scenarios:** This section provides guidance for common patterns / scenarios (like Azure Landing Zones), as well as policy definition and initiatives for deploying the alerts in your environment. 
diff --git a/docs/static/img/alert-management-wb.png b/docs/static/img/alert-management-wb.png new file mode 100644 index 000000000..42a66a156 Binary files /dev/null and b/docs/static/img/alert-management-wb.png differ diff --git a/docs/static/img/copy-raw-file.png b/docs/static/img/copy-raw-file.png new file mode 100644 index 000000000..a77e1e8f4 Binary files /dev/null and b/docs/static/img/copy-raw-file.png differ diff --git a/docs/static/img/gallery-template.png b/docs/static/img/gallery-template.png new file mode 100644 index 000000000..6227fa98c Binary files /dev/null and b/docs/static/img/gallery-template.png differ diff --git a/docs/static/img/new-workbook.png b/docs/static/img/new-workbook.png new file mode 100644 index 000000000..cee581e3d Binary files /dev/null and b/docs/static/img/new-workbook.png differ diff --git a/docs/static/img/save-workbook.png b/docs/static/img/save-workbook.png new file mode 100644 index 000000000..41196c37a Binary files /dev/null and b/docs/static/img/save-workbook.png differ diff --git a/services/DesktopVirtualization/_index.md b/services/DesktopVirtualization/_index.md new file mode 100644 index 000000000..f1fb714ba --- /dev/null +++ b/services/DesktopVirtualization/_index.md @@ -0,0 +1,6 @@ +--- +title: DesktopVirtualization +geekdocCollapseSection: true +geekdocHidden: true +--- + diff --git a/services/DesktopVirtualization/hostPools/_index.md b/services/DesktopVirtualization/hostPools/_index.md new file mode 100644 index 000000000..33654b7a6 --- /dev/null +++ b/services/DesktopVirtualization/hostPools/_index.md @@ -0,0 +1,7 @@ +--- +title: HostPools +geekdocCollapseSection: true +geekdocHidden: true +--- + +{{< alertList name="alertList" >}} diff --git a/services/DesktopVirtualization/hostPools/alerts.yaml b/services/DesktopVirtualization/hostPools/alerts.yaml new file mode 100644 index 000000000..2496614a7 --- /dev/null +++ b/services/DesktopVirtualization/hostPools/alerts.yaml @@ -0,0 +1,1388 @@ +- name: Capacity 85 
Percent (xHostPoolNamex) + description: This alert is based on the Action Account and Runbook that populates the Log Analytics workspace specified with the AVD Metrics Deployment Solution for xHostPoolNamex. + -->Last Number in the string is the Percentage Remaining for the Host Pool. + Output is - + HostPoolName|ResourceGroup|Type|MaxSessionLimit|NumberHosts|TotalUsers|DisconnectedUser|ActiveUsers|SessionsAvailable|HostPoolPercentageLoad' + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT30M + evaluationFrequency: PT5M + threshold: 1 + resouceIdColumn: ResourceId + dimensions: + - name: HostPoolName + operator: Include + values: + - '*' + - name: UserSessionsTotal + operator: Include + values: + - '*' + - name: UserSessionsDisconnected + operator: Include + values: + - '*' + - name: UserSessionsActive + operator: Include + values: + - '*' + - name: UserSessionsAvailable + operator: Include + values: + - '*' + - name: HostPoolPercentLoad + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'AzureDiagnostics + + | where Category has "JobStreams" and StreamType_s == "Output" and RunbookName_s == "AvdHostPoolLogData" + + | sort by TimeGenerated + + | where TimeGenerated > now() - 5m + + | extend HostPoolName=tostring(split(ResultDescription, ''|'')[0]) + + | extend ResourceGroup=tostring(split(ResultDescription, ''|'')[1]) + + | extend Type=tostring(split(ResultDescription, ''|'')[2]) + + | extend MaxSessionLimit=toint(split(ResultDescription, ''|'')[3]) + + | extend NumberSessionHosts=toint(split(ResultDescription, ''|'')[4]) + + | extend UserSessionsTotal=toint(split(ResultDescription, ''|'')[5]) + + | extend UserSessionsDisconnected=toint(split(ResultDescription, ''|'')[6]) + + | extend UserSessionsActive=toint(split(ResultDescription, ''|'')[7]) + + | extend 
UserSessionsAvailable=toint(split(ResultDescription, ''|'')[8]) + + | extend HostPoolPercentLoad=toint(split(ResultDescription, ''|'')[9]) + + | extend HPResourceId=tostring(split(ResultDescription, ''|'')[13]) + + | extend ResourceId=tostring(HPResourceId) + + | where HostPoolPercentLoad >= 85 and HostPoolPercentLoad < 95 + + | where HostPoolName =~ ''xHostPoolNamex''' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: Capacity 95 Percent (xHostPoolNamex) + description: This alert is based on the Action Account and Runbook that populates the Log Analytics workspace specified with the AVD Metrics Deployment Solution for xHostPoolNamex. + -->Last Number in the string is the Percentage Remaining for the Host Pool. + Output is - + HostPoolName|ResourceGroup|Type|MaxSessionLimit|NumberHosts|TotalUsers|DisconnectedUser|ActiveUsers|SessionsAvailable|HostPoolPercentageLoad' + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT30M + evaluationFrequency: PT5M + threshold: 1 + resouceIdColumn: ResourceId + dimensions: + - name: HostPoolName + operator: Include + values: + - '*' + - name: UserSessionsTotal + operator: Include + values: + - '*' + - name: UserSessionsDisconnected + operator: Include + values: + - '*' + - name: UserSessionsActive + operator: Include + values: + - '*' + - name: UserSessionsAvailable + operator: Include + values: + - '*' + - name: HostPoolPercentLoad + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'AzureDiagnostics + + | where Category has "JobStreams" and StreamType_s == "Output" and RunbookName_s == "AvdHostPoolLogData" + + | sort by TimeGenerated + + | where 
TimeGenerated > now() - 5m + + | extend HostPoolName=tostring(split(ResultDescription, ''|'')[0]) + + | extend ResourceGroup=tostring(split(ResultDescription, ''|'')[1]) + + | extend Type=tostring(split(ResultDescription, ''|'')[2]) + + | extend MaxSessionLimit=toint(split(ResultDescription, ''|'')[3]) + + | extend NumberSessionHosts=toint(split(ResultDescription, ''|'')[4]) + + | extend UserSessionsTotal=toint(split(ResultDescription, ''|'')[5]) + + | extend UserSessionsDisconnected=toint(split(ResultDescription, ''|'')[6]) + + | extend UserSessionsActive=toint(split(ResultDescription, ''|'')[7]) + + | extend UserSessionsAvailable=toint(split(ResultDescription, ''|'')[8]) + + | extend HostPoolPercentLoad=toint(split(ResultDescription, ''|'')[9]) + + | extend HPResourceId=tostring(split(ResultDescription, ''|'')[13]) + + | extend ResourceId=tostring(HPResourceId) + + | where HostPoolPercentLoad >= 95 + + | where HostPoolName =~ ''xHostPoolNamex''' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: No Resources Available (xHostPoolNamex) + description: Catastrophic Event! Indicates potential problems with dependencies, diagnose and resolve for xHostPoolNamex. 
+ type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT15M + evaluationFrequency: PT15M + threshold: 1 + resouceIdColumn: _ResourceId + dimensions: + - name: UserName + operator: Include + values: + - '*' + - name: SessionHostName + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'WVDConnections + + | where TimeGenerated > ago (15m) + + | where _ResourceId contains "xHostPoolNamex" + + | project-away TenantId,SourceSystem + + | summarize arg_max(TimeGenerated, *), StartTime = min(iff(State== \''Started\'', TimeGenerated , datetime(null) )), ConnectTime = min(iff(State== \''Connected\'', TimeGenerated , datetime(null) )) by CorrelationId + + | join kind=leftouter (WVDErrors + + |summarize Errors=makelist(pack(\''Code\'', Code, \''CodeSymbolic\'', CodeSymbolic, \''Time\'', TimeGenerated, \''Message\'', Message ,\''ServiceError\'', ServiceError, \''Source\'', Source)) by CorrelationId + + ) on CorrelationId + + | join kind=leftouter (WVDCheckpoints + + | summarize Checkpoints=makelist(pack(\''Time\'', TimeGenerated, \''Name\'', Name, \''Parameters\'', Parameters, \''Source\'', Source)) by CorrelationId + + | mv-apply Checkpoints on ( + + order by todatetime(Checkpoints[\''Time\'']) asc + + | summarize Checkpoints=makelist(Checkpoints)) + + ) on CorrelationId + + | project-away CorrelationId1, CorrelationId2 + + | order by TimeGenerated desc + + | where Errors[0].CodeSymbolic == "ConnectionFailedNoHealthyRdshAvailable"' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: User Disconnected over 24h (xHostPoolNamex) + description: Verify Remote Desktop Policies are applied relating 
to Session Limits for xHostPoolNamex. This could impact your scaling plan as well. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT1H + evaluationFrequency: PT1H + threshold: 1 + resouceIdColumn: _ResourceId + dimensions: + - name: UserName + operator: Include + values: + - '*' + - name: SessionHostName + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'WVDConnections + + | where TimeGenerated > ago(24h) + + | where State == "Connected" + + | where _ResourceId contains "xHostPoolNamex" + + | project CorrelationId , UserName, ConnectionType, StartTime=TimeGenerated, SessionHostName + + | join (WVDConnections + + | where State == "Completed" + + | project EndTime=TimeGenerated, CorrelationId) + + on CorrelationId + + | project Duration = EndTime - StartTime, ConnectionType, UserName, SessionHostName + + | where Duration >= timespan(24:00:00) + + | sort by Duration desc' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: User Disconnected over 72h (xHostPoolNamex) + description: Verify Remote Desktop Policies are applied relating to Session Limits for xHostPoolNamex. This could impact your scaling plan as well. 
+ type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT1H + evaluationFrequency: PT1H + threshold: 1 + resouceIdColumn: _ResourceId + dimensions: + - name: UserName + operator: Include + values: + - '*' + - name: SessionHostName + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'WVDConnections + + | where TimeGenerated > ago(24h) + + | where State == "Connected" + + | where _ResourceId contains "xHostPoolNamex" + + | project CorrelationId , UserName, ConnectionType, StartTime=TimeGenerated, SessionHostName + + | join (WVDConnections + + | where State == "Completed" + + | project EndTime=TimeGenerated, CorrelationId) + + on CorrelationId + + | project Duration = EndTime - StartTime, ConnectionType, UserName, SessionHostName + + | where Duration >= timespan(72:00:00) + + | sort by Duration desc' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: Local Disk Space less than 10% (xHostPoolNamex) + description: Disk space Moderately Low. \nConsider review of the VM local C drive and determine what is consuming disk space for the VM in xHostPoolNamex. This could be local profiles or temp files that need to be cleaned up or removed. 
+ type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT15M + evaluationFrequency: PT15M + threshold: 1 + resouceIdColumn: _ResourceId + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Perf + + | where TimeGenerated > ago(15m) + + | where ObjectName == "LogicalDisk" and CounterName == "% Free Space" + + | where InstanceName !contains "D:" + + | where InstanceName !contains "_Total" | where CounterValue <= 10.00 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, CounterValue, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where TimeGenerated > ago(15m) + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool, _ResourceId + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: Local 
Disk Space less than 5% (xHostPoolNamex) + description: Disk space Moderately Low. \nConsider review of the VM local C drive and determine what is consuming disk space for the VM in xHostPoolNamex. This could be local profiles or temp files that need to be cleaned up or removed. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT15M + evaluationFrequency: PT15M + threshold: 1 + resouceIdColumn: _ResourceId + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Perf + + | where TimeGenerated > ago(15m) + + | where ObjectName == "LogicalDisk" and CounterName == "% Free Space" + + | where InstanceName !contains "D:" + + | where InstanceName !contains "_Total" + + | where CounterValue <= 5.00 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, CounterValue, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + ( + + WVDAgentHealthStatus + + | where TimeGenerated > ago(15m) + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool, _ResourceId + + ) 
on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: FSLogix Profile less than 5% (xHostPoolNamex) + description: User Profiles Service logged Event ID 33. Expand User's Virtual Profile Disk and/or clean up user profile data on the VM in xHostPoolNamex. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT5M + evaluationFrequency: PT5M + threshold: 1 + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: RenderedDescription + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Event + + | where EventLog == "Microsoft-FSLogix-Apps/Admin" + + | where EventLevelName == "Warning" + + | where EventID == 34 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, RenderedDescription, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize 
arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: FSLogix Profile less than 2% (xHostPoolNamex) + description: User Profiles Service logged Event ID 34. Expand User's Virtual Profile Disk and/or clean up user profile data on the VM in xHostPoolNamex. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT5M + evaluationFrequency: PT5M + threshold: 1 + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: RenderedDescription + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Event + + | where EventLog == "Microsoft-FSLogix-Apps/Admin" + + | where EventLevelName == "Error" + + | where EventID == 34 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, RenderedDescription, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup 
"/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: FSLogix Network Issue (xHostPoolNamex) + description: User Profiles Service logged Event ID 43. Verify network communications between the storage and AVD VM related to xHostPoolNamex. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT1D + evaluationFrequency: PT5M + threshold: 1 + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: RenderedDescription + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Event + + | where EventLog == "Microsoft-FSLogix-Apps/Admin" + + | where EventLevelName == "Error" + + | where EventID == 43 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, RenderedDescription, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with 
"/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: FSLogix Profile Disk Failed to Attach (xHostPoolNamex) + description: User Profiles Service logged an Event ID 52 or 40. Investigate error details for reason regarding xHostPoolNamex. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT1D + evaluationFrequency: PT5M + resourceIdColumn: _ResourceId + threshold: 1 + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: RenderedDescription + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Event + + | where EventLog == "Microsoft-FSLogix-Apps/Admin" + + | where EventLevelName == "Error" + + | where EventID == 52 or EventID == 40 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, RenderedDescription, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" 
ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: FSLogix Service Disabled (xHostPoolNamex) + description: User Profile Service Disabled. Determine why service was disabled and re-enable / start the FSLogix service. Regarding xHostPoolNamex. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT1D + evaluationFrequency: PT5M + resourceIdColumn: _ResourceId + threshold: 1 + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: RenderedDescription + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Event + + | where EventLog == "Microsoft-FSLogix-Apps/Admin" + + | where EventLevelName == "Warning" + + | where EventID == 60 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, RenderedDescription, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where _ResourceId contains 
"xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: FSLogix Disk Compact Failure (xHostPoolNamex) + description: User Profile Service logged Event ID 62 or 63. The profile Disk was marked for compaction due to additional white space but failed. See error details for additional information regarding xHostPoolNamex. 
+ type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT1D + evaluationFrequency: PT5M + resourceIdColumn: _ResourceId + threshold: 1 + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: RenderedDescription + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Event + + | where EventLog == "Microsoft-FSLogix-Apps/Admin" + + | where EventLevelName == "Error" + + | where EventID == 62 or EventID == 63 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, RenderedDescription, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: FSLogix Disk Already In Use (xHostPoolNamex) + description: User Profile 
Service logged an Event ID 51. This indicates that a user attempted to load their profile disk but it was in use or possibly mapped to another VM. Ensure the user is not connected to another host pool or remote app with the same profile. Regarding xHostPoolNamex. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT1D + evaluationFrequency: PT5M + resourceIdColumn: _ResourceId + threshold: 1 + dimensions: + - name: ComputerName + operator: Include + values: + - '*' + - name: RenderedDescription + operator: Include + values: + - '*' + - name: VMresourceGroup + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'Event + + | where EventLog == "Microsoft-FSLogix-Apps/Operational" + + | where EventLevelName == "Warning" + + | where EventID == 51 + + | parse _ResourceId with "/subscriptions/" subscription "/resourcegroups/" ResourceGroup "/providers/microsoft.compute/virtualmachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | project ComputerName, RenderedDescription, subscription, ResourceGroup, TimeGenerated + + | join kind = leftouter + + (WVDAgentHealthStatus + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" subscriptionAgentHealth "/resourcegroups/" ResourceGroupAgentHealth "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" VMsubscription "/resourceGroups/" VMresourceGroup "/providers/Microsoft.Compute/virtualMachines/" ComputerName + + | extend ComputerName=tolower(ComputerName) + + | summarize arg_max(TimeGenerated,*) by ComputerName + + | project VMresourceGroup, ComputerName, HostPool + + ) on ComputerName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + 
references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: Session Host Healthcheck Failure (xHostPoolNamex) + description: VM is available for use but one of the dependent resources is in a failed state for hostpool xHostPoolNamex. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 2 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT15M + evaluationFrequency: PT15M + resourceIdColumn: _ResourceId + threshold: 1 + dimensions: + - name: SessionHostName + operator: Include + values: + - '*' + - name: HealthCheckDesc + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + - name: SessionHostRG + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'let MapToDesc = (idx: long) { + + case(idx == 0, "DomainJoin", + + idx == 1, "DomainTrust", + + idx == 2, "FSLogix", + + idx == 3, "SxSStack", + + idx == 4, "URLCheck", + + idx == 5, "GenevaAgent", + + idx == 6, "DomainReachable", + + idx == 7, "WebRTCRedirector", + + idx == 8, "SxSStackEncryption", + + idx == 9, "IMDSReachable", + + idx == 10, "MSIXPackageStaging", + + "InvalidIndex")}; + + WVDAgentHealthStatus + + | where TimeGenerated > ago(10m) + + | where Status != \''Available\'' + + | where AllowNewSessions = True + + | extend CheckFailed = parse_json(SessionHostHealthCheckResult) + + | mv-expand CheckFailed + + | where CheckFailed.AdditionalFailureDetails.ErrorCode != 0 + + | extend HealthCheckName = tolong(CheckFailed.HealthCheckName) + + | extend HealthCheckResult = tolong(CheckFailed.HealthCheckResult) + + | extend HealthCheckDesc = MapToDesc(HealthCheckName) + + | where HealthCheckDesc != \''InvalidIndex\'' + + | where _ResourceId contains "xHostPoolNamex" + + | parse _ResourceId with "/subscriptions/" 
subscription "/resourcegroups/" HostPoolResourceGroup "/providers/microsoft.desktopvirtualization/hostpools/" HostPool + + | parse SessionHostResourceId with "/subscriptions/" HostSubscription "/resourceGroups/" SessionHostRG "/providers/Microsoft.Compute/virtualMachines/" SessionHostName' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: Personal Desktop Assigned Healthcheck Failure (xHostPoolNamex) + description: VM is assigned to a user but one of the dependent resources is in a failed state for hostpool xHostPoolNamex. This alert relies on the runbook AvdHostPoolLogData. + type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 1 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT5M + evaluationFrequency: PT5M + resourceIdColumn: _ResourceId + threshold: 1 + dimensions: + - name: SessionHostName + operator: Include + values: + - '*' + - name: HealthCheckDesc + operator: Include + values: + - '*' + - name: HostPool + operator: Include + values: + - '*' + - name: SessionHostRG + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'AzureDiagnostics + + | where Category has "JobStreams" and StreamType_s == "Output" and RunbookName_s == "AvdHostPoolLogData" + + | sort by TimeGenerated + + | where TimeGenerated > ago(15m) + + | extend HostPoolName=tostring(split(ResultDescription, ''|'')[0]) + + | extend ResourceGroup=tostring(split(ResultDescription, ''|'')[1]) + + | extend Type=tostring(split(ResultDescription, ''|'')[2]) + + | extend NumberSessionHosts=toint(split(ResultDescription, ''|'')[4]) + + | extend UserSessionsActive=toint(split(ResultDescription, ''|'')[7]) + + | extend NumPersonalUnhealthy=toint(split(ResultDescription, 
''|'')[10]) + + | extend PersonalSessionHost=extract_json("$.SessionHost", tostring(split(ResultDescription, ''|'')[11]), typeof(string)) + + | extend PersonalAssignedUser=extract_json("$.AssignedUser", tostring(split(ResultDescription, ''|'')[11]), typeof(string)) + + | where HostPoolName =~ ''xHostPoolNamex'' + + | where Type == ''Personal'' + + | where NumPersonalUnhealthy > 0 ' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false +- name: User Connection to Session Host Failure (xHostPoolNamex) + description: While trying to connect to xHostPoolNamex a user had an error and failed to connect to a VM. There are lots of variables between the end users and AVD VMs. If this is frequent for the user, determine if their Internet connection is slow or latency is over 150 ms. Regarding xHostPoolNamex. 
+ type: Log + verified: false + visible: true + tags: + - avd + properties: + severity: 3 + operator: GreaterThanOrEqual + timeAggregation: Count + windowSize: PT5M + evaluationFrequency: PT5M + resourceIdColumn: _ResourceId + threshold: 1 + dimensions: + - name: HostPool + operator: Include + values: + - '*' + - name: ResourceGroup + operator: Include + values: + - '*' + - name: UserName + operator: Include + values: + - '*' + - name: ClientOS + operator: Include + values: + - '*' + - name: ClientVersion + operator: Include + values: + - '*' + - name: ClientSideIPAddress + operator: Include + values: + - '*' + - name: ConnectionType + operator: Include + values: + - '*' + - name: ErrorShort + operator: Include + values: + - '*' + - name: ErrorMessage + operator: Include + values: + - '*' + failingPeriods: + numberOfEvaluationPeriods: 1 + minFailingPeriodsToAlert: 1 + query: 'WVDConnections + + // | where UserName == "upn.here@contoso.com" + + | project-away TenantId,SourceSystem + + | summarize arg_max(TimeGenerated, *), StartTime = min(iff(State==''Started'', TimeGenerated , datetime(null) )), ConnectTime = min(iff(State==''Connected'', TimeGenerated , datetime(null) )) by CorrelationId + + | join kind=leftouter (WVDErrors + + |summarize Errors=make_list(pack(''Code'', Code, ''CodeSymbolic'', CodeSymbolic, ''Time'', TimeGenerated, ''Message'', Message ,''ServiceError'', ServiceError, ''Source'', Source)) by CorrelationId + + ) on CorrelationId + + | join kind=leftouter (WVDCheckpoints + + | summarize Checkpoints=make_list(pack(''Time'', TimeGenerated, ''Name'', Name, ''Parameters'', Parameters, ''Source'', Source)) by CorrelationId + + | mv-apply Checkpoints on ( + + order by todatetime(Checkpoints[''Time'']) asc + + | summarize Checkpoints=make_list(Checkpoints)) + + ) on CorrelationId + + | project-away CorrelationId1, CorrelationId2 + + | order by TimeGenerated desc + + | where TimeGenerated > ago(15m) + + | extend ResourceGroup=tostring(split(_ResourceId, 
''/'')[4]) + + | extend HostPool=tostring(split(_ResourceId, ''/'')[8]) + + | where HostPool =~ ''xHostPoolNamex'' + + | extend ErrorShort=tostring(Errors[0].CodeSymbolic) + + | extend ErrorMessage=tostring(Errors[0].Message) + + | project TimeGenerated, HostPool, ResourceGroup, UserName, ClientOS, ClientVersion, ClientSideIPAddress, ConnectionType, ErrorShort, ErrorMessage' + autoMitigate: true + autoResolve: true + autoResolveTime: '0:30:00' + references: + deployments: + - name: AVD-HostPool + template: Deploy-AVD-HostPool-Alert.json + type: Policy + tags: + - alz + properties: + scope: Subscription + multiResource: false diff --git a/services/Storage/storageAccounts/alerts.yaml b/services/Storage/storageAccounts/alerts.yaml index 905b0bdb2..b5751bc74 100644 --- a/services/Storage/storageAccounts/alerts.yaml +++ b/services/Storage/storageAccounts/alerts.yaml @@ -19,7 +19,7 @@ evaluationFrequency: PT5M timeAggregation: Average operator: LessThan - threshold: 90 + threshold: 100 # JCore - Changed from 90 to 100 per customer feedback criterionType: StaticThresholdCriterion autoMitigate: false references: @@ -36,6 +36,54 @@ properties: scope: Resource multiResource: false +# JCore - Added based on AVD Alerts included this storage alert +- name: Throttling + description: + The storage account will be throttled if throughput exceeds the account's tier limit. Increasing the file share or storage tier may be necessary. 
+ type: Metric + verified: false + visible: true + tags: + - alz + properties: + metricName: Transactions + dimensions: + - name: ResponseType + operator: Include + values: + - SuccessWithThrottling + SuccessWithShareIopsThrottling + ClientShareIopsThrottlingError + - name: FileShare + operator: Include + values: + - SuccessWithShareEgressThrottling + SuccessWithShareIngressThrottling + SuccessWithShareIopsThrottling + ClientShareEgressThrottlingError + ClientShareIngressThrottlingError + ClientShareIopsThrottlingError + metricNamespace: Microsoft.Storage/storageAccounts/fileServices + severity: 2 + windowSize: PT15M + evaluationFrequency: PT5M + timeAggregation: Total + operator: GreaterThanOrEqual + threshold: 1 + criterionType: StaticThresholdCriterion + autoMitigate: false + references: + - name: High latency, low throughput, or low IOPS + url: https://learn.microsoft.com/en-us/troubleshoot/azure/azure-storage/files-troubleshoot-performance?tabs=windows#high-latency-low-throughput-or-low-iops + deployments: + - name: Deploy SA Throttling Alert + template: Deploy-SA-Throttling-Alert.json + type: Policy + tags: + - alz + properties: + scope: Resource + multiResource: false #consider activity log alert for deletion of storage accounts to add to ALZ pattern #AUTO GENERATED ALERTS/THRESHOLDS - name: UsedCapacity diff --git a/services/_index.md b/services/_index.md index c58e2c9f0..c41a12867 100644 --- a/services/_index.md +++ b/services/_index.md @@ -1,5 +1,5 @@ --- -title: Services +title: Azure Resources weight: 5 geekdocCollapseSection: true --- diff --git a/tooling/export-alerts/export-alerts.py b/tooling/export-alerts/export-alerts.py index 1e72c6c0e..af6fa8f03 100644 --- a/tooling/export-alerts/export-alerts.py +++ b/tooling/export-alerts/export-alerts.py @@ -88,11 +88,15 @@ def addAlertToSheet(alert, ws, headerRow=1): elif key == 'references': references = alert['references'] urls = [] - for ref in references: - if 'url' in ref: - urls.append(ref['url']) - 
else: - print ('No URL in reference: ' + ref['name']) + + if references: + for ref in references: + if 'url' in ref: + urls.append(ref['url']) + else: + print ('No URL in reference: ' + ref['name']) + else: + print ('No references in alert: ' + alert['name']) value = '\n'.join(urls) elif type(alert[key]) is str or type(alert[key]) is int or type(alert[key]) is bool: