Merge pull request #143 from Brunoga-MS/main
AMBA - Allow for alert notification suppression during resources maintenance
arjenhuitema authored Mar 15, 2024
2 parents 198043d + 13a728d commit e910399
Showing 18 changed files with 440 additions and 132 deletions.
26 changes: 14 additions & 12 deletions docs/content/patterns/alz/Cleaning-up-a-Deployment.md
@@ -46,20 +46,22 @@ Follow the instructions below to download the cleanup script file. Alternatively
5. Sign in to Azure with the `Connect-AzAccount` command. The account you sign in as needs to have permissions to remove Policy Assignments, Policy Definitions, and resources at the desired Management Group scope.
6. Execute the script using one of the options below:

{{% include "PowerShell-ExecutionPolicy.md" %}}

**Generate a list of the resource IDs which would be deleted by this script:**

```powershell
./Start-AMBACleanup.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -ReportOnly
```

**Show output of what would happen if deletes executed:**

```powershell
./Start-AMBACleanup.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -WhatIf
```

**Delete all resources deployed by the ALZ-Monitor IaC without prompting for confirmation:**

```powershell
./Start-AMBACleanup.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -Force
```
19 changes: 14 additions & 5 deletions docs/content/patterns/alz/Disabling-Policies.md
@@ -4,19 +4,23 @@ geekdocCollapseSection: true
weight: 60
---

The policies in AMBA provide multiple methods to enable or disable the effects of the policy.

1. **Parameter: AlertState** - Determines the state of the alert rule. This either deploys an alert rule in a disabled state, or disables an already deployed alert rule at scale through policy.
2. **Parameter: PolicyEffect** - Determines the effect of a Policy Definition, allowing a Policy to be deployed in a disabled state.
3. **Tag: MonitorDisable** - A tag that determines whether the resource should be evaluated. Allows you to exclude selected resources from monitoring.

## AlertState parameter

Recognizing that it is not always possible to test alerts in a dev/test environment, we have introduced the AlertState parameter for all metric alerts (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and AlertState, for example VnetGwTunnelIngressAlertState). This is to address a scenario where an alert storm occurs and it is necessary to disable one or more alerts deployed via policies through a controlled process. This could be considered for a roll-back process as part of a change request.

### Allowed values

- "true" - Alert rule will be enabled. (Default)
- "false" - Alert rule will be disabled.

### How it works

The AlertState parameter is used for both compliance evaluation and configuration of the state of the alert rule. The value of the **AlertState** parameter is passed on to the **enabled** parameter which is part of the existenceCondition of the Policy.

```json
@@ -55,14 +59,17 @@ These are the high-level steps that would need to take place:
Note that the above approach will not delete the alerts objects in Azure, merely disable them. To delete the alerts you will have to do so manually. Also note that while you can engage the PolicyEffect to avoid deploying new alerts, you should not do so until you have successfully remediated the above. Otherwise the policy will be disabled, and you will not be able to turn alerts off via policy until that is changed back.
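
As a simplified illustration of the compliance side of this mechanism, the PowerShell sketch below compares an `AlertState` value with the `enabled` state of an already deployed alert rule. The function name `Test-AlertRuleCompliant` is hypothetical and not part of AMBA. The real existenceCondition is evaluated by Azure Policy and may check more fields than the enabled state.

```powershell
# Hypothetical helper, for illustration only: mimics the part of the
# existenceCondition that compares AlertState with the rule's enabled state.
function Test-AlertRuleCompliant {
    param(
        # Allowed values as documented above: "true" (default) or "false".
        [Parameter(Mandatory)]
        [ValidateSet('true', 'false')]
        [string]$AlertState,

        # Current "enabled" state of the deployed alert rule.
        [Parameter(Mandatory)]
        [bool]$RuleEnabled
    )

    # The string parameter is converted to a boolean before comparing.
    $desired = [System.Convert]::ToBoolean($AlertState)
    return ($RuleEnabled -eq $desired)
}

# An enabled rule with AlertState "false" is non-compliant,
# so a remediation would be expected to disable it.
Test-AlertRuleCompliant -AlertState 'false' -RuleEnabled $true
```

Under these assumptions, setting the parameter to "false" makes every enabled rule non-compliant, which is why remediation is needed before the alerts are actually turned off.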

## PolicyEffect parameter

In general, we evaluate the alert rules on best practices, field experience, customer feedback, type of alert and possible impact. There are situations where disabling the policy makes sense to prevent receiving unnecessary and/or duplicate alerts or notifications. For example, we deploy an alert rule for VPN Gateway Bandwidth Utilization and, in turn, have disabled the alert rules for VPN Gateway Egress and Ingress.
The default is intended to provide a well-balanced baseline. However, you may want to enable or disable the creation of certain alert rules to meet your needs.

### Allowed values

- "deployIfNotExists" - Policy will deploy the alert rule if the conditions are met. (Default for most Policies)
- "disabled" - The policy itself will be created but will not create the corresponding Alert rule.

### How it works

The PolicyEffect parameter is used for the configuration of the effect of the PolicyDefinition (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and PolicyEffect, for example ERCIRQoSDropBitsinPerSecPolicyEffect). The value of the **PolicyEffect** parameter is passed on to the **effect** parameter which configures the effect of the Policy.

```json
@@ -84,9 +91,11 @@ The PolicyEffect parameter is used for the configuration of the effect of the Po
```

## MonitorDisable parameter

It's also possible to exclude certain resources from being monitored. For example, you may not want to monitor pre-production or dev environments. The MonitorDisable parameter contains the tag name used to determine whether a resource should be included. By default, creating the tag MonitorDisable with value "true" will prevent deployment of alert rules on those resources. This is easily adjusted to use existing tags; for example, you could configure the parameter with the tag name "Environment" and tell it to deploy only if the tag value equals "prod", or when the tag isn't equal to "dev". Currently only the tag name is a parameter; other changes require minor changes in the code.

### How it works

The policyRule only continues if "allOf" is true. Meaning, the deployment will continue as long as the MonitorDisable tag doesn't exist or doesn't hold the value "true". When the tag holds "true", the "allOf" will return "false" as "notEquals": "true" is no longer satisfied, causing the deployment to stop.

```json
@@ -103,4 +112,4 @@ The policyRule only continues if "allOff" is true. Meaning, the deployment will
}
]
}
```
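
The tag check described above can be approximated outside of Azure Policy with a few lines of PowerShell. This is only a sketch of the decision logic, assuming the default tag name MonitorDisable and the default value "true". The function name `Test-MonitoringAllowed` is hypothetical and not part of AMBA.

```powershell
# Hypothetical helper, for illustration only: returns $true when the
# deployment of alert rules would continue for a resource with these tags.
function Test-MonitoringAllowed {
    param(
        # Tags of the resource as a hashtable, for example @{ Environment = 'prod' }.
        [hashtable]$Tags = @{},

        # Name of the tag to inspect. Only the tag name is a policy parameter.
        [string]$TagName = 'MonitorDisable'
    )

    # The deployment stops only when the tag exists AND holds the value "true".
    if ($Tags.ContainsKey($TagName) -and $Tags[$TagName] -eq 'true') {
        return $false
    }

    # A missing tag, or any other value, lets the deployment continue.
    return $true
}

Test-MonitoringAllowed -Tags @{ MonitorDisable = 'true' }   # deployment stops
Test-MonitoringAllowed -Tags @{ Environment = 'prod' }      # deployment continues
```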
20 changes: 10 additions & 10 deletions docs/content/patterns/alz/Known-Issues.md
@@ -16,11 +16,11 @@ The underlying data is not present in the Log Analytics table.

### Resolution

For VM Alerts please enable [VM Insights](Monitoring-and-Alerting#log-alerts).
For VM Alerts, enable [VM Insights](../Monitoring-and-Alerting#log-alerts).

## Failed to deploy because of role assignemnt issue
## Failed to deploy because of role assignment issue

Deployment of AMBA fails when there are orphaned role assignements.
Deployment of AMBA fails when there are orphaned role assignments.

### Error includes

@@ -31,7 +31,7 @@ Deployment of AMBA fails when there are orphaned role assignements.

### Cause

When a role or a role assignement is removed, some orphaned object can still appear, preventing a successful deployment.
When a role or a role assignment is removed, some orphaned object can still appear, preventing a successful deployment.

### Resolution

@@ -48,10 +48,10 @@ When a role or a role assignement is removed, some orphaned object can still app

### Cause

A deployment has been performed using one region, for example "uksouth", and when you try to deploy again to the same scope but to a different region you will receive an error. This happens even when a cleanup has been performed (see [Cleaning up a Deployment](../Cleaning-up-a-Deployment) for more details). This is because deployment entries still exists from the previous operation, so a region conflict is detected blocking you to run another deployment using a different region.
A deployment has been performed using one region, for example "uksouth", and when you try to deploy again to the same scope but to a different region you will receive an error. This happens even when a cleanup has been performed (see [Cleaning up a Deployment](../Cleaning-up-a-Deployment) for more details). This is because deployment entries still exist from the previous operation, so a region conflict is detected, preventing you from running another deployment using a different region.

### Resolution
Situation 1: You are trying to deploy to a different region in addition to a previous deployment. Deploying to the same scope in a different region is not necessary. The definitions and assignments are scoped to a management group and are not region specific. No action is required.
Situation 1: You are trying to deploy to a region different from the one used in previous deployment. Deploying to the same scope in a different region is not necessary. The definitions and assignments are scoped to a management group and are not region-specific. No action is required.

Situation 2: You cleaned up a previous implementation and want to deploy again to a different region. To resolve this issue, follow the steps below:

@@ -61,7 +61,7 @@ Situation 2: You cleaned up a previous implementation and want to deploy again t
4. Select all the deployment instances related to AMBA and click ***Delete***.

{{< hint type=Note >}}
To recognize the deployment names belonging to AMBA, select those whose names start with:
To recognize the deployment names belonging to AMBA, select those deployments whose names start with:

1. amba-
2. pid-
@@ -76,7 +76,7 @@ If you deployed AMBA just one time, you have 14 deployment instances

### Error includes

*Error: Code=MultipleErrorsOccurred; Message=Multiple error occurred: Conflict,Conflict,Conflict,Conflict,Conflict,Conflict.*
*Error: Code=MultipleErrorsOccurred; Message=Multiple errors occurred: Conflict,Conflict,Conflict,Conflict,Conflict,Conflict.*

### Cause

@@ -88,10 +88,10 @@ To resolve this issue, follow the steps below:
1. Navigate to ***Management Groups***
2. Select the management group (corresponding to the value entered for the *enterpriseScaleCompanyPrefix* during the deployment) where the AMBA deployment was targeted
3. Click ***Deployment***
4. Select all the deployments that could be deleted (example: instances of previous depoloyment related to AMBA) and click ***Delete***.
4. Select all the deployments that could be deleted (example: instances of previous deployment related to AMBA) and click ***Delete***.

{{< hint type=Note >}}
To recognize the deployment names belonging to AMBA, select those whose names start with:
To recognize the deployment names belonging to AMBA, select those deployments whose names start with:

1. amba-
2. pid-
10 changes: 6 additions & 4 deletions docs/content/patterns/alz/Moving-from-preview-to-GA.md
@@ -40,19 +40,21 @@ Follow the instructions below to download the cleanup script file. Alternatively
4. Sign in to Azure with the `Connect-AzAccount` command. The account you sign in as needs to have permissions to remove Policy Assignments, Policy Definitions, and resources at the desired Management Group scope.
5. Execute the script using the option below

{{% include "PowerShell-ExecutionPolicy.md" %}}

**Generate a list of the resource IDs which would be deleted by this script:**

```powershell
./Start-ALZMonitorCleanup.ps1 -ReportOnly
```

**Show output of what would happen if deletes executed:**

```powershell
./Start-ALZMonitorCleanup.ps1 -WhatIf
```

**Delete all resources deployed by the ALZ-Monitor IaC without prompting for confirmation:**

```powershell
./Start-ALZMonitorCleanup.ps1 -Force
@@ -64,4 +66,4 @@ Follow the instructions below to download the cleanup script file. Alternatively
- To deploy with GitHub Actions, please proceed with [Deploy with GitHub Actions](../deploy/Deploy-with-GitHub-Actions)
- To deploy with Azure DevOps Pipelines, please proceed with [Deploy with Azure Pipelines](../deploy/Deploy-with-Azure-Pipelines)
- To deploy with Azure CLI, please proceed with [Deploy with Azure CLI](../deploy/Deploy-with-Azure-CLI)
- To deploy with Azure PowerShell, please proceed with [Deploy with Azure PowerShell](../deploy/Deploy-with-Azure-PowerShell)
11 changes: 10 additions & 1 deletion docs/content/patterns/alz/Policy-Initiatives.md
@@ -122,4 +122,13 @@ This initiative is intended for assignment of policies relevant to service healt
| Deploy_activitylog_ServiceHealth_HealthAdvisory | [deploy-activitylog-ServiceHealth-Health.json](../../../services/Resources/subscriptions/Deploy-ActivityLog-ServiceHealth-Health.json) | deployIfNotExists |
| Deploy_activitylog_ServiceHealth_Incident | [deploy-activitylog-ServiceHealth-Incident.json](../../../services/Resources/subscriptions/Deploy-ActivityLog-ServiceHealth-Incident.json) | deployIfNotExists |
| Deploy_activitylog_ServiceHealth_Maintenance | [deploy-activitylog-ServiceHealth-Maintenance.json](../../../services/Resources/subscriptions/Deploy-ActivityLog-ServiceHealth-Maintenance.json) | deployIfNotExists |
| Deploy_AlertProcessing_Rule | [deploy-alertprocessingrule-deploy.json](../../../services/AlertsManagement/actionRules/Deploy-AlertProcessingRule-Deploy.json) | deployIfNotExists |
| Deploy_ServiceHealth_ActionGroups | [deploy-ServiceHealth-ActionGroups.json](../../../services/Resources/subscriptions/Deploy-ServiceHealth-ActionGroups.json) | deployIfNotExists |

## Notification Assets initiative

This initiative is intended for assignment of policies relevant to notification in ALZ. With the guidance provided in [Introduction to deploying the ALZ Pattern](../deploy/Introduction-to-deploying-the-ALZ-Pattern), this will assign to the alz intermediate root management group structure in the ALZ reference architecture. For details on which policies are included in the initiative, as well as the default enablement state of each policy, refer to the table below.

| **Policy Display Name** | **Reference ID** | **Path to policy json file** | **Policy default effect** |
|----------|----------|----------|----------|
| Deploy AMBA Notification Assets | ALZ_AlertProcessing_Rule | [deploy-AlertProcessingRule-deploy.json](../../../services/AlertsManagement/actionRules/Deploy-AlertProcessingRule-Deploy.json) | deployIfNotExists |
| Deploy AMBA Notification Suppression Asset | ALZ_Suppression_AlertProcessing_Rule | [deploy-AlertProcessingRule-Suppression.json](../../../services/AlertsManagement/actionRules/Deploy-AlertProcessingRule-Suppression.json) | deployIfNotExists |
52 changes: 52 additions & 0 deletions docs/content/patterns/alz/Temporarily-disabling-notifications.md
@@ -0,0 +1,52 @@
---
title: Temporarily disabling notifications
geekdocCollapseSection: true
weight: 65
---

Azure Monitor alerts targeted to a large scope allow for at-scale coverage, but reduce the flexibility to disable them for specific resources. There might be several reasons to stop alert notifications. For instance, customers could have resources that are stopped or disabled due to maintenance, or might just want to stop notifications during the night shift. To allow this kind of flexibility, as part of the Notification Assets policy initiative, AMBA provides you with an asset to stop notifications for specific resources.

This asset is made of an alert processing rule (also known as APR) with the following characteristics:

- deployed as disabled
- scoped at the subscription level
- suppression rule type
- scheduled to run always

This APR needs to be configured with the resource ID of the resource(s) for which you want to stop notifications and then enabled every time you need it.

Once the resource is out of the maintenance period or when you don't need the suppression rule anymore, ***remember*** to remove the resources and disable the rule.

To learn more about how to suppress notifications, see [Suppress notifications during planned maintenance](https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-processing-rules?tabs=portal#suppress-notifications-during-planned-maintenance).

To configure the APR, do the following:

1. In **Monitor --> Alerts**, click on **Alert processing rules**

![Monitor/Alerts/Alert processing rule](../media/AlertProcessingRules.png)

2. Click on the APR named ***apr-AMBA-<mark>subscription display name</mark>-002*** with rule type **Suppression**

![Suppression alert processing rule](../media/SuppressionAlertProcessingRule.png)

3. Click on ***Edit***

![Edit alert processing rule](../media/Edit-AlertProcessingRule.png)

4. In the **Scope** tab, under the filter section, configure the following:

- Filters: ***Resource***
- Operator: ***Equals***
- Value: **Enter the <mark>resource IDs</mark> of the resources, separated by commas, <mark>with no spaces before, after or between the strings.</mark>**

![Configure filter](../media/Filter-AlertProcessingRule.png)

{{< hint type=Important >}}
Each filter can include up to **five** values. If you need more than **five** resources, add more filter lines.
{{< /hint >}}

5. Click on ***Review + save*** and then ***Save***

{{< hint type=Note >}}
It is possible to apply other types of filters. For a complete list of allowed scopes and filters, refer to the official [Scope and filters for alert processing rules](https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-processing-rules?tabs=portal#scope-and-filters-for-alert-processing-rules) documentation.
{{< /hint >}}
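
Preparing the **Value** strings by hand is error-prone when many resources are involved. The PowerShell sketch below builds them from a list of resource IDs, using commas with no spaces and at most five IDs per filter line, as described above. The function name `Get-AprFilterValues` and the sample resource IDs are hypothetical. The output is meant to be pasted into the portal and is not an AMBA feature.

```powershell
# Hypothetical helper, for illustration only: groups resource IDs into
# comma-separated strings with no spaces and at most five IDs per string.
function Get-AprFilterValues {
    param(
        [Parameter(Mandatory)]
        [string[]]$ResourceId,

        # Documented limit: each filter can include up to five values.
        [int]$MaxPerFilter = 5
    )

    # Remove surrounding whitespace so no spaces end up in the joined string.
    $clean = @($ResourceId | ForEach-Object { $_.Trim() } | Where-Object { $_ })

    for ($i = 0; $i -lt $clean.Count; $i += $MaxPerFilter) {
        $last = [Math]::Min($i + $MaxPerFilter, $clean.Count) - 1
        # One output string per filter line.
        $clean[$i..$last] -join ','
    }
}

# Placeholder IDs, not real resources: seven IDs produce two filter lines.
$ids = 1..7 | ForEach-Object { "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm$_" }
Get-AprFilterValues -ResourceId $ids
```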
@@ -23,6 +23,8 @@ Updating from release 2023-11-14 will require running a post update script to re

2. Execute the script using one of the options below:

{{% include "PowerShell-ExecutionPolicy.md" %}}

**Generate a list of the resource IDs which would be deleted by this script:**

```powershell
19 changes: 19 additions & 0 deletions docs/content/patterns/alz/deploy/PowerShell-ExecutionPolicy.md
@@ -0,0 +1,19 @@
---
---

{{< hint type=Important >}}
Since PowerShell scripts released as part of the ALZ pattern are not digitally signed, they might require you to _**temporarily**_ change the execution policy if not already set to _**Unrestricted**_. Before running the script, check the execution policy settings using this command:

```PowerShell
Get-ExecutionPolicy
```

If the result is anything other than _**Unrestricted**_, run the following command to change it to **Unrestricted**:

```PowerShell
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```

At this point, you should be able to run your scripts with no issues. After you have finished, you can set the execution policy back to its previous value if you wish.

{{< /hint >}}
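
As one possible way to keep the change temporary, the sketch below checks the current policy and, only if needed, relaxes it for the current PowerShell session. Using `-Scope Process` is an assumption on top of the guidance above, which does not specify a scope. A process-scoped setting is discarded when the session is closed, so nothing has to be restored afterwards. In environments where the execution policy is enforced through Group Policy, the command may have no effect or return an error.

```powershell
# Record the effective execution policy before changing anything.
$previousPolicy = Get-ExecutionPolicy
Write-Output "Current execution policy: $previousPolicy"

if ($previousPolicy -ne 'Unrestricted') {
    # Assumption: a session-only change is sufficient for running the scripts.
    # -Scope Process affects only the current PowerShell session.
    Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope Process -Force
}

# Run the AMBA script in this same session, then close the session
# to return to the previous policy.
```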