Skip to content

Commit

Permalink
Updating documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
Brunoga-MS committed Oct 2, 2024
1 parent 4a68771 commit 0775a86
Show file tree
Hide file tree
Showing 8 changed files with 148 additions and 96 deletions.
43 changes: 27 additions & 16 deletions docs/content/patterns/alz/Cleaning-up-a-Deployment.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,64 +4,75 @@ geekdocCollapseSection: true
weight: 70
---

In some scenarios, it may be necessary to remove everything deployed by the AMBA solution. The instructions below detail execution of a PowerShell script to delete all resources deployed, including:
In some scenarios, it might be necessary to remove everything deployed by the AMBA-ALZ solution. The instructions below detail execution of a PowerShell script to delete all resources deployed, including:

- Metric Alerts
- Activity Log Alerts
- Resource Groups (created for to contain alert resources)
- Policy Assignments
- Policy Definitions
- Policy Set Definitions
- Policy Assignment remediation identity role assignments
- Action Groups
- Alert Processing Rules

All resources deployed as part of the initial AMBA deployment and the resources created dynamically by 'deploy if not exist' policies are either tagged, marked in metadata, or in description (depending on what the resource supports) with the value `_deployed_by_amba` or `_deployed_by_amba=True`. This metadata is used to execute the cleanup of deployed resources; _if it has been removed or modified the cleanup script will not include those resources_.
All resources deployed as part of the initial AMBA-ALZ deployment and the resources created dynamically by 'deploy if not exist' policies are either tagged, marked in metadata, or in description (depending on what the resource supports) with the value `_deployed_by_amba` or `_deployed_by_amba=True`. This metadata is used to execute the cleanup of deployed resources; _if it has been removed or modified the cleanup script won't include those resources_.

## Cleanup Script Execution

{{< hint type=Important >}}
It is highly recommended to **thoroughly** test the script before running on production environments. The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.
It's highly recommended to **thoroughly** test the script before running on production environments. The sample scripts aren't supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.
{{< /hint >}}

### Download the script file

Follow the instructions below to download the cleanup script file. Alternatively, clone the repo from GitHub and ensure you are working from the latest version of the file by fetching the latest `main` branch.
Execute the following instructions to download the cleanup script file. Alternatively, clone the repo from GitHub and ensure you're working from the latest version of the file by fetching the latest `main` branch.

1. Navigate AMBA [project in GitHub](https://github.com/Azure/azure-monitor-baseline-alerts)
1. Navigate the AMBA [project in GitHub](https://github.com/Azure/azure-monitor-baseline-alerts)
2. In the folder structure, browse to the `patterns/alz/scripts` directory
3. Open the **Start-AMBACleanup.ps1** script file
3. Open the **Start-AMBA-ALZ-Maintenance.ps1** script file
4. Click the **Raw** button
5. Save the open file as **Start-AMBACleanup.ps1**
5. Save the open file as **Start-AMBA-ALZ-Maintenance.ps1**

### Executing the Script

1. Open PowerShell
2. Install the **Az.ResourceGraph** module: `Install-Module Az.ResourceGraph`
3. Change directories to the location of the **Start-AMBACleanup.ps1** script
4. Configure the _**$pseudoRootManagementGroup**_ variable using the command below:
2. Make sure the following modules are installed:
1. **Az.Accounts**: if not installed, use the `Install-Module Az.Accounts` to install it
2. **Az.Resources**: if not installed, use the `Install-Module Az.Resources` to install it
3. **Az.ResourceGraph**: if not installed, use the `Install-Module Az.ResourceGraph` to install it
4. **Az.ManagedServiceIdentity**: if not installed, use the `Install-Module Az.ManagedServiceIdentity` to install it
3. Change directory to the location of the **Start-ALZ-Maintenance.ps1** script
4. Configure the _**$pseudoRootManagementGroup**_ variable using the following command:

```powershell
$pseudoRootManagementGroup = "The pseudo root management group id parenting the Platform and Landing Zones management groups"
```
5. Sign in to the Azure with the `Connect-AzAccount` command. The account you sign in as needs to have permissions to remove Policy Assignments, Policy Definitions, and resources at the desired Management Group scope.
6. Execute the script using one of the options below:
5. Sign in to Azure with the `Connect-AzAccount` command. The account you sign in with needs to have permissions to remove all the aforementioned resources (Policy Assignments, Policy Definitions, and other resources) at the desired Management Group scope.
6. Execute the script using one of the following options:
{{% include "PowerShell-ExecutionPolicy.md" %}}
**Get full help on script usage help:**
```powershell
Get-help ./Start-AMBA-ALZ-Maintenance.ps1
```
**Show output of what would happen if deletes executed:**
```powershell
./Start-AMBACleanup.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -WhatIf
./Start-AMBA-ALZ-Maintenance.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -cleanItems Amba-Alz -WhatIf
```
**Execute the script asking for confirmation before deleting the resources deployed by AMBA-ALZ:**
```powershell
./Start-AMBACleanup.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup
./Start-AMBA-ALZ-Maintenance.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -cleanItems Amba-Alz
```
**Execute the script <ins>without</ins> asking for confirmation before deleting the resources deployed by AMBA-ALZ.**
```powershell
./Start-AMBACleanup.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -Confirm:$false
./Start-AMBA-ALZ-Maintenance.ps1 -pseudoRootManagementGroup $pseudoRootManagementGroup -cleanItems Amba-Alz -Confirm:$false
```
26 changes: 13 additions & 13 deletions docs/content/patterns/alz/Disabling-Policies.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,15 @@ geekdocCollapseSection: true
weight: 60
---

The policies in AMBA provide multiple methods to enable or disable the effects of the policy.
The policies included in AMBA-ALZ provide multiple methods to enable or disable the effects of the policy.

1. **Parameter: AlertState** - Determines the state of the alert rule. This either deploys an alert rule in a disabled state, or disables an already deployed alert rule at scale trough policy.
2. **Parameter: PolicyEffect** - Determines the effect of a Policy Definition, allowing a Policy to be deployed in a disabled state.
3. **Tag: MonitorDisable** - A tag that determines whether the resource should be evaluated. Allows you to exclude selected resources from monitoring.

## AlertState parameter

Recognizing that it is not always possible to test alerts in a dev/test environment, we have introduced the AlertState parameter for all metric alerts (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and AlertState, for example VnetGwTunnelIngressAlertState). This is to address a scenario where an alert storm occurs and it is necessary to disable one or more alerts deployed via policies through a controlled process. This could be considered for a roll-back process as part of a change request.
Recognizing that it isn't always possible to test alerts in a dev/test environment, we have introduced the AlertState parameter for all metric alerts (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and AlertState, for example VnetGwTunnelIngressAlertState). This is to address a scenario where an alert storm occurs and it's necessary to disable one or more alerts deployed via policies through a controlled process. This could be considered for a roll-back process as part of a change request.

### Allowed values

Expand Down Expand Up @@ -53,24 +53,24 @@ If "allOf" evaluates to true, the effect is satisfied and doesn't trigger the de
These are the high-level steps that would need to take place:

1. Change the value for the AlertState parameter for the offending policies to false, either via command line or parameter file as described previously.
1. Deploy the policies and assignments as described previously.
1. After deploying and policy evaluation there will be a number of non-compliant policies depending on which alerts were to be disabled. These will then need to be remediated which can be done either through the portal, on a policy-by-policy basis or you can run the script found in [patterns/alz/scripts/Start-AMBARemediation](https://github.com/Azure/azure-monitor-baseline-alerts/blob/main/patterns/alz/scripts/Start-AMBARemediation.ps1) to remediate all ALZ-Monitor policies in scope as defined by management group pre-fix.
2. Deploy the policies and assignments as described previously.
3. Once the deployment is completed successfully and the policy evaluation is finished, there will be many non-compliant policies and resources depending on which alerts were to be disabled. These non-compliant resources need to be remediated which can be done either through the portal, on a policy-by-policy basis or you can run the script found in [patterns/alz/scripts/Start-AMBARemediation](https://github.com/Azure/azure-monitor-baseline-alerts/blob/main/patterns/alz/scripts/Start-AMBARemediation.ps1) to remediate all ALZ-Monitor policies in scope as defined by management group pre-fix.

Note that the above approach will not delete the alerts objects in Azure, merely disable them. To delete the alerts you will have to do so manually. Also note that while you can engage the PolicyEffect to avoid deploying new alerts, you should not do so until you have successfully remediated the above. Otherwise the policy will be disabled, and you will not be able to turn alerts off via policy until that is changed back.
Note that the preceding approach won't delete the alerts objects in Azure, merely disable them. To delete the alerts, you will have to do so manually. Also note that while you can engage the PolicyEffect to avoid deploying new alerts, you shouldn't do so until you have successfully remediated what was mentioned earlier. Otherwise the policy will be disabled, and you won't be able to turn off alerts via policy until that is changed back.

## PolicyEffect parameter

In general, we evaluate the alert rules on best practices, field experience, customer feedback, type of alert and possible impact. There are situations where disabling the policy makes sense to prevent receiving unnecessary and/ or duplicate alerts/ notifications. For example we deploy an alert rule for VPN Gateway Bandwidth Utilization, in turn we have disabled the alert rules for VPN Gateway Egress and Ingress.
The default is intended to provide a well balanced baseline. However you may want to Enable or Disable the creation of certain Alert rules to meet your needs.
The default is intended to provide a well-balanced baseline. However you may want to Enable or Disable the creation of certain Alert rules to meet your needs.

### Allowed values

- "deployIfNotExists" - Policy will deploy the alert rule if the conditions are met. (Default for most Policies)
- "disabled" - The policy itself will be created but will not create the corresponding Alert rule.
- "disabled" - The policy itself will be created but won't create the corresponding Alert rule.

### How it works

The PolicyEffect parameter is used for the configuration of the effect of the PolicyDefinition (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and PolicyEffect, for example ERCIRQoSDropBitsinPerSecPolicyEffect) . The value of the **PolicyEffect** parameter is passed on to the **effect** parameter which configures the effect of the Policy.
The PolicyEffect parameter is used for the configuration of the effect of the PolicyDefinition (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and PolicyEffect, for example ERCIRQoSDropBitsinPerSecPolicyEffect). The value of the **PolicyEffect** parameter is passed on to the **effect** parameter which configures the effect of the Policy.

```json
"policyRule": {
Expand All @@ -96,11 +96,11 @@ It´s also possible to exclude certain resources from being monitored. You may n

![MonitorDisable* parameters](../media/MonitorDisableParams.png)

This will deploy policy definitions which will only be evaluated and remediated if the tag value(s) are not included in the list you provided.
This will deploy policy definitions which will only be evaluated and remediated if the tag value(s) aren't included in the list you provided.

### How it works

The policyRule only continues if "allOff" is true. Meaning, the deployment will continue as long as the MonitorDisableTagName tag doesn't exist or doesn't hold the any of the values listed in the MonitorDisableTagValues parameter. When the tag holds one of the configured values, the "allOff" will return "false" as *"notIn": "[[parameters('MonitorDisableTagValues')]"* is no longer satisfied, causing the evaluation and hence the remediation to stop.
The policyRule only continues if "allOff" is true. Meaning, the deployment will continue as long as the MonitorDisableTagName tag doesn't exist or doesn't hold any of the values listed in the MonitorDisableTagValues parameter. When the tag holds one of the configured values, the "allOff" will return "false" as *"notIn": "[[parameters('MonitorDisableTagValues')]"* is no longer satisfied, causing the evaluation and hence the remediation to stop.

```json
"policyRule": {
Expand All @@ -118,7 +118,7 @@ The policyRule only continues if "allOff" is true. Meaning, the deployment will
},
```

Given the different resource scope that this method can be applied to, we made it working a little bit different when it comes to log-based alerts. For instance, the virtual machine alerts are scoped to subscription and tagging the subcription would result in disabling all the policies targeted at it.
For this reason, and thanks to the new **Bring Your Own User Assigned Managed Identity (BYO UAMI)*** included in the [2024-06-05](../../Whats-New#2024-06-05) release and to the ability to query Azure resource Graph using Azure Monitor (see [Quickstart: Create alerts with Azure Resource Graph and Log Analytics](https://learn.microsoft.com/en-us/azure/governance/resource-graph/alerts-query-quickstart?tabs=azure-resource-graph)), it is now possible to disable individual alerts for both Azure and hybrid virtual machines after they are created. We got requests to stop alerting fro virtual machines that were off for maintenance and this enhancement came up just in time.
Given the different resource scope that this method can be applied to, we made it working slightly different when it comes to log-based alerts. For instance, the virtual machine alerts are scoped to subscription and tagging the subscription would result in disabling all the policies targeted at it.
For this reason, and thanks to the new **Bring Your Own User Assigned Managed Identity (BYO UAMI)*** included in the [2024-06-05](../../Whats-New#2024-06-05) release and to the ability to query Azure resource Graph using Azure Monitor (see [Quickstart: Create alerts with Azure Resource Graph and Log Analytics](https://learn.microsoft.com/en-us/azure/governance/resource-graph/alerts-query-quickstart?tabs=azure-resource-graph)), it's now possible to disable individual alerts for both Azure and hybrid virtual machines after they're created. We got requests to stop alerting fro virtual machines that were off for maintenance and this enhancement came up just in time.

Should you need to disable the alerts for your virtual machines after they are created, just make sure you tag the relevant resources accordingly. The alert queries have been modified to look at resource properties in [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview). If the resource contains the given tag name and tag value, it is made part of an exclusion list, so alerts will not be generated for them. This behavior allows you to dinamically and rapidly exclude the necessary resources from being alerted without the need of deleteing the alert, tag the resource and run the remediation again.
Should you need to disable the alerts for your virtual machines after they're created, just make sure you tag the relevant resources accordingly. The alert queries have been modified to look at resource properties in [Azure Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview). If the resource contains the given tag name and tag value, it's made part of an exclusion list, so alerts won't be generated for them. This behavior allows you to dynamically and rapidly exclude the necessary resources from being alerted without the need of deleting the alert, tag the resource and run the remediation again.
Loading

0 comments on commit 0775a86

Please sign in to comment.