Merge pull request #10 from Brunoga-MS/main
Updating noEmail with latest from Main
Brunoga-MS authored Mar 8, 2024
2 parents 1475134 + bd2523a commit cce0134
Showing 139 changed files with 5,463 additions and 379 deletions.
2 changes: 2 additions & 0 deletions config/_default/hugo.toml
@@ -66,6 +66,8 @@ enableRobotsTXT = true

ambaDevMode = false

ambaTelemetryPid = "pid-8bb7cf8a-bcf7-4264-abcb-703ace2fc84d"

# (Optional, default 6) Set how many table of contents levels to be showed on page.
# Use false to hide ToC, note that 0 will default to 6 (https://gohugo.io/functions/default/)
# You can also specify this parameter per page in front matter.
2 changes: 2 additions & 0 deletions config/test/hugo.toml
@@ -66,6 +66,8 @@ enableRobotsTXT = true

ambaDevMode = true

ambaTelemetryPid = "pid-8bb7cf8a-bcf7-4264-abcb-703ace2fc84d"

# (Optional, default 6) Set how many table of contents levels to be showed on page.
# Use false to hide ToC, note that 0 will default to 6 (https://gohugo.io/functions/default/)
# You can also specify this parameter per page in front matter.
14 changes: 7 additions & 7 deletions docs/content/contributing/_index.md
@@ -34,17 +34,17 @@ The example folder structure below highlights all of the key assets that define
└── Deploy-VM-DataDiskReadLatency-Alert.json
```

**patterns:** *This folder contains assets for pattern/scenario specific guidance that leverages the baseline alerts in this repo. This contribute does not cover contributions to the patterns/services section. There will be specific guides within each pattern/service section.*
**patterns:** *This folder contains assets for pattern/scenario specific guidance that leverages the baseline alerts in this repo. This guide does not cover contributions to the patterns/scenarios section. There will be specific guides within each pattern/scenarios section.*

**services:** *This folder contains the baseline alert definitions, guidance, and example deployment scripts. It is grouped by resource category (e.g. Compute), and then by resource type (e.g. virtualMachines).*
**services:** *This folder contains the baseline alert definitions, guidance, and example deployment scripts. It is grouped by resource provider (e.g. Compute), and then by resource type (e.g. virtualMachines).*

{{< hint type=note >}}
You may need to add new resource category and/or resource type folders as you define new baseline alerts. These folders are case-sensitive and follow the naming conventions defined by the [Azure Resource Reference](https://learn.microsoft.com/azure/templates/) documentation. For example: Alert guidance for Microsoft.Compute/virtualMachines would go under 'services/Compute/virtualMachines'
You may need to add new resource provider and/or resource type folders as you define new baseline alerts. These folders are case-sensitive and follow the naming conventions defined by the [Azure Resource Reference](https://learn.microsoft.com/azure/templates/) documentation. For example: Alert guidance for Microsoft.Compute/virtualMachines would go under 'services/Compute/virtualMachines'
{{< /hint >}}
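
As a rough illustration of the naming convention in the note above, the following Python sketch derives the folder path from a resource type string. The helper name `service_folder` is hypothetical and not part of the repo. It assumes the namespace has the form `<vendor>.<provider>` (for example `Microsoft.Compute`) and it preserves casing, because the folders are case-sensitive.

```python
def service_folder(resource_type: str) -> str:
    """Return the assumed docs folder for an Azure resource type.

    Hypothetical helper: 'Microsoft.Compute/virtualMachines' maps to
    'services/Compute/virtualMachines'.
    """
    # Split 'Microsoft.Compute/virtualMachines' into namespace and type.
    namespace, slash, type_name = resource_type.partition("/")
    if not slash or not type_name:
        raise ValueError(f"expected '<namespace>/<type>', got {resource_type!r}")

    # Drop the vendor prefix ('Microsoft') and keep the provider ('Compute').
    _vendor, dot, provider = namespace.partition(".")
    if not dot or not provider:
        raise ValueError(f"expected '<vendor>.<provider>', got {namespace!r}")

    # Casing is kept as-is because the folder names are case-sensitive.
    return f"services/{provider}/{type_name}"


if __name__ == "__main__":
    print(service_folder("Microsoft.Compute/virtualMachines"))
```

How nested resource types should map to folders is not covered by the guide, so the sketch simply keeps everything after the first slash.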

**_index.md:** *These files control the menu structure and the content layout for GitHub Pages site. There are only two versions of these files, one for the resource categories, which just controls the friendly name in the menu and title. The other version is at the resource type level and it controls the layout of the GitHub Pages site. As you create new folders, just copy the respective versions and change the title in the metadata section at the top of the file.*
**_index.md:** *These files control the menu structure and the content layout for the GitHub Pages site. There are only two versions of these files: one for the resource providers, which just controls the friendly name in the menu and title, and one at the resource type level, which controls the layout of the GitHub Pages site. As you create new folders, just copy the respective versions and change the title in the metadata section at the top of the file.*

**alerts.yaml:** *This YAML-based file contains the detailed definition and guidance for the baseline alerts within each resource category/type folder. Below is the general structure of the file.*
**alerts.yaml:** *This YAML-based file contains the detailed definition and guidance for the baseline alerts within each resource provider/type folder. Below is the general structure of the file.*

```yaml
- name: <alert name>
@@ -96,7 +96,7 @@ Please note the following settings in the alert definition:
## Auto-Generated Alert Rules
A script was run to automatically generate alert rules based on top usage and settings trends. These rules have been added to their respective *alerts.yaml* files and have two tags associated with them: *auto-generated* and *agc-xxxx*. The *agc-xxxx* tag indicates the number of results found for that alert rule in the query used to analyze the top trends. This number should be used to evaluate the importance of including that alert rule as guidance in the repo. Once an auto-generated alert rule has been verified and updated with reference documentation, the *visible* property should be set to *true*. This will make the alert rule visible on the site. Resource categories and types that do not have visible alerts are currently hidden from the table of contents. To make those resource categories and types visible, edit their respective *_index.md* files and remove the *geekdocHidden: true* metadata from the top of the file.
A script was run to automatically generate alert rules based on top usage and settings trends. These rules have been added to their respective *alerts.yaml* files and have two tags associated with them: *auto-generated* and *agc-xxxx*. The *agc-xxxx* tag indicates the number of results found for that alert rule in the query used to analyze the top trends. This number should be used to evaluate the importance of including that alert rule as guidance in the repo. Once an auto-generated alert rule has been verified and updated with reference documentation, the *visible* property should be set to *true*. This will make the alert rule visible on the site. Resource providers and types that do not have visible alerts are currently hidden from the table of contents. To make those resource providers and types visible, edit their respective *_index.md* files and remove the *geekdocHidden: true* metadata from the top of the file.
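
When reviewing auto-generated rules, it can help to pull the result count out of the *agc-xxxx* tag. The Python sketch below shows one way to do that. The function name `agc_count` is hypothetical, and it assumes the tags are available as a list of strings and that *xxxx* is a plain decimal number.

```python
import re
from typing import Iterable, Optional

# Assumed tag shape: 'agc-' followed by one or more digits.
AGC_TAG = re.compile(r"^agc-(\d+)$")


def agc_count(tags: Iterable[str]) -> Optional[int]:
    """Return the result count from the first 'agc-<number>' tag, if any."""
    for tag in tags:
        match = AGC_TAG.match(tag)
        if match:
            return int(match.group(1))
    # No agc tag present, for example on a manually authored rule.
    return None


if __name__ == "__main__":
    print(agc_count(["auto-generated", "agc-1234"]))
```

A higher count suggests the rule is more widely used and may deserve earlier verification.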
## Context/Background
@@ -196,4 +196,4 @@ Once you have committed changes to your fork of the AMBA repo, you create a pull

1. Sometimes the local version of the website may show some inconsistencies that don't reflect the content you have created.

- If this happens, kill the Hugo local web server by pressing <kbd>CTRL</kbd>+<kbd>C</kbd> and then restart the Hugo web server by running `hugo server -D` from the root of the repo.
- If this happens, kill the Hugo local web server by pressing <kbd>CTRL</kbd>+<kbd>C</kbd> and then restart the Hugo web server by running `hugo server -D` from the root of the repo.
4 changes: 2 additions & 2 deletions docs/content/patterns/alz/Alerts-Details.md
@@ -8,7 +8,7 @@ Specific alerts for ALZ can be downloaded by clicking on the Download icon (high

![Alert-Details Download icon](../media/AlertDetailsDownloadReference.png)

The best way to see which policy alert rules are part of the ALZ pattern it is best to go to the [Policy-Initiatives](docs/content/patterns/alz/Policy-Initiatives.md) page.
The best way to see which policy alert rules are part of the ALZ pattern is to go to the [Policy-Initiatives](../Policy-Initiatives) page.

The resources, metric alerts and their settings provide you with a starting point to help you address the following monitoring questions:
"What should we monitor in Azure?" and "What alert settings should we use?" While these settings are opinionated and meant to cover the most common Azure Landing Zone components, we encourage you to adjust them to suit your monitoring needs based on how you're using Azure.
@@ -29,7 +29,7 @@ We have tried to make it so that the table doesn't require a lot of side to side

{{< alzMetricAlerts >}}

<sup>1</sup> See "Why are the availability alert thresholds lower than 100% in this solution when the product group documention recommends 100%?" in the [FAQ](FAQ.md) for more details.
<sup>1</sup> See "Why are the availability alert thresholds lower than 100% in this solution when the product group documentation recommends 100%?" in the [FAQ](../FAQ) for more details.

## Azure Landing Zone Activity Log Alerts

19 changes: 14 additions & 5 deletions docs/content/patterns/alz/Disabling-Policies.md
@@ -4,19 +4,23 @@ geekdocCollapseSection: true
weight: 60
---

The policies in AMBA provide multiple methods to enable or disable the effects of the policy.
The policies in AMBA provide multiple methods to enable or disable the effects of the policy.

1. **Parameter: AlertState** - Determines the state of the alert rule. This either deploys an alert rule in a disabled state, or disables an already deployed alert rule at scale through policy.
1. **Parameter: PolicyEffect** - Determines the effect of a Policy Definition, allowing a Policy to be deployed in a disabled state.
1. **Tag: MonitorDisable** - A tag that determines whether the resource should be evaluated. Allows you to exclude selected resources from monitoring.
2. **Parameter: PolicyEffect** - Determines the effect of a Policy Definition, allowing a Policy to be deployed in a disabled state.
3. **Tag: MonitorDisable** - A tag that determines whether the resource should be evaluated. Allows you to exclude selected resources from monitoring.

## AlertState parameter
Recognizing that it is not always possible to test alerts in a dev/test environment, we have introduced the AlertState parameter for all metric alerts (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and AlertState, for example VnetGwTunnelIngressAlertState). This is to address a scenario where an alert storm occurs and it is necessary to disable one or more alerts deployed via policies through a controlled process. This could be considered for a roll-back process as part of a change request.

Recognizing that it is not always possible to test alerts in a dev/test environment, we have introduced the AlertState parameter for all metric alerts (in the initiatives and the example parameter file the parameter is named combining {resourceType}, {metricName} and AlertState, for example VnetGwTunnelIngressAlertState). This is to address a scenario where an alert storm occurs and it is necessary to disable one or more alerts deployed via policies through a controlled process. This could be considered for a roll-back process as part of a change request.

### Allowed values

- "true" - Alert rule will be enabled. (Default)
- "false" - Alert rule will be disabled.
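
The naming rule and the allowed values can be sketched in Python as follows. This is only an illustration: the function `alert_state_parameter` is hypothetical, the split of `VnetGwTunnelIngressAlertState` into `VnetGw` and `TunnelIngress` is a guess, and the `{"value": ...}` wrapper assumes the usual ARM parameter-file shape rather than the exact layout of the AMBA parameter file.

```python
from typing import Dict


def alert_state_parameter(
    resource_type: str, metric_name: str, enabled: bool
) -> Dict[str, Dict[str, str]]:
    """Build a parameter entry named {resourceType}{metricName}AlertState."""
    name = f"{resource_type}{metric_name}AlertState"
    # The allowed values are the strings "true" and "false", not booleans.
    value = "true" if enabled else "false"
    return {name: {"value": value}}


if __name__ == "__main__":
    # Disable the VPN gateway tunnel ingress alert, for example during an alert storm.
    print(alert_state_parameter("VnetGw", "TunnelIngress", enabled=False))
```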

### How it works

The AlertState parameter is used for both compliance evaluation and configuration of the state of the alert rule. The value of the **AlertState** parameter is passed on to the **enabled** parameter which is part of the existenceCondition of the Policy.

```json
@@ -55,14 +59,17 @@ These are the high-level steps that would need to take place:
Note that the above approach will not delete the alert objects in Azure, merely disable them. To delete the alerts, you will have to do so manually. Also note that while you can use the PolicyEffect parameter to avoid deploying new alerts, you should not do so until you have successfully remediated the above. Otherwise, the policy will be disabled, and you will not be able to turn alerts off via policy until that is changed back.

## PolicyEffect parameter

In general, we evaluate the alert rules on best practices, field experience, customer feedback, type of alert and possible impact. There are situations where disabling the policy makes sense to prevent receiving unnecessary and/or duplicate alerts/notifications. For example, we deploy an alert rule for VPN Gateway Bandwidth Utilization and, in turn, have disabled the alert rules for VPN Gateway Egress and Ingress.
The default is intended to provide a well-balanced baseline. However, you may want to enable or disable the creation of certain alert rules to meet your needs.

### Allowed values

- "deployIfNotExists" - Policy will deploy the alert rule if the conditions are met. (Default for most Policies)
- "disabled" - The policy itself will be created but will not create the corresponding Alert rule.
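
A small check like the Python sketch below can catch a misspelled effect before a parameter file is deployed. The function `policy_effect_parameter` is hypothetical, the set of allowed values is limited to the two listed above (individual policies may accept others), and the `{"value": ...}` wrapper assumes the usual ARM parameter-file shape.

```python
from typing import Dict

# Only the two values documented above; this is an assumption for the sketch.
ALLOWED_POLICY_EFFECTS = frozenset({"deployIfNotExists", "disabled"})


def policy_effect_parameter(prefix: str, effect: str) -> Dict[str, Dict[str, str]]:
    """Build a '<prefix>PolicyEffect' entry after validating the effect."""
    if effect not in ALLOWED_POLICY_EFFECTS:
        allowed = ", ".join(sorted(ALLOWED_POLICY_EFFECTS))
        raise ValueError(f"unsupported effect {effect!r}; expected one of: {allowed}")
    return {f"{prefix}PolicyEffect": {"value": effect}}


if __name__ == "__main__":
    print(policy_effect_parameter("ERCIRQoSDropBitsinPerSec", "disabled"))
```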

### How it works

The PolicyEffect parameter is used for the configuration of the effect of the PolicyDefinition. In the initiatives and the example parameter file, the parameter is named by combining {resourceType}, {metricName} and PolicyEffect, for example ERCIRQoSDropBitsinPerSecPolicyEffect. The value of the **PolicyEffect** parameter is passed on to the **effect** parameter which configures the effect of the Policy.

```json
@@ -84,9 +91,11 @@ The PolicyEffect parameter is used for the configuration of the effect of the Po
```

## MonitorDisable parameter

It's also possible to exclude certain resources from being monitored. For example, you may not want to monitor pre-production or dev environments. The MonitorDisable parameter contains the tag name used to determine whether a resource should be included. By default, creating the tag MonitorDisable with value "true" will prevent deployment of alert rules on those resources. This is easily adjusted to use existing tags. For example, you could configure the parameter with the tag name "Environment" and deploy only if the tag value equals "prod", or when the tag isn't equal to "dev". Currently only the tag name is a parameter; other changes require minor changes in the code.

### How it works

The policyRule only continues if "allOf" is true. This means the deployment will continue as long as the MonitorDisable tag doesn't exist or doesn't hold the value "true". When the tag holds "true", the "allOf" will return "false" because "notEquals": "true" is no longer satisfied, causing the deployment to stop.

```json
@@ -103,4 +112,4 @@ The policyRule only continues if "allOf" is true. Meaning, the deployment will
}
]
}
```
```
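
The tag check described above can be modelled in a few lines of Python. This is a simplified sketch and not the policy engine: `should_deploy` is a hypothetical name, the tags are assumed to be a plain dictionary, and comparing the value case-insensitively is an assumption about how the condition is evaluated.

```python
from typing import Mapping


def should_deploy(tags: Mapping[str, str], tag_name: str = "MonitorDisable") -> bool:
    """Mimic the notEquals check: deploy unless the tag holds 'true'."""
    value = tags.get(tag_name)
    if value is None:
        # The tag does not exist, so the notEquals condition is satisfied.
        return True
    # Assumption: the value comparison ignores case.
    return value.lower() != "true"


if __name__ == "__main__":
    print(should_deploy({"MonitorDisable": "true"}))  # resource is excluded
    print(should_deploy({"Environment": "prod"}))     # resource is monitored
```

Passing a different `tag_name` corresponds to configuring the parameter with an existing tag. Changing the compared value, for example to "dev", is the kind of adjustment that requires a code change in the policy itself.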
20 changes: 10 additions & 10 deletions docs/content/patterns/alz/Known-Issues.md
@@ -16,11 +16,11 @@ The underlying data is not present in the Log Analytics table.

### Resolution

For VM Alerts please enable [VM Insights](Monitoring-and-Alerting#log-alerts).
For VM Alerts, enable [VM Insights](../Monitoring-and-Alerting#log-alerts).

## Failed to deploy because of role assignemnt issue
## Failed to deploy because of role assignment issue

Deployment of AMBA fails when there are orphaned role assignements.
Deployment of AMBA fails when there are orphaned role assignments.

### Error includes

@@ -31,7 +31,7 @@ Deployment of AMBA fails when there are orphaned role assignements.

### Cause

When a role or a role assignement is removed, some orphaned object can still appear, preventing a successful deployment.
When a role or a role assignment is removed, orphaned objects can remain and prevent a successful deployment.

### Resolution

@@ -48,10 +48,10 @@ When a role or a role assignement is removed, some orphaned object can still app

### Cause

A deployment has been performed using one region, for example "uksouth", and when you try to deploy again to the same scope but to a different region you will receive an error. This happens even when a cleanup has been performed (see [Cleaning up a Deployment](../Cleaning-up-a-Deployment) for more details). This is because deployment entries still exists from the previous operation, so a region conflict is detected blocking you to run another deployment using a different region.
A deployment has been performed using one region, for example "uksouth", and when you try to deploy again to the same scope but to a different region you will receive an error. This happens even when a cleanup has been performed (see [Cleaning up a Deployment](../Cleaning-up-a-Deployment) for more details). This is because deployment entries still exist from the previous operation, so a region conflict is detected, which blocks you from running another deployment using a different region.

### Resolution
Situation 1: You are trying to deploy to a different region in addition to a previous deployment. Deploying to the same scope in a different region is not necessary. The definitions and assignments are scoped to a management group and are not region specific. No action is required.
Situation 1: You are trying to deploy to a region different from the one used in a previous deployment. Deploying to the same scope in a different region is not necessary. The definitions and assignments are scoped to a management group and are not region-specific. No action is required.

Situation 2: You cleaned up a previous implementation and want to deploy again to a different region. To resolve this issue, follow the steps below:

@@ -61,7 +61,7 @@ Situation 2: You cleaned up a previous implementation and want to deploy again t
4. Select all the deployment instances related to AMBA and click ***Delete***.

{{< hint type=Note >}}
To recognize the deployment names belonging to AMBA, select those whose names start with:
To recognize the deployment names belonging to AMBA, select those deployments whose names start with:

1. amba-
2. pid-
@@ -76,7 +76,7 @@ If you deployed AMBA just one time, you have 14 deployment instances

### Error includes

*Error: Code=MultipleErrorsOccurred; Message=Multiple error occurred: Conflict,Conflict,Conflict,Conflict,Conflict,Conflict.*
*Error: Code=MultipleErrorsOccurred; Message=Multiple errors occurred: Conflict,Conflict,Conflict,Conflict,Conflict,Conflict.*

### Cause

@@ -88,10 +88,10 @@ To resolve this issue, follow the steps below:
1. Navigate to ***Management Groups***
2. Select the management group (corresponding to the value entered for the *enterpriseScaleCompanyPrefix* during the deployment) where the AMBA deployment was targeted
3. Click ***Deployment***
4. Select all the deployments that could be deleted (example: instances of previous depoloyment related to AMBA) and click ***Delete***.
4. Select all the deployments that could be deleted (example: instances of previous deployment related to AMBA) and click ***Delete***.

{{< hint type=Note >}}
To recognize the deployment names belonging to AMBA, select those whose names start with:
To recognize the deployment names belonging to AMBA, select those deployments whose names start with:

1. amba-
2. pid-