Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(DMVP-5017): Added multiple policies #20

Merged
merged 1 commit into from
Aug 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -235,10 +235,10 @@ No resources.
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_alert_interval_seconds"></a> [alert\_interval\_seconds](#input\_alert\_interval\_seconds) | The interval, in seconds, at which all rules in the group are evaluated. If a group contains many rules, the rules are evaluated sequentially. | `number` | `10` | no |
| <a name="input_alert_rules"></a> [alert\_rules](#input\_alert\_rules) | This variable describes alert folders, groups and rules. | <pre>list(object({<br> name = string # The name of the alert rule<br> no_data_state = optional(string, "NoData") # Describes what state to enter when the rule's query returns No Data<br> exec_err_state = optional(string, "Error") # Describes what state to enter when the rule's query is invalid and the rule cannot be executed<br> summary = optional(string, "") # Rule annotation as a summary<br> priority = optional(string, "P2") # Rule priority level: P2 is for non-critical alerts, P1 will be set for critical alerts<br> folder_name = optional(string, "Main Alerts") # Grafana folder name in which the rule will be created<br> datasource = string # Name of the datasource used for the alert<br> expr = optional(string, null) # Full expression for the alert<br> metric_name = optional(string, "") # Prometheus metric name which queries the data for the alert<br> metric_function = optional(string, "") # Prometheus function used with metric for queries, like rate, sum etc.<br> metric_interval = optional(string, "") # The time interval with using functions like rate<br> settings_mode = optional(string, "replaceNN") # The mode used in B block, possible values are Strict, replaceNN, dropNN<br> settings_replaceWith = optional(number, 0) # The value by which NaN results of the query will be replaced<br> filters = optional(any, {}) # Filters object to identify each service for alerting<br> function = optional(string, "mean") # One of Reduce functions which will be used in B block for alerting<br> equation = string # The equation in the math expression which compares B blocks value with a number and generates an alert if needed. Possible values: gt, lt, gte, lte, e<br> threshold = number # The value against which B blocks are compared in the math expression<br> }))</pre> | `[]` | no |
| <a name="input_alert_rules"></a> [alert\_rules](#input\_alert\_rules) | This variable describes alert folders, groups and rules. | <pre>list(object({<br> name = string # The name of the alert rule<br> no_data_state = optional(string, "NoData") # Describes what state to enter when the rule's query returns No Data<br> exec_err_state = optional(string, "Error") # Describes what state to enter when the rule's query is invalid and the rule cannot be executed<br> summary = optional(string, "") # Rule annotation as a summary<br> labels = optional(map(any), { "priorityLow" : "true" })<br> folder_name = optional(string, "Main Alerts") # Grafana folder name in which the rule will be created<br> datasource = string # Name of the datasource used for the alert<br> expr = optional(string, null) # Full expression for the alert<br> metric_name = optional(string, "") # Prometheus metric name which queries the data for the alert<br> metric_function = optional(string, "") # Prometheus function used with metric for queries, like rate, sum etc.<br> metric_interval = optional(string, "") # The time interval with using functions like rate<br> settings_mode = optional(string, "replaceNN") # The mode used in B block, possible values are Strict, replaceNN, dropNN<br> settings_replaceWith = optional(number, 0) # The value by which NaN results of the query will be replaced<br> filters = optional(any, {}) # Filters object to identify each service for alerting<br> function = optional(string, "mean") # One of Reduce functions which will be used in B block for alerting<br> equation = string # The equation in the math expression which compares B blocks value with a number and generates an alert if needed. Possible values: gt, lt, gte, lte, e<br> threshold = number # The value against which B blocks are compared in the math expression<br> }))</pre> | `[]` | no |
| <a name="input_application_dashboard"></a> [application\_dashboard](#input\_application\_dashboard) | Dashboard for monitoring applications | <pre>object({<br> rows = optional(any, [])<br> data_source = object({ # global/default datasource, TODO: create datasource inside the module<br> uid = string<br> type = optional(string, "prometheus")<br> })<br> variables = optional(list(object({ # Allows to define variables to be used in dashboard<br> name = string<br> type = optional(string, "custom")<br> hide = optional(number, 0)<br> includeAll = optional(bool, false)<br> multi = optional(bool, false)<br> query = optional(string, "")<br> queryValue = optional(string, "")<br> skipUrlSync = optional(bool, false)<br> options = optional(list(object({<br> selected = optional(bool, false)<br> value = string<br> text = optional(string, null)<br> })), [])<br> })), [])<br> })</pre> | <pre>{<br> "data_source": null,<br> "rows": [],<br> "variables": []<br>}</pre> | no |
| <a name="input_name"></a> [name](#input\_name) | Dashboard name | `string` | n/a | yes |
| <a name="input_notifications"></a> [notifications](#input\_notifications) | Represents the configuration options for Grafana notification policies. | <pre>object({<br> contact_point = optional(string, "Slack") # The default contact point to route all unmatched notifications to.<br> group_by = optional(list(string), ["grafana_folder", "alertname"]) # A list of alert labels to group alerts into notifications by.<br> group_interval = optional(string, "5m") # Minimum time interval between two notifications for the same group.<br> repeat_interval = optional(string, "4h") # Minimum time interval for re-sending a notification if an alert is still firing.<br><br> policy = optional(object({<br> contact_point = optional(string, null) # The contact point to route notifications that match this rule to.<br> continue = optional(bool, false) # Whether to continue matching subsequent rules if an alert matches the current rule. Otherwise, the rule will be 'consumed' by the first policy to match it.<br> group_by = optional(list(string), [])<br> mute_timings = optional(list(string), []) # A list of mute timing names to apply to alerts that match this policy.<br><br> matcher = optional(object({<br> label = optional(string, "priority") # The name of the label to match against.<br> match = optional(string, "=") # The operator to apply when matching values of the given label. Allowed operators are = for equality, != for negated equality, =~ for regex equality, and !~ for negated regex equality.<br> value = optional(string, "P1") # The label value to match against.<br> }))<br> }))<br> })</pre> | `{}` | no |
| <a name="input_notifications"></a> [notifications](#input\_notifications) | Represents the configuration options for Grafana notification policies. | <pre>object({<br> contact_point = optional(string, "Slack") # The default contact point to route all unmatched notifications to.<br> group_by = optional(list(string), ["..."]) # A list of alert labels to group alerts into notifications by.<br> group_interval = optional(string, "5m") # Minimum time interval between two notifications for the same group.<br> repeat_interval = optional(string, "4h") # Minimum time interval for re-sending a notification if an alert is still firing.<br><br> policies = optional(list(object({<br> contact_point = optional(string, null) # The contact point to route notifications that match this rule to.<br> continue = optional(bool, false) # Whether to continue matching subsequent rules if an alert matches the current rule. Otherwise, the rule will be 'consumed' by the first policy to match it.<br> group_by = optional(list(string), ["..."])<br> mute_timings = optional(list(string), []) # A list of mute timing names to apply to alerts that match this policy.<br><br> matchers = optional(list(object({<br> label = optional(string, "priority") # The name of the label to match against.<br> match = optional(string, "=") # The operator to apply when matching values of the given label. Allowed operators are = for equality, != for negated equality, =~ for regex equality, and !~ for negated regex equality.<br> value = optional(string, "P1") # The label value to match against.<br> })), [])<br> })), [])<br> })</pre> | `{}` | no |
| <a name="input_opsgenie_endpoints"></a> [opsgenie\_endpoints](#input\_opsgenie\_endpoints) | OpsGenie contact points list. | <pre>list(object({<br> name = string # The name of the contact point.<br> api_key = string # The OpsGenie API key to use.<br> auto_close = optional(bool, false) # Whether to auto-close alerts in OpsGenie when they resolve in the Alertmanager.<br> message = optional(string, "") # The templated content of the message.<br> api_url = optional(string, "https://api.opsgenie.com/v2/alerts") # Allows customization of the OpsGenie API URL.<br> disable_resolve_message = optional(bool, false) # Whether to disable sending resolve messages.<br> }))</pre> | `[]` | no |
| <a name="input_slack_endpoints"></a> [slack\_endpoints](#input\_slack\_endpoints) | Slack contact points list. | <pre>list(object({<br> name = string # The name of the contact point.<br> endpoint_url = optional(string, "https://slack.com/api/chat.postMessage") # Use this to override the Slack API endpoint URL to send requests to.<br> icon_emoji = optional(string, "") # The name of a Slack workspace emoji to use as the bot icon.<br> icon_url = optional(string, "") # A URL of an image to use as the bot icon.<br> recipient = optional(string, null) # Channel, private group, or IM channel (can be an encoded ID or a name) to send messages to.<br> text = optional(string, "") # Templated content of the message.<br> title = optional(string, "") # Templated title of the message.<br> token = optional(string, "") # A Slack API token,for sending messages directly without the webhook method.<br> webhook_url = optional(string, "") # A Slack webhook URL,for sending messages via the webhook method.<br> username = optional(string, "") # Username for the bot to use.<br> disable_resolve_message = optional(bool, false) # Whether to disable sending resolve messages.<br> }))</pre> | `[]` | no |

Expand Down
4 changes: 4 additions & 0 deletions main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,16 @@ module "alerts" {
module "contact_points" {
source = "./modules/contact-points"

count = length(var.alert_rules) != 0 ? 1 : 0

slack_endpoints = var.slack_endpoints
opsgenie_endpoints = var.opsgenie_endpoints
}

module "notifications" {
source = "./modules/notifications"

count = length(var.alert_rules) != 0 ? 1 : 0

notifications = var.notifications
}
2 changes: 1 addition & 1 deletion modules/alerts/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ No modules.
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_alert_interval_seconds"></a> [alert\_interval\_seconds](#input\_alert\_interval\_seconds) | The interval, in seconds, at which all rules in the group are evaluated. If a group contains many rules, the rules are evaluated sequentially. | `number` | `10` | no |
| <a name="input_alert_rules"></a> [alert\_rules](#input\_alert\_rules) | This varibale describes alert folders, groups and rules. | <pre>list(object({<br> name = string # The name of the alert rule<br> no_data_state = optional(string, "NoData") # Describes what state to enter when the rule's query returns No Data<br> exec_err_state = optional(string, "Error") # Describes what state to enter when the rule's query is invalid and the rule cannot be executed<br> summary = optional(string, "") # Rule annotation as a summary<br> priority = optional(string, "P2") # Rule priority level: P2 is for non-critical alerts, P1 will be set for critical alerts<br> folder_name = optional(string, "Main Alerts") # Grafana folder name in which the rule will be created<br> datasource = string # Name of the datasource used for the alert<br> expr = optional(string, null) # Full expression for the alert<br> metric_name = optional(string, "") # Prometheus metric name which queries the data for the alert<br> metric_function = optional(string, "") # Prometheus function used with metric for queries, like rate, sum etc.<br> metric_interval = optional(string, "") # The time interval with using functions like rate<br> settings_mode = optional(string, "replaceNN") # The mode used in B block, possible values are Strict, replaceNN, dropNN<br> settings_replaceWith = optional(number, 0) # The value by which NaN results of the query will be replaced<br> filters = optional(any, {}) # Filters object to identify each service for alerting<br> function = optional(string, "mean") # One of Reduce functions which will be used in B block for alerting<br> equation = string # The equation in the math expression which compares B blocks value with a number and generates an alert if needed. Possible values: gt, lt, gte, lte, e<br> threshold = number # The value against which B blocks are compared in the math expression<br> }))</pre> | `[]` | no |
| <a name="input_alert_rules"></a> [alert\_rules](#input\_alert\_rules) | This varibale describes alert folders, groups and rules. | <pre>list(object({<br> name = string # The name of the alert rule<br> no_data_state = optional(string, "NoData") # Describes what state to enter when the rule's query returns No Data<br> exec_err_state = optional(string, "Error") # Describes what state to enter when the rule's query is invalid and the rule cannot be executed<br> summary = optional(string, "") # Rule annotation as a summary<br> labels = optional(map(any), { "priorityLow" : "true" }) # Labels help to define where to send each alert<br> folder_name = optional(string, "Main Alerts") # Grafana folder name in which the rule will be created<br> datasource = string # Name of the datasource used for the alert<br> expr = optional(string, null) # Full expression for the alert<br> metric_name = optional(string, "") # Prometheus metric name which queries the data for the alert<br> metric_function = optional(string, "") # Prometheus function used with metric for queries, like rate, sum etc.<br> metric_interval = optional(string, "") # The time interval with using functions like rate<br> settings_mode = optional(string, "replaceNN") # The mode used in B block, possible values are Strict, replaceNN, dropNN<br> settings_replaceWith = optional(number, 0) # The value by which NaN results of the query will be replaced<br> filters = optional(any, {}) # Filters object to identify each service for alerting<br> function = optional(string, "mean") # One of Reduce functions which will be used in B block for alerting<br> equation = string # The equation in the math expression which compares B blocks value with a number and generates an alert if needed. Possible values: gt, lt, gte, lte, e<br> threshold = number # The value against which B blocks are compared in the math expression<br> }))</pre> | `[]` | no |

## Outputs

Expand Down
6 changes: 3 additions & 3 deletions modules/alerts/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -36,9 +36,9 @@ resource "grafana_rule_group" "alert_rule" {
"Managed By" = "Terraform"
"Summary" = lookup(rule.value, "summary", rule.value.name)
}
labels = {
"priority" = lookup(rule.value, "priority", "P2")
}
labels = lookup(rule.value, "labels", {
"priorityLow" : "true"
})
is_paused = false
data {
ref_id = "A"
Expand Down
Loading
Loading