Update monitoring guide images & broken path (#168)

* Update > Guides > Monitoring > Images
* Update broken path in docs/guides/hosting-guardrails/index.md & cleanup
* Update image to be specific and cleanup
* Content cleanup
* Update > image and text for investigate event flood
* Cleanup and split single step #6 in docs/guides/hosting-guardrails/monitoring/investigate-event-flood/index.md into multiple steps for clarity

---------

Co-authored-by: raj <[email protected]>
turbot · Oct 18, 2024 · 7615a55
1 parent b50af84
commit 7615a55

Showing 20 changed files with 35 additions and 24 deletions.
diff --git a/docs/guides/hosting-guardrails/index.md b/docs/guides/hosting-guardrails/index.md
@@ -15,8 +15,8 @@ The Guardrails Enterprise installation is highly customizable, allowing you to d
 
 | |
 | - | -
-| [Architecture Guide](architecture) | Detailed logical, physiscal, network and application architecture information on hosting guardrails.
-| [Installation Guides](installation) | Guides to install Guardrails in your AWS account.
-| [Montoring Guides](monitoring) | How to proactivly monitor your Guardrails infrastructure.
-| [Recovery Guides](restore) | How to recover a Guardrails environment from backup.
-| [Troubleshooting Guides](troubleshooting) | How to assess and fix common hosting issues.
+| [Architecture Guide](guides/hosting-guardrails/architecture) | Detailed logical, physical, network and application architecture information on hosting Guardrails.
+| [Installation Guides](guides/hosting-guardrails/installation) | Guides to install Guardrails in your AWS account.
+| [Monitoring Guides](guides/hosting-guardrails/monitoring) | How to proactively monitor your Guardrails infrastructure.
+| [Recovery Guides](guides/hosting-guardrails/restore) | How to recover a Guardrails environment from backup.
+| [Troubleshooting Guides](guides/hosting-guardrails/troubleshooting) | How to assess and fix common hosting issues.
diff --git a/...g-guardrails/monitoring/diagnose-control-error/cloudwatch-log-groups-select.png b/...g-guardrails/monitoring/diagnose-control-error/cloudwatch-log-groups-select.png
diff --git a/...drails/monitoring/diagnose-control-error/cloudwatch-loggroups-error-details.png b/...drails/monitoring/diagnose-control-error/cloudwatch-loggroups-error-details.png
diff --git a/.../monitoring/diagnose-control-error/cloudwatch-loggroups-search-with-errorid.png b/.../monitoring/diagnose-control-error/cloudwatch-loggroups-search-with-errorid.png
diff --git a/.../monitoring/diagnose-control-error/cloudwatch-select-search-all-log-streams.png b/.../monitoring/diagnose-control-error/cloudwatch-select-search-all-log-streams.png
diff --git a/...sting-guardrails/monitoring/diagnose-control-error/guardrails-control-error.png b/...sting-guardrails/monitoring/diagnose-control-error/guardrails-control-error.png
diff --git a/...uardrails/monitoring/diagnose-control-error/guardrails-expand-error-message.png b/...uardrails/monitoring/diagnose-control-error/guardrails-expand-error-message.png
diff --git a/docs/guides/hosting-guardrails/monitoring/diagnose-control-error/index.md b/docs/guides/hosting-guardrails/monitoring/diagnose-control-error/index.md
@@ -53,19 +53,22 @@ Choose **Log Groups** from the left navigation menu.
 
 ## Step 6: Search Log Group
 
-Search for log groups with the prefix **/aws/lambda/turbot_** followed by the workspace version.
+Search for log groups using a keyword based on the workspace version obtained in [Step 3](#step-3-view-logs). This displays a list of matching log group names with the prefix `/aws/lambda/turbot_` followed by the workspace version.
 
 ![Search Log Group](/images/docs/guardrails/guides/hosting-guardrails/monitoring/diagnose-control-error/cloudwatch-log-groups-select.png)
 
 ## Step 7: Select Log Group
 
-Select the **worker** log group as indicated in the **type** field from the error log in the Guardrails console. Choose **Search all log steams**.
+Select the worker log group indicated in the type field of the error log in the Guardrails console, for example `/aws/lambda/turbot_5_47_2_rc_1_worker`. Choose **Search all log streams**.
 
 ![Worker Log Group](/images/docs/guardrails/guides/hosting-guardrails/monitoring/diagnose-control-error/cloudwatch-select-search-all-log-streams.png)
 
 ## Step 8: Search Error
 
-Search using the **errorId** retrieved from the Guardrails console control error log.
+Search using the `errorId` obtained in [Step 3](#step-3-view-logs) from the control error log in the Guardrails console.
+
+> [!NOTE]
+> Enclose the `errorId` in double quotes, for example `"3423432-dfdsf-3e331-fgdfgd234234"`.
 
 ![Search with Error Id](/images/docs/guardrails/guides/hosting-guardrails/monitoring/diagnose-control-error/cloudwatch-loggroups-search-with-errorid.png)
 

diff --git a/...onitoring/investigate-event-flood/cloudwatch-dashboard-events-queue-backlog.png b/...onitoring/investigate-event-flood/cloudwatch-dashboard-events-queue-backlog.png
diff --git a/...ils/monitoring/investigate-event-flood/cloudwatch-log-insights-event.source.png b/...ils/monitoring/investigate-event-flood/cloudwatch-log-insights-event.source.png
diff --git a/...onitoring/investigate-event-flood/cloudwatch-log-insights-events-by-account.png b/...onitoring/investigate-event-flood/cloudwatch-log-insights-events-by-account.png
diff --git a/...monitoring/investigate-event-flood/cloudwatch-log-insights-source-breakdown.png b/...monitoring/investigate-event-flood/cloudwatch-log-insights-source-breakdown.png
diff --git a/...sting-guardrails/monitoring/investigate-event-flood/cloudwatch-log-insights.png b/...sting-guardrails/monitoring/investigate-event-flood/cloudwatch-log-insights.png
diff --git a/...ls/monitoring/investigate-event-flood/cloudwatch-view-messages-by-workspace.png b/...ls/monitoring/investigate-event-flood/cloudwatch-view-messages-by-workspace.png
diff --git a/docs/guides/hosting-guardrails/monitoring/investigate-event-flood/index.md b/docs/guides/hosting-guardrails/monitoring/investigate-event-flood/index.md
@@ -31,33 +31,38 @@ Choose **Dashboards** from the left navigation menu.
 
 ## Step 3: Select Dashboard
 
-Select the Turbot Guardrails Enterprise (TE) CloudWatch dashboard, which is typically named after the TE version in use.
+In **Custom dashboards**, select the Turbot Guardrails Enterprise (TE) CloudWatch dashboard, which is typically named after the TE version in use.
 
 ![TE Dashboard](/images/docs/guardrails/guides/hosting-guardrails/monitoring/investigate-event-flood/cloudwatch-select-te-dashboard.png)
 
 ## Step 4: View Events Queue
 
-Select the duration and check the **Events Queue Backlog** graph in the TE CloudWatch dashboard that indicates the flood state.
+Select the desired duration from the time range selector in the top-right corner, and check the **Events Queue Backlog** graph in the TE CloudWatch dashboard for spikes that indicate an event flood.
 
 ![Events Queue Backlog](/images/docs/guardrails/guides/hosting-guardrails/monitoring/investigate-event-flood/cloudwatch-dashboard-events-queue-backlog.png)
 
 ## Step 5: Identify Noisy Tenant
 
-In the **Activities** section of the TE Dashboard, use the **View All Messages By Workspace** widget to filter and identify the noisy tenant causing the issues.
+Scroll down the same dashboard page to the **Activities** section and use the **View All Messages By Workspace** widget to filter and identify the noisy tenant causing the issues.
 The number of messages received by the top tenant over a specified duration, along with the difference between the top three tenants, can be a strong indicator of an event flood.
 
 ![View All Messages By Workspace](/images/docs/guardrails/guides/hosting-guardrails/monitoring/investigate-event-flood/cloudwatch-view-messages-by-workspace.png)
 
-## Step 6: Identify Cause
+## Step 6: Analyze Log Insights
 
-With the workspace identified, navigate to **CloudWatch > Logs Insights**, select the appropriate worker log group for the TE version and choose the desired query duration to proceed to investigate further by analyzing events, event sources, and account IDs for the workspace.
+With the workspace identified in the previous step, navigate to **CloudWatch > Logs Insights**, select the appropriate worker log group(s) for the TE version(s), and choose the desired query duration. The query editor opens with the selected log group(s), ready for you to analyze events, event sources, and account IDs for the workspace.
 
 > [!IMPORTANT]
 > Longer durations will increase the log group size and query time, which may result in higher billing costs for CloudWatch.
 
 ![View All Messages By Workspace](/images/docs/guardrails/guides/hosting-guardrails/monitoring/investigate-event-flood/cloudwatch-log-insights.png)
 
-Use this query to identify **External Messages by Accounts in a Tenant**.
+> [!NOTE]
+> You can select multiple TE version log groups if required.
+
+## Step 7: External Messages by Accounts in a Tenant
+
+In the query editor, run the following query to identify the AWS `AccountId(s)` contributing to the events.
 
 ```
 fields @timestamp, @message
@@ -68,7 +73,9 @@ fields @timestamp, @message
 ```
 ![Accounts Generating Events](/images/docs/guardrails/guides/hosting-guardrails/monitoring/investigate-event-flood/cloudwatch-log-insights-events-by-account.png)
 
-Next, use this query to identify **External Messages by Source for a Tenant**.
+## Step 8: External Messages by Source for a Tenant
+
+Use the following query to identify the specific event `Source` values from the different services.
 
 ```
 fields @timestamp, @message
@@ -80,7 +87,9 @@ fields @timestamp, @message
 
 ![Event Source](/images/docs/guardrails/guides/hosting-guardrails/monitoring/investigate-event-flood/cloudwatch-log-insights-event.source.png)
 
-Use this query to further identify the specific event name for the source.
+## Step 9: External Messages by Event Name
+
+Use the following query to identify the specific `EventName` associated with the service.
 
 ```
 fields @timestamp, @message
@@ -90,10 +99,9 @@ fields @timestamp, @message
 | sort Count desc | limit 5
 
 ```
-
 ![Specific Event Name](/images/docs/guardrails/guides/hosting-guardrails/monitoring/investigate-event-flood/cloudwatch-log-insights-source-breakdown.png)
 
-## Step 7: Measures To Fix Event Flood
+## Step 10: Measures To Fix Event Flood
 
 **Isolate the Noisy Workspace:** As an immediate fix, move the noisy workspace to a separate TE version to prevent performance issues or throttling for neighboring workspaces.
 

diff --git a/...rdrails/monitoring/workspace-health-check/filter-policy-error-invalid-state.png b/...rdrails/monitoring/workspace-health-check/filter-policy-error-invalid-state.png
diff --git a/...uardrails/monitoring/workspace-health-check/guardrails-filter-error-invalid.png b/...uardrails/monitoring/workspace-health-check/guardrails-filter-error-invalid.png
diff --git a/...rdrails/monitoring/workspace-health-check/guardrails-policy-values-by-state.png b/...rdrails/monitoring/workspace-health-check/guardrails-policy-values-by-state.png
diff --git a/...rdrails/monitoring/workspace-health-check/guardrails-select-controls-alerts.png b/...rdrails/monitoring/workspace-health-check/guardrails-select-controls-alerts.png
diff --git a/docs/guides/hosting-guardrails/monitoring/workspace-health-check/index.md b/docs/guides/hosting-guardrails/monitoring/workspace-health-check/index.md
@@ -33,27 +33,27 @@ Under Controls, select **Alerts by Control Type**.
 
 ![Alerts by Control Type](/images/docs/guardrails/guides/hosting-guardrails/monitoring/workspace-health-check/guardrails-select-controls-alerts.png)
 
-Filter for **Error** and **Invalid** states.
+Select **Invalid** and **Error** from the **State** filter dropdown.
 
 ![Apply Filter](/images/docs/guardrails/guides/hosting-guardrails/monitoring/workspace-health-check/guardrails-filter-error-invalid.png)
 
 ## Step 3: View Policy Alerts
 
-In **Reports**, under **Policies**, select **Policy Values by State**.
+In **Reports**, scroll down to the **Policies** section and select **Policy Values by State**.
 
 ![Alerts by Policy Values](/images/docs/guardrails/guides/hosting-guardrails/monitoring/workspace-health-check/guardrails-policy-values-by-state.png)
 
-Filter for **Error** and **Invalid** states.
+Select **Invalid** and **Error** from the **State** filter dropdown.
 
 ![Apply Filter](/images/docs/guardrails/guides/hosting-guardrails/monitoring/workspace-health-check/filter-policy-error-invalid-state.png)
 
 ## Step 4: Resolving Errors and Optimizing Controls
 
-Review the controls and errors currently in an error state and take the necessary actions.
+*Review the controls and errors* currently in an error state and take the necessary actions.
 
-If the error is due to policy misconfiguration, carefully adjust the settings and apply the changes as required. Ensure that all configurations align with the workspace's needs to resolve the issue effectively.
+*If the error is due to policy misconfiguration*, carefully adjust the settings and apply the changes as required. Ensure that all configurations align with the workspace's needs to resolve the issue effectively.
 
-For product-related issues, make sure to document and report them for further investigation.
+*For product-related issues*, make sure to document and report them for further investigation.
 
 Additionally, to maintain efficiency, resources or controls that are not a priority should be skipped to reduce noise and wastage.