Merge pull request #96 from intelops/chandu
kubviz doc updates by Alina
devopstoday11 authored Apr 16, 2024
2 parents 59d0765 + 37312ba commit 2d9a18b
Showing 2 changed files with 159 additions and 4 deletions.
4 changes: 2 additions & 2 deletions content/kubviz/1.0.0/8-security-tracking/_index.en.md
@@ -33,7 +33,7 @@ These reports will be available in JSON format, and you can visualize this data

You can customize the security scans by changing the chart values.

- To [disable](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L189) the cluster scan, pass `0` or an empty string.
- To [disable](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L186) the cluster scan, pass `0` or an empty string.

```yaml
schedule:
@@ -51,4 +51,4 @@ schedule:
...
```

You can do the same for [image-scan](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L187) and [sbom](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L188).
You can do the same for [image-scan](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L184) and [sbom](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L185).
159 changes: 157 additions & 2 deletions content/kubviz/1.0.0/9-health-check/_index.en.md
@@ -11,15 +11,17 @@ You can run different types of checks against your Kubernetes cluster to detect

### Configuration

You'll need to [configure](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L192) it to run health checks on your Kubernetes cluster.
Health checks are enabled by default when you install the KubViz agent. If you don't need them, you can disable them.

You'll need to [configure](https://github.com/intelops/kubviz/blob/main/charts/agent/values.yaml#L189) it to run health checks on your Kubernetes cluster.

```yaml
kuberhealthy:
  enabled: true
  ...
```
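
If the health checks are not needed, the same key can presumably be set to `false` to turn them off. This is a minimal sketch, assuming `enabled` is nested directly under `kuberhealthy` in the agent chart's values, as the snippet above suggests:

```yaml
# Assumed override for the agent chart values; verify the key path
# against the values.yaml linked above before applying it.
kuberhealthy:
  enabled: false
```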

## Run Health Checks
## Types of Checks

Once configured, it starts running health checks on your Kubernetes cluster. It supports the following checks:

@@ -33,3 +35,156 @@ Image pull check | Verifies that an image can be pulled from an image repository
Pod status check | Checks for unhealthy pod statuses in a target namespace |
Pod restart | Checks for excessive pod restarts in any namespace |
Resource quota check | Checks if resource quotas (CPU & memory) are available |


- Daemonset, Deployment, and DNS checks are enabled by default when you enable Kuberhealthy.

- Pod Status, Pod Restart, Image Pull, and Resource Quota checks need to be manually enabled.

### Daemonset Check

- **Purpose of Daemonset Check:** Validates the stable deployment and operation of daemonsets across all Kubernetes nodes, ensuring critical services are uniformly available.

- It automatically deploys a test daemonset, verifies pod scheduling on each node, and checks for successful pod termination upon completion. The check runs every 60 minutes.

### Deployment Check

- **Purpose of Deployment Check:** Assesses the success of application deployments in the Kubernetes cluster, ensuring configurations and services are correctly launched.

- Initiates a test deployment, evaluates the deployment process and service accessibility, and rolls back if necessary to ensure operational integrity.

### DNS Check

- **Purpose of DNS Check:** Ensures that DNS resolution is working correctly within the Kubernetes cluster, critical for service discovery and network communication.

- Performs DNS lookups to validate the responsiveness and accuracy of the cluster's DNS service, identifying potential issues early.

### Image Pull Check

- Image pull check is a custom check that requires manual enabling.

- This check tests the availability of image repositories.

- This check will run every 60 minutes. You can change this by modifying the `runInterval`.

```yaml
imagePullCheck:
  enabled: true
  runInterval: 60m
  timeout: 1m
  image:
    repository: kuberhealthy/test-check
    tag: v1.4.0
  extraEnvs:
    REPORT_FAILURE: "false"
    REPORT_DELAY: "1s"
  resources:
    requests:
      cpu: 10m
      memory: 50Mi
  ...
```
#### Steps to Follow Before Running the Image Pull Check

1. Pull the test image from Docker Hub.

```bash
docker pull kuberhealthy/test-check
```

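2. Tag the image for the repository you want to test, then push it there.

```bash
# Re-tag the pulled image so that it points at the target repository
docker tag kuberhealthy/test-check my.repository/repo/test-check
docker push my.repository/repo/test-check
```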

3. Replace the `repository` value with your repository.

- The pod is designed to attempt a pull of the test image from the remote repository (never from local). If the image is unavailable, an error is reported to the API.
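
For step 3, the override might look like the sketch below. `my.repository/repo/test-check` is the placeholder repository from step 2, and the nesting of `image.repository` under `imagePullCheck` is assumed from the configuration shown earlier in this section.

```yaml
imagePullCheck:
  enabled: true
  image:
    # Placeholder: replace with the repository you pushed the test image to.
    repository: my.repository/repo/test-check
    # Keep `tag` in sync with the tag you actually pushed.
```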

### Pod Status Check

- Pod status check is a custom check that requires manual enabling.

- **Purpose of Pod Status Check:** Monitors the health and status of pods within the Kubernetes cluster to ensure they are running and stable.

- This check will run every 5 minutes. You can change this by modifying the `runInterval`.

```yaml
podStatus:
  enabled: true
  runInterval: 5m
  timeout: 15m
  image:
    registry: docker.io
    repository: kuberhealthy/pod-status-check
    tag: v1.3.0
  allNamespaces: true
  extraEnvs: {}
  nodeSelector: {}
  tolerations: []
  resources:
    requests:
      cpu: 10m
      memory: 50Mi
  ...
```

### Pod Restart Check

- Pod restart check is a custom check that requires manual enabling.

- The Pod Restarts Check checks for excessive pod restarts in a given `POD_NAMESPACE`.

- The Pod Restarts Check deploys a pod that looks for pod resource events in a given `POD_NAMESPACE` and checks for Warning event types with reason `BackOff`. If the count of this event type exceeds `MAX_FAILURES_ALLOWED`, an error is reported back.

- The check runs every 5 minutes (`spec.runInterval`) with a check timeout of 10 minutes (`spec.timeout`) and `MAX_FAILURES_ALLOWED` set to 10. If the check does not complete within the timeout, it reports a timeout error.

```yaml
podRestarts:
  enabled: true
  runInterval: 5m
  timeout: 10m
  image:
    registry: docker.io
    repository: kuberhealthy/pod-restarts-check
    tag: v2.5.0
  allNamespaces: true
  extraEnvs:
    MAX_FAILURES_ALLOWED: "10"
  nodeSelector: {}
  tolerations: []
  resources:
    requests:
      cpu: 10m
      memory: 50Mi
  ...
```
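
To make the check more sensitive, the threshold can be lowered through `extraEnvs`. This is a minimal sketch, assuming the same key layout as the block above; the value `"5"` is only an illustration, not a recommended setting.

```yaml
podRestarts:
  enabled: true
  extraEnvs:
    # Report an error once more than 5 BackOff warning events are counted.
    MAX_FAILURES_ALLOWED: "5"
```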

### Resource Quota Check

- This check tests whether namespace resource quotas for CPU and memory are under a specified threshold or percentage. It requires manual enabling.

```yaml
resourceQuota:
  enabled: true
  runInterval: 1h
  timeout: 2m
  image:
    repository: kuberhealthy/resource-quota-check
    tag: v1.3.0
  extraEnvs:
    BLACKLIST: "default"
    WHITELIST: "kube-system,kubviz"
  resources:
    requests:
      cpu: 15m
      memory: 15Mi
    limits:
      cpu: 30m
  ...
```

- **Configurable check environment variables:**
  - `BLACKLIST`: Blacklist of namespaces to look at (default: `default`).
  - `WHITELIST`: Whitelist of namespaces to look at (default: `kube-system,kubviz`).
