In this use case, you will learn how to use the capabilities of Keptn to provide self-healing for an application without modifying any of the application's code. The use case presented in the following scales up the pods of an application if the application undergoes heavy CPU saturation.
A couple of specification files are needed for Keptn to know which remediation to perform and to verify whether the executed remediation was successful. These files have already been put in your Docker container.
In order for Keptn to utilize Prometheus metrics for self-healing, the configured Service Indicators, Service Objectives, and Remediation steps need to be updated.
- Add the needed resources to enable self-healing in your production environment:
keptn add-resource --project=sockshop --service=carts --stage=production --resource=service-indicators.yaml --resourceUri=service-indicators.yaml
keptn add-resource --project=sockshop --service=carts --stage=production --resource=service-objectives-prometheus-only.yaml --resourceUri=service-objectives.yaml
keptn add-resource --project=sockshop --service=carts --stage=production --resource=remediation.yaml --resourceUri=remediation.yaml
Learn more about those files below (a minimal sketch of each follows this list):
- service-indicators.yaml: This file specifies the indicators that can be used to describe service objectives. These indicators are metrics gathered from monitoring sources, and each is defined by a query. The query to obtain the metric is source-specific.
- service-objectives.yaml: This file specifies the service level objectives for one or more services. It first defines thresholds that express the fulfillment of the objectives via the pass and warning properties: an evaluated objective that achieves a score above the pass limit is considered fulfilled, a score between warning and pass is in an acceptable range, and a score below warning means the objective is not fulfilled. The objectives property lists all service level indicators, by their metric name, that are considered for this objective. In addition, each indicator is augmented by a threshold and a timeframe: the threshold defines the acceptance criteria for the indicator, while the timeframe indicates the duration over which the metric is evaluated. Finally, the score specifies the maximum number of points that can be achieved by this indicator.
- remediation.yaml: This file defines the remediation actions to execute in response to a problem that matches a defined problem pattern / service objective. These actions are interpreted by Keptn to trigger the proper remediation.
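For illustration, here is a minimal sketch of what these three files might look like for the CPU-saturation scenario. The metric name, query, thresholds, and action values below are illustrative assumptions, not the exact contents shipped with the tutorial:

# service-indicators.yaml (sketch)
indicators:
  - metric: cpu_usage_sockshop_carts      # assumed metric name
    source: Prometheus
    query: avg(rate(container_cpu_usage_seconds_total{namespace="sockshop-production",pod_name=~"carts-primary-.*"}[5m]))

# service-objectives.yaml (sketch)
pass: 90            # total score above this: objectives fulfilled
warning: 75         # between warning and pass: acceptable range
objectives:
  - metric: cpu_usage_sockshop_carts
    threshold: 0.02 # assumed acceptance criterion for avg CPU usage
    timeframe: 5m   # duration over which the metric is evaluated
    score: 100      # max points this indicator can contribute

# remediation.yaml (sketch)
remediations:
  - name: cpu_usage_sockshop_carts        # problem pattern to match
    actions:
      - action: scaling                   # assumed action type
        value: +1                         # add one replica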
- Configure Prometheus with the Keptn CLI:
keptn configure monitoring prometheus --project=sockshop --service=carts
This sets up Prometheus as the monitoring solution used in this use case (please note that with Keptn 0.5.0, Dynatrace is not yet supported for this use case). In addition, Keptn configures Prometheus monitoring as well as the Prometheus Alertmanager to send out alerts in case of high CPU saturation.
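Under the hood, an alerting setup like this typically amounts to a Prometheus rule of roughly the following shape. This is only a sketch under the assumption that the alert is derived from the CPU-usage objective; the exact rule name, threshold, and duration that Keptn generates may differ:

groups:
  - name: sockshop-production carts alerts    # assumed group name
    rules:
      - alert: cpu_usage_sockshop_carts       # assumed alert name
        expr: avg(rate(container_cpu_usage_seconds_total{namespace="sockshop-production",pod_name=~"carts-primary-.*"}[5m])) > 0.02
        for: 10m                              # assumed duration
        annotations:
          summary: High CPU usage on carts in sockshop-production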
Tests can never capture all issues a service might face in a production environment. As we will see, our carts microservice, which is already running in our production environment, will run into some issues there. This is due to the fact that real-user traffic differs from synthetic test traffic, and we might not be able to cover all real-user actions in our test phase.
In order to simulate user traffic that causes unhealthy behavior in the carts service, please execute the following script. It will add special items to the shopping cart that trigger extensive calculations.
- Move to the correct folder:
cd /usr/keptn/examples/load-generation/bin
- Execute the load generation program:
./loadgenerator-linux "http://carts.sockshop-production.$(kubectl get cm keptn-domain -n keptn -o=jsonpath='{.data.app_domain}')" cpu
This will constantly add items to the shopping cart and cause CPU-heavy calculations due to the characteristics of the added items.
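If the URL cannot be resolved, you can first check the configured app domain on its own; this is the same lookup the command above performs in its subshell:

kubectl get cm keptn-domain -n keptn -o=jsonpath='{.data.app_domain}'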
After approximately 15 minutes, the Prometheus Alertmanager will send out an alert because the service level objective is no longer met.
- We can verify that Keptn triggers an upscale of the affected deployment by executing:
kubectl get deployments -n sockshop-production
We can see that the carts-primary deployment is now served by two pods:

NAME            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
carts-db        1         1         1            1           37m
carts-primary   2         2         2            2           32m
- In Prometheus, we can also verify the load and the triggered remediation action. First, we need a port-forward to access the internal Prometheus installation:
kubectl port-forward svc/prometheus-service -n monitoring 8080:8080
Now access Prometheus from your browser at http://localhost:8080.
In the Graph tab, add the following expression:
avg(rate(container_cpu_usage_seconds_total{namespace="sockshop-production",pod_name=~"carts-primary-.*"}[5m]))
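Note: on newer Kubernetes/cAdvisor versions, the pod_name label of this metric has been renamed to pod. If the expression above returns no data, the following variant (an assumption for newer clusters, not part of the original tutorial) may work instead:

avg(rate(container_cpu_usage_seconds_total{namespace="sockshop-production",pod=~"carts-primary-.*"}[5m]))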
Select the Graph tab to see the CPU metrics of the carts-primary pods in the sockshop-production environment.
After a couple of minutes, we should be able to see that the CPU usage is decreasing due to the scale-up of the pods.
- Verify self-healing in Keptn's bridge.
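If the bridge is not exposed publicly, you can reach it via a port-forward; this sketch assumes the default service name bridge in the keptn namespace and its default port, which may differ in your installation:

kubectl port-forward svc/bridge -n keptn 9000:8080

Then open http://localhost:9000 in your browser.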
In this example, the bridge shows that the remediation-service triggered an update of the configuration of the carts service by increasing the number of replicas to 2. When the additional replica was available, the wait-service waited for three minutes for the remediation action to take effect. Afterwards, an evaluation by the pitometer-service was triggered to check whether the remediation action resolved the problem. In this case, increasing the number of replicas achieved the desired effect, since the evaluation of the service level objectives was successful.