OXN - Observability eXperiment eNgine - is an extensible software framework for running observability experiments and comparing observability design decisions. OXN follows the design principles of cloud benchmarking and strives for portable and repeatable experiments. Experiments are defined as YAML-based configuration files, which allows them to be shared, versioned, and repeated. OXN automates every step of the experiment process, from setup of the system under evaluation (SUE) to data collection, processing, and reporting.
- Docker + Docker Compose
- Python >= v3.10
- Jupyter
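The prerequisites above can be checked before starting. The following is a minimal sketch, assuming the tools are exposed on `PATH` under the executable names `docker` and `jupyter` (these names are an assumption and may differ on your system); it does not check Docker Compose or the Python version.

```python
import shutil

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ["docker", "jupyter"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All listed prerequisites were found on PATH.")
```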
- Change to the forked demo submodule folder
cd opentelemetry-demo/
- Build needed containers. This will take a while
make build
Alternatively, you can build only the container with fault injection, e.g., the recommender service. This may cause incompatibility in the future.
docker compose build recommendationservice
- Run docker compose to start the demo
docker compose up
- Verify the demo application is working by visiting
  - http://localhost:8080/ for the Webstore
  - http://localhost:8080/jaeger/ui for Jaeger
  - http://localhost:9090 for Prometheus
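The manual check above can also be scripted. This is a minimal sketch using only the Python standard library; the `ENDPOINTS` mapping and the `is_reachable` helper are illustrative names, not part of oxn, and the success criterion (an HTTP status below 400) is an assumption.

```python
import http.client
import urllib.request

# URLs taken from the list above; the labels are only used for printing.
ENDPOINTS = {
    "Webstore": "http://localhost:8080/",
    "Jaeger": "http://localhost:8080/jaeger/ui",
    "Prometheus": "http://localhost:9090",
}


def is_reachable(url, timeout=3.0):
    """Return True if `url` answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeouts, HTTP errors and malformed URLs
        # all count as "not reachable" in this sketch.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        state = "up" if is_reachable(url) else "DOWN"
        print(f"{name:<11} {url:<35} {state}")
```

The services may need some time after `docker compose up` before they respond, so a `DOWN` result shortly after startup is not necessarily an error.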
Note: oxn requires Python >= 3.10
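To confirm that the interpreter you are about to use satisfies this requirement, a small check like the following can help. It is a sketch; `meets_minimum` is a hypothetical helper name, not something oxn provides.

```python
import sys

# Minimum version stated in the note above.
MINIMUM_VERSION = (3, 10)


def meets_minimum(version_info, minimum=MINIMUM_VERSION):
    """Compare the (major, minor) part of a version tuple to `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum(sys.version_info):
        print(f"Python {current} is recent enough for oxn.")
    else:
        print(f"Python {current} is too old; oxn requires >= 3.10.")
```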
- Install virtualenv
pip install virtualenv
- Create a virtualenv (named venv here)
virtualenv venv
- Source the venv
source venv/bin/activate
- Install oxn
pip install .
Note: oxn requires the pytables package, which in turn requires a set of dependencies.
- Verify that oxn is correctly installed
oxn --help
usage: oxn [-h] [--times TIMES] [--report REPORT] [--accounting] [--randomize] [--extend EXTEND] [--loglevel [{debug,info,warning,error,critical}]] [--logfile LOG_FILE] [--timeout TIMEOUT] spec
Observability experiments engine
positional arguments:
spec Path to an oxn experiment specification to execute.
options:
-h, --help show this help message and exit
--times TIMES Run the experiment n times. Default is 1
--report REPORT Create an experiment report at the specified location. If the file exists, it will be overwritten. If it does not exist, it will be created.
--accounting Capture resource usage for oxn and the sue. Requires that the report option is set.Will increase the time it takes to run the experiment by about two seconds for each service in the sue.
--randomize Randomize the treatment execution order. Per default, treatments are executed in the order given in the experiment specification
--extend EXTEND Path to a treatment extension file. If specified, treatments in the file will be loaded into oxn.
--loglevel [{debug,info,warning,error,critical}]
Set the log level. Choose between debug, info, warning, error, critical. Default is info
--logfile LOG_FILE Write logs to a file. If the file does not exist, it will be created.
--timeout TIMEOUT Timeout after which we stop trying to build the SUE. Default is 1m
- Run an experiment and write the experiment report to disk
oxn experiments/recommendation_pause_baseline.yml --report baseline_report.yml
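When experiments are launched from a script or notebook, it can help to assemble the command line programmatically. The sketch below only uses flags shown in the `oxn --help` output above; `build_oxn_command` is a hypothetical helper, not part of oxn, and it only builds and prints the command rather than running it.

```python
import shlex


def build_oxn_command(spec, report=None, times=1, randomize=False,
                      accounting=False):
    """Build an oxn argument list from the options in the help text."""
    # The help text states that --accounting requires the report option.
    if accounting and report is None:
        raise ValueError("--accounting requires --report to be set")
    command = ["oxn", spec]
    if report is not None:
        command += ["--report", report]
    if times != 1:
        command += ["--times", str(times)]
    if randomize:
        command.append("--randomize")
    if accounting:
        command.append("--accounting")
    return command


if __name__ == "__main__":
    command = build_oxn_command(
        "experiments/recommendation_pause_baseline.yml",
        report="baseline_report.yml",
        times=3,
    )
    # Print a shell-ready form; pass `command` to subprocess.run to execute.
    print(shlex.join(command))
```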
The cluster provides Persistent Volume Claims (PVCs) to store data across pod restarts. OXN's default configuration uses OpenEBS for this. Install OpenEBS with the following command:
kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
You can also use other implementations of PVCs. Just make sure to change the values in the helm configs accordingly.
For Prometheus and Grafana, we use the kube-prometheus-stack, which deploys and configures both in a ready-to-use state.
The following commands add the Helm repository, install the kube-prometheus-stack in a dedicated namespace, and apply custom configurations:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
--namespace oxn-external-monitoring \
--create-namespace \
--version 62.5.1 \
-f values_kube_prometheus.yaml
Kepler also provides a Helm chart, so provisioning is straightforward. We follow the instructions from the official documentation and execute the following commands. They deploy Kepler in the correct namespace and apply custom changes.
helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
helm repo update
helm install kepler kepler/kepler \
--namespace oxn-external-monitoring \
--create-namespace \
--set serviceMonitor.enabled=true \
--set serviceMonitor.labels.release=kube-prometheus \
-f values_kepler.yaml
There is a preconfigured dashboard for Grafana. The dashboard is deployed from the OXN repository using the command:
GF_POD=$(
kubectl get pod \
-n oxn-external-monitoring \
-l app.kubernetes.io/name=grafana \
-o jsonpath="{.items[0].metadata.name}"
)
kubectl cp kepler_dashboard.json oxn-external-monitoring/$GF_POD:/tmp/dashboards/kepler_dashboard.json
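The `jsonpath` expression above selects the name of the first pod that matches the label selector. The same lookup can be sketched in Python on the JSON that `kubectl get pod ... -o json` prints; `first_pod_name` and the sample pod name below are hypothetical and for illustration only.

```python
import json


def first_pod_name(pod_list_json):
    """Return .items[0].metadata.name from a kubectl pod list in JSON."""
    items = json.loads(pod_list_json).get("items", [])
    if not items:
        # Make the "selector matched nothing" case explicit.
        raise LookupError("no pods matched the selector")
    return items[0]["metadata"]["name"]


if __name__ == "__main__":
    # Trimmed, made-up example of what `kubectl get pod -o json` returns.
    sample = json.dumps({
        "kind": "List",
        "items": [{"metadata": {"name": "example-grafana-pod"}}],
    })
    print(first_pod_name(sample))
```

Raising an error for an empty result makes it obvious when Grafana has not been scheduled yet, instead of passing an empty pod name to `kubectl cp`.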
Deployment of the SUE also follows the instructions from the official documentation. Deploy the SUE in its own namespace and apply a custom configuration file:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install astronomy-shop open-telemetry/opentelemetry-demo \
--namespace system-under-evaluation \
--create-namespace \
-f values_opentelemetry_demo.yaml