Kepler Operator Design Discussion
Sunyanan Choochotkaew edited this page Sep 26, 2022 · 7 revisions
flowchart LR;
machine-config-->integrated-operator-install-->kepler-install
kepler-collected-metric
kepler-exported-power
node-selector: (default: all)
cgroup2:
enable: (default: true)
operations:
- deploy MachineConfigPool using node-selector (:warning: do we have to select nodes?)
- deploy cgroupv2 MachineConfig
prometheus:
grafana:
operations:
- install prometheus (https://github.com/prometheus-operator/prometheus-operator)
- install grafana (https://github.com/grafana-operator/grafana-operator)
- deploy datasource
- ⚠️ deploy dashboard-script?
scrape-interval:
daemon:
exporter:
image:
port: (default: 9102)
estimator-sidecar:
enabled: (default: false)
image:
mnt-path: (default: /tmp)
model-server:
enabled: (default: :warning:false)
storage:
type: (default: local?; values: local, hostpath, nfs, external (such as via s3))
path: (default: models)
sampling-period:
operations:
- deploy model-server
- deploy model-server-service
- deploy rbac-related resources (serviceaccount, clusterrole, clusterrolebinding)
- deploy corresponding pv, pvc if hostpath or nfs
- deploy daemonset (w/wo estimator)
- deploy exporter-service
- deploy servicemonitor
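
The install operations above depend on a few switches in the config (model-server enabled, storage type, estimator sidecar). As a rough sketch only, the selection could look like the following. The dictionary layout, the `with_defaults` and `plan_operations` helpers, and the operation names are hypothetical; the default values are the ones listed above. It also assumes the model-server resources and the pv/pvc are skipped when the model-server is disabled, which the list does not state.

```python
from copy import deepcopy

# Defaults copied from the config sketch; the nesting is an assumption.
DEFAULTS = {
    "exporter": {"port": 9102},
    "estimator-sidecar": {"enabled": False, "mnt-path": "/tmp"},
    "model-server": {
        "enabled": False,
        "storage": {"type": "local", "path": "models"},
    },
}


def with_defaults(spec):
    """Return the defaults with user-provided values merged on top."""
    merged = deepcopy(DEFAULTS)

    def merge(dst, src):
        for key, value in src.items():
            if isinstance(value, dict) and isinstance(dst.get(key), dict):
                merge(dst[key], value)
            else:
                dst[key] = value

    merge(merged, spec)
    return merged


def plan_operations(spec):
    """List the deploy steps, in the order given above, for one install spec."""
    cfg = with_defaults(spec)
    model_server = cfg["model-server"]
    ops = []
    if model_server["enabled"]:  # assumption: skipped when disabled
        ops += ["model-server", "model-server-service"]
    ops.append("rbac")
    # pv and pvc are only needed for hostpath or nfs storage.
    if model_server["enabled"] and model_server["storage"]["type"] in ("hostpath", "nfs"):
        ops += ["pv", "pvc"]
    # The daemonset is deployed with or without the estimator sidecar.
    if cfg["estimator-sidecar"]["enabled"]:
        ops.append("daemonset-with-estimator")
    else:
        ops.append("daemonset")
    ops += ["exporter-service", "servicemonitor"]
    return ops


print(plan_operations({"model-server": {"enabled": True, "storage": {"type": "nfs"}}}))
```

With an empty spec, only the rbac resources, the plain daemonset, the exporter service and the servicemonitor are planned.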
spec:
counter:
cgroup:
kubelet:
gpu:
...
status:
counter:
cpu_cycles: enabled/disabled/unavailable
...
operations:
- apply metric configuration to kepler-ds env (:warning: or implement kepler-exporter to watch this CR)
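
The status block reports each counter as enabled, disabled or unavailable. A minimal sketch of how a controller might derive that status is shown below. It assumes the spec is a mapping from counter name to a boolean and that the set of counters available on the node is already known. The `counter_status` helper and the counter names other than `cpu_cycles` are hypothetical, and treating "unavailable" as taking precedence over "disabled" is an assumption.

```python
def counter_status(spec_counters, available_on_node):
    """Map each requested counter to enabled/disabled/unavailable."""
    status = {}
    for name, wanted in spec_counters.items():
        if name not in available_on_node:
            # The node cannot provide this counter at all.
            status[name] = "unavailable"
        elif wanted:
            status[name] = "enabled"
        else:
            status[name] = "disabled"
    return status


status = counter_status(
    {"cpu_cycles": True, "cache_miss": False, "cpu_instr": True},
    available_on_node={"cpu_cycles", "cache_miss"},
)
print(status)
```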
spec:
node:
package:
pod:
status:
power-source:
node:
package:
pod:
...
operations:
- apply power configuration to kepler-ds env (:warning: or implement kepler-exporter to watch this CR), similar to kepler-collected-metric
- Single CR and Controller (merge the metric and power configuration into the kepler-ds configuration and combine it with the other install config)
flowchart LR;
machine-config-->integrated-operator-install-->kepler-install-config
- Single install CR and Controller + metric CR and Controller + power CR and Controller
flowchart TD;
machine-config-->integrated-operator-install-->kepler-install
kepler-collected-metric
kepler-exported-power
- machine-config CR and Controller + integrated-operator-install CR and Controller + metric CR and Controller + power CR and Controller
flowchart LR;
machine-config;
integrated-operator-install;
kepler-collected-metric
kepler-exported-power
| reconcile choice | advantage | disadvantage |
|---|---|---|
| single | single point of modification | frequently triggers unnecessary install logic; poor code readability |
| single install + metric + power | separates infrequent changes from relatively frequent ones; improved abstraction | |
| all separated | clean logic; separates different functionality; most ideal abstraction | |
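
To make the trade-off concrete, the sketch below models each reconcile choice as groups of configuration handled by one controller, and reports which logic would re-run when one part changes. This is only an illustration: the `CHOICES` table and the `logic_rerun` helper are hypothetical, and it assumes a controller re-runs everything in its group on any change. The group names follow the flowcharts above; kepler-install is not listed in the diagram for the "all separated" option, so it is left out there.

```python
ALL_PARTS = {
    "machine-config",
    "integrated-operator-install",
    "kepler-install",
    "kepler-collected-metric",
    "kepler-exported-power",
}

# One set per controller: the configuration that controller reconciles.
CHOICES = {
    "single": [ALL_PARTS],
    "single install + metric + power": [
        {"machine-config", "integrated-operator-install", "kepler-install"},
        {"kepler-collected-metric"},
        {"kepler-exported-power"},
    ],
    "all separated": [
        {"machine-config"},
        {"integrated-operator-install"},
        {"kepler-collected-metric"},
        {"kepler-exported-power"},
    ],
}


def logic_rerun(choice, changed):
    """Return the parts whose logic re-runs when `changed` is modified."""
    for group in CHOICES[choice]:
        if changed in group:
            return sorted(group)
    raise KeyError(f"{changed!r} is not handled under {choice!r}")


for choice in CHOICES:
    print(choice, "->", logic_rerun(choice, "kepler-collected-metric"))
```

Under the single choice, a metric change re-runs the install logic as well, which is the disadvantage noted in the table; under the other two choices it touches only the metric controller.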