Flatten & unify component structure
This basically encompasses two logical changes:

1. More fully integrating neonvm into the rest of the repository; and
2. Using a consistent directory structure for all components

The new structure is:

- All components get their own top-level directory (e.g.
  'autoscaler-agent', 'neonvm-controller', 'vm-builder')
- Inside the directory, there is:
  - cmd/ - directory containing main.go for the component
  - Dockerfile - for building the component
  - *.yaml - various YAML files for deploying, if applicable

Additionally, all dependencies outside the 'cmd' package are moved into
the 'pkg/' subdirectory. In particular, there is now 'pkg/neonvm/'.

Also of note: 'neonvm/hack/kernel' has been relocated to the new
top-level directory 'neonvm-kernel/'.

Under this structure, the user-visible changes are:

1. Introduction of new YAML: neonvm-controller.yaml
2. Introduction of new YAML: neonvm-vxlan-controller.yaml
3. neonvm.yaml is now *mostly* just the CRD.

The primary reasons for this change are:

1. Because it's been long enough since we moved the neonvm repository
   here that we might as well integrate it more fully.
2. Because as we look to add components in other languages, we need a
   directory structure that supports it.

... and now that Autoscaling is GA, we can make the bigger changes we
were holding off on :)
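
The layout described in the commit message can be checked mechanically. A minimal sketch in shell, assuming every component directory is expected to contain both `cmd/main.go` and a `Dockerfile`. The function name `check_component` is hypothetical and not part of this commit, and some directories (for example `neonvm-kernel/`) may legitimately lack a `cmd/`:

```bash
# Hypothetical helper: report which of the expected files a component
# directory is missing. Returns 0 if both are present, 1 otherwise.
check_component() {
    local dir="$1" status=0
    if [ ! -f "$dir/cmd/main.go" ]; then
        echo "$dir: missing cmd/main.go"
        status=1
    fi
    if [ ! -f "$dir/Dockerfile" ]; then
        echo "$dir: missing Dockerfile"
        status=1
    fi
    return "$status"
}
```

Run from the repository root, `check_component autoscaler-agent` would print nothing and return 0 when both files exist.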
sharnoff committed Sep 5, 2024
1 parent 2d6b957 commit 86fdd14
Showing 90 changed files with 278 additions and 226 deletions.
12 changes: 6 additions & 6 deletions .github/workflows/build-images.yaml
@@ -170,7 +170,7 @@ jobs:
run: |
docker pull --quiet $IMAGE
ID=$(docker create $IMAGE true)
docker cp ${ID}:/vmlinuz neonvm/hack/kernel/vmlinuz
docker cp ${ID}:/vmlinuz neonvm-kernel/vmlinuz
docker rm -f ${ID}
- name: Build and push neonvm-runner image
@@ -179,7 +179,7 @@ jobs:
context: .
platforms: linux/amd64
push: true
file: neonvm/runner/Dockerfile
file: neonvm-runner/Dockerfile
cache-from: type=registry,ref=cache.neon.build/neonvm-runner:cache
cache-to: ${{ github.ref_name == 'main' && 'type=registry,ref=cache.neon.build/neonvm-runner:cache,mode=max' || '' }}
tags: ${{ needs.tags.outputs.runner }}
@@ -201,7 +201,7 @@ jobs:
context: .
platforms: linux/amd64
push: true
file: neonvm/Dockerfile
file: neonvm-controller/Dockerfile
cache-from: type=registry,ref=cache.neon.build/neonvm-controller:cache
cache-to: ${{ github.ref_name == 'main' && 'type=registry,ref=cache.neon.build/neonvm-controller:cache,mode=max' || '' }}
tags: ${{ needs.tags.outputs.controller }}
@@ -215,7 +215,7 @@ jobs:
context: .
platforms: linux/amd64
push: true
file: neonvm/tools/vxlan/Dockerfile
file: neonvm-vxlan-controller/Dockerfile
cache-from: type=registry,ref=cache.neon.build/neonvm-vxlan-controller:cache
cache-to: ${{ github.ref_name == 'main' && 'type=registry,ref=cache.neon.build/neonvm-vxlan-controller:cache,mode=max' || '' }}
tags: ${{ needs.tags.outputs.vxlan-controller }}
@@ -226,7 +226,7 @@ jobs:
context: .
platforms: linux/amd64
push: true
file: build/autoscale-scheduler/Dockerfile
file: autoscale-scheduler/Dockerfile
cache-from: type=registry,ref=cache.neon.build/autoscale-scheduler:cache
cache-to: ${{ github.ref_name == 'main' && 'type=registry,ref=cache.neon.build/autoscale-scheduler:cache,mode=max' || '' }}
tags: ${{ needs.tags.outputs.scheduler }}
@@ -239,7 +239,7 @@ jobs:
context: .
platforms: linux/amd64
push: true
file: build/autoscaler-agent/Dockerfile
file: autoscaler-agent/Dockerfile
cache-from: type=registry,ref=cache.neon.build/autoscaler-agent:cache
cache-to: ${{ github.ref_name == 'main' && 'type=registry,ref=cache.neon.build/autoscaler-agent:cache,mode=max' || '' }}
tags: ${{ needs.tags.outputs.autoscaler-agent }}
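
The first hunk in this file copies `/vmlinuz` out of an image by creating a container that is never started, copying from it, and removing it. A rough sketch of that pattern as a reusable function, with cleanup if the copy fails. The function name `extract_from_image` and the `DOCKER` override variable are assumptions for illustration, not part of the workflow:

```bash
# DOCKER can be overridden (e.g. with a stub) for testing; defaults to the real CLI.
DOCKER="${DOCKER:-docker}"

# extract_from_image IMAGE SRC DEST
# Copies SRC out of IMAGE to DEST on the host without starting the container.
extract_from_image() {
    local image="$1" src="$2" dest="$3" id
    "$DOCKER" pull --quiet "$image" >/dev/null || return 1
    # 'true' is only a placeholder command; the container is created, never run.
    id=$("$DOCKER" create "$image" true) || return 1
    if ! "$DOCKER" cp "${id}:${src}" "$dest"; then
        # Remove the temporary container even when the copy fails.
        "$DOCKER" rm -f "$id" >/dev/null
        return 1
    fi
    "$DOCKER" rm -f "$id" >/dev/null
}
```

With this sketch, the step shown in the diff would read `extract_from_image "$IMAGE" /vmlinuz neonvm-kernel/vmlinuz`.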
6 changes: 4 additions & 2 deletions .github/workflows/e2e-test.yaml
@@ -174,9 +174,11 @@ jobs:
kubectl apply -f $(rendered neonvm-runner-image-loader.yaml)
kubectl -n neonvm-system rollout status daemonset neonvm-runner-image-loader
kubectl apply -f $(rendered neonvm.yaml)
kubectl -n neonvm-system rollout status daemonset neonvm-device-plugin
kubectl -n neonvm-system rollout status daemonset neonvm-vxlan-controller
kubectl -n neonvm-system rollout status daemonset neonvm-device-plugin
kubectl apply -f $(rendered neonvm-controller.yaml)
kubectl -n neonvm-system rollout status deployment neonvm-controller
kubectl apply -f $(rendered neonvm-vxlan-controller.yaml)
kubectl -n neonvm-system rollout status daemonset neonvm-vxlan-controller
kubectl apply -f $(rendered autoscale-scheduler.yaml)
kubectl -n kube-system rollout status deployment autoscale-scheduler
kubectl apply -f $(rendered autoscaler-agent.yaml)
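
The updated e2e steps follow one pattern: apply a rendered manifest, then block until the workload it defines has rolled out. A sketch of that pattern, assuming the rendered files live under `rendered_manifests/`. The workflow's own `rendered` helper is not shown in this diff, so `MANIFEST_DIR`, `KUBECTL`, `apply_and_wait`, and `deploy_neonvm` are illustrative names only:

```bash
# Both variables are assumptions; override them to point at another CLI or directory.
KUBECTL="${KUBECTL:-kubectl}"
MANIFEST_DIR="${MANIFEST_DIR:-rendered_manifests}"

# apply_and_wait MANIFEST NAMESPACE KIND NAME
# Applies the manifest, then waits for the named workload to finish rolling out.
apply_and_wait() {
    local manifest="$1" namespace="$2" kind="$3" name="$4"
    "$KUBECTL" apply -f "$MANIFEST_DIR/$manifest" || return 1
    "$KUBECTL" -n "$namespace" rollout status "$kind" "$name"
}

# Order taken from the workflow: neonvm.yaml first, then the two controllers.
deploy_neonvm() {
    apply_and_wait neonvm.yaml                  neonvm-system daemonset  neonvm-device-plugin &&
    apply_and_wait neonvm-controller.yaml       neonvm-system deployment neonvm-controller &&
    apply_and_wait neonvm-vxlan-controller.yaml neonvm-system daemonset  neonvm-vxlan-controller
}
```

Chaining with `&&` stops at the first manifest that fails to apply or roll out, so later components are not deployed on top of a broken one.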
2 changes: 2 additions & 0 deletions .github/workflows/release.yaml
@@ -88,6 +88,8 @@ jobs:
rendered_manifests/autoscale-scheduler.yaml
rendered_manifests/autoscaler-agent.yaml
rendered_manifests/neonvm.yaml
rendered_manifests/neonvm-controller.yaml
rendered_manifests/neonvm-vxlan-controller.yaml
rendered_manifests/neonvm-runner-image-loader.yaml
rendered_manifests/multus.yaml
rendered_manifests/multus-eks.yaml
10 changes: 5 additions & 5 deletions .github/workflows/vm-kernel.yaml
@@ -60,7 +60,7 @@ jobs:
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
--method GET \
--field path=neonvm/hack/kernel \
--field path=neonvm-kernel \
--field sha=${COMMIT_SHA} \
--field per_page=1 \
--jq ".[0].sha" \
@@ -165,8 +165,8 @@ jobs:
- name: get kernel version
id: get-kernel-version
run: |
linux_config=$(ls neonvm/hack/kernel/linux-config-*) # returns something like "neonvm/hack/kernel/linux-config-6.1.63"
kernel_version=${linux_config##*-} # returns something like "6.1.63"
linux_config=$(ls neonvm-kernel/linux-config-*) # returns something like "neonvm-kernel/linux-config-6.1.63"
kernel_version=${linux_config##*-} # returns something like "6.1.63"
echo VM_KERNEL_VERSION=$kernel_version >> $GITHUB_OUTPUT
@@ -192,12 +192,12 @@ jobs:
uses: docker/build-push-action@v6
with:
build-args: KERNEL_VERSION=${{ steps.get-kernel-version.outputs.VM_KERNEL_VERSION }}
context: neonvm/hack/kernel
context: neonvm-kernel
platforms: linux/amd64
# Push kernel image only for scheduled builds or if workflow_dispatch/workflow_call input is true
push: true
pull: true
file: neonvm/hack/kernel/Dockerfile.kernel-builder
file: neonvm-kernel/Dockerfile.kernel-builder
cache-from: type=registry,ref=cache.neon.build/vm-kernel:cache
cache-to: ${{ github.ref_name == 'main' && 'type=registry,ref=cache.neon.build/vm-kernel:cache,mode=max' || '' }}
tags: ${{ steps.get-tags.outputs.tags }}
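
The `get kernel version` step relies on shell parameter expansion: `${var##*-}` strips the longest prefix matching `*-`, leaving whatever follows the last hyphen. A small sketch of just that step, where the function name `kernel_version_from_config` is hypothetical:

```bash
# Extract the version suffix from a path like "neonvm-kernel/linux-config-6.1.63".
kernel_version_from_config() {
    local path="$1"
    # '##*-' removes everything up to and including the LAST hyphen.
    echo "${path##*-}"
}

kernel_version_from_config "neonvm-kernel/linux-config-6.1.63"   # prints 6.1.63
```

Because the match is greedy, a file name whose version itself contains a hyphen (say, a `-rc1` suffix) would yield only the part after that final hyphen, so this approach assumes plain dotted versions.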
1 change: 1 addition & 0 deletions .gitignore
@@ -25,6 +25,7 @@ cover.html
*~

*.qcow2
# todo: remove old kernel location
neonvm/hack/kernel/vmlinuz

rendered_manifests
90 changes: 47 additions & 43 deletions ARCHITECTURE.md
@@ -36,20 +36,25 @@ This isn't the only architecture document. You may also want to look at:

## High-level overview

At a high level, this repository provides two components:
At a high level, this repository provides five components with a non-trivial amount of code:

1. A modified Kubernetes scheduler (using the [plugin interface]) — known as "the (scheduler)
1. A Kubernetes custom resource definition (CRD) and controller (`neonvm-controller`) for managing
resizeable VMs — NeonVM.
2. The underlying NeonVM pods run `neonvm-runner`
3. NeonVM virtual machine images are built with `vm-builder`
4. A modified Kubernetes scheduler (using the [plugin interface]) — known as "the (scheduler)
plugin", `AutoscaleEnforcer`, `autoscale-scheduler`
2. A daemonset responsible for making VM scaling decisions & checking with interested parties
5. A daemonset responsible for making VM scaling decisions & checking with interested parties
— known as `autoscaler-agent` or simply `agent`

A third component, a binary running inside of the VM to (a) handle being upscaled
One last component, a binary running inside of the VM to (a) handle being upscaled
(b) validate that downscaling is ok, and (c) request immediate upscaling due to sharp changes in demand
— known as "the (VM) monitor", lives in
[`neondatabase/vm-monitor`](https://github.com/neondatabase/vm-monitor)
— known as "the (VM) monitor", lives in [`neondatabase/neon/.../vm-monitor`].

[plugin interface]: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/

For information on NeonVM, see [README-NeonVM.md](./README-NeonVM.md).

The scheduler plugin is responsible for handling resource requests from the `autoscaler-agent`,
capping increases so that node resources aren't overcommitted.

@@ -81,48 +86,47 @@ discussed more in the [high-level consequences] section below.

## Repository structure

* `build/` — scripts for building the scheduler (`autoscale-scheduler`) and `autoscaler-agent`
* `cluster-autoscaler/` — patch and Dockerfile for building a NeonVM-compatible [cluster-autoscaler]
* `cmd/` — entrypoints for the `autoscaler-agent` and scheduler plugin. Very little
functionality implemented here. (See: `pkg/agent` and `pkg/plugin`)
* `deploy/` — YAML files used during cluster init. Of these, only the following two are manually
written:
* `deploy/autoscaler-agent.yaml`
* `deploy/autoscale-scheduler.yaml`
* `kind/` — files specific to creating our [kind](https://kind.sigs.k8s.io/) cluster
* `kind/config.yaml` — configuration for the kind cluster
* `neonvm/` — QEMU-based virtualisation API and controllers for k8s
* See [`neonvm/README.md`](./neonvm/README.md) for details
* `pkg/` — core go code from the scheduler plugin and `autoscaler-agent`. Where applicable, the
purpose of individual files is commented at the top.
* `pkg/agent/` — implementation of `autoscaler-agent`
* `pkg/api/` — all types for inter-component communications, plus some protocol-relevant types
independently used by multiple components.
* `pkg/billing/` — consumption metrics API, primarily used in
[`pkg/agent/billing.go`](pkg/agent/billing.go)
* `pkg/plugin/` — implementation of the scheduler plugin
* `pkg/util/` — miscellaneous utilities that are too general to be included in `agent` or
`plugin`.
* `scripts/` — a collection of scripts for common tasks. Items of note:
* `scripts/patch-*.json` — patches for testing live-updating of a VM or config
* `scripts/replace-scheduler.sh` — replaces the currently running scheduler, for quick redeploy
* `scripts/repeat-delete-scheduler.sh` — repeatedly deletes the scheduler (which will be
recreated by the deployment). For debugging.
* `scripts/run-bench.sh` — starts a CPU-intensive pgbench connected to a VM. Useful to watch
the TPS and get confirmation that autoscaled CPUs are being used.
* `scripts/scheduler-logs.sh` — convenience script to tail the scheduler's logs
* `scripts/ssh-into-vm.sh` — `ssh`es into a VM. Useful for debugging.
* `scripts/start-vm-bridge.sh`
* `tests/` — end-to-end tests
At a high level, each component gets its own directory and resulting YAML for its deployment, where
applicable.

These are:

* `autoscale-scheduler` — the scheduler (with our plugin)
* `autoscaler-agent`
* `cluster-autoscaler` — patch for building a NeonVM-compatible [cluster-autoscaler]
* `neonvm` — CRDs and other related YAMLs for NeonVM, alongside Go definitions and a generated
client. Note that the generated YAML includes a dependency on `neonvm-controller` via a webhook
for create/update/delete operations on the CRDs.
* `neonvm-controller` — controller for the NeonVM CRDs
* `neonvm-kernel` — files relating to the virtual machine kernel we use in NeonVM
* `neonvm-runner` — per-VM management process, created by `neonvm-controller`
* `neonvm-vxlan-controller`
* `vm-builder` — binary for building VM images for use by NeonVM

[cluster-autoscaler]: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

Each component directory contains:

* `cmd/` — the entrypoint of the component, containing `main.go`
* `Dockerfile` — for building the component, if applicable
* `kustomize.yaml` — if the component is separately deployed, instructions for how Kustomize should
generate the YAML
* `*.yaml` — if the component is separately deployed, there will be other YAML files as the
resources for Kustomize to include -- e.g. `daemonset.yaml` or `config_map.yaml`.

### Other directories

* `k3d/` and `kind/` — configuration for local test clusters
* `pkg/` — the bulk of the Go codebase. For more complex components, `cmd/main.go` often just calls
the relevant entrypoint function in its `pkg/` directory. `pkg/` also includes the common
packages shared by multiple components.
* `scripts` — a collection of scripts for common tasks
* `tests` — end-to-end tests
* `tests/e2e` — the [`kuttl`](https://kuttl.dev/) test scenarios themselves
* `scripts-common.sh` — file with a handful of useful functions, used both in `build` and `scripts`
* `vm-deploy.yaml` — sample creation of a single VM, for testing autoscaling
* `vm-examples/` — collection of VMs:
* `pg16-disk-test/` — VM with Postgres 16 and ssh access
* Refer to [`vm-examples/pg16-disk-test/README.md`](./vm-examples/pg16-disk-test) for more information.

[cluster-autoscaler]: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

## Agent-Scheduler protocol details

Broadly speaking, the `autoscaler-agent` _notifies_ on decrease and _requests_ increases. This means