[TMP] Robot Integration #544

Closed
wants to merge 6 commits into from
66 changes: 63 additions & 3 deletions .github/workflows/test_e2e.yml
@@ -1,11 +1,11 @@
name: Run e2e tests
name: e2e tests
on:
pull_request: {}
push:
branches: [main]
jobs:
k3s:
name: k3s ${{ matrix.k3s }}
cloud:
name: Cloud ${{ matrix.k3s }}
permissions:
id-token: write
runs-on: ubuntu-latest
@@ -49,4 +49,64 @@ jobs:
skaffold build --tag="e2e-${GITHUB_RUN_ID}-${GITHUB_RUN_NUMBER}"
tag=$(skaffold build --tag="e2e-${GITHUB_RUN_ID}-${GITHUB_RUN_NUMBER}" --quiet --output="{{ (index .Builds 0).Tag }}")
skaffold deploy --images=hetznercloud/hcloud-cloud-controller-manager=$tag

go test ./tests/e2e -tags e2e -v -timeout 60m

robot:
name: Robot
permissions:
id-token: write

# Make sure that only one Job is using the server at a time
concurrency: robot-test-server
environment: e2e-robot

runs-on: ubuntu-latest
steps:
- uses: actions/setup-go@v4
with:
go-version: "1.21"
- uses: actions/checkout@master
- uses: hetznercloud/tps-action@main
with:
token: ${{ secrets.HCLOUD_TOKEN }}
- uses: 3bit/setup-hcloud@v2
- uses: yokawasa/[email protected]
with:
setup-tools: |
helm
kubectl
skaffold
helm: v3.11.2
kubectl: v1.28.1
skaffold: v2.3.0

- name: Run tests
env:
K3S_CHANNEL: v1.28
SCOPE: gha-${{ github.run_id }}-${{ github.run_attempt }}-robot

# Disable routes in dev-env, not supported for Robot.
ROUTES_ENABLED: "false"

ROBOT_USER_NAME: ${{ secrets.ROBOT_USER_NAME }}
ROBOT_PASSWORD: ${{ secrets.ROBOT_PASSWORD }}
SERVER_NUMBER: ${{ vars.SERVER_NUMBER }}
run: |
curl -sLS https://get.k3sup.dev | sh

trap "hack/dev-down.sh" EXIT
source <(hack/dev-up.sh)

skaffold build --tag="e2e-${GITHUB_RUN_ID}-${GITHUB_RUN_NUMBER}"
tag=$(skaffold build --tag="e2e-${GITHUB_RUN_ID}-${GITHUB_RUN_NUMBER}" --quiet --output="{{ (index .Builds 0).Tag }}")
skaffold deploy \
--profile=robot \
--images=hetznercloud/hcloud-cloud-controller-manager=$tag

pushd hack/robot-e2e
ansible-galaxy install -r requirements.yml
ansible-playbook e2e-setup-robot-server.yml -e scope=$SCOPE -e server_number=$SERVER_NUMBER -vvv
popd

go test ./tests/e2e -tags e2e,robot -v -timeout 60m
6 changes: 6 additions & 0 deletions .golangci.yaml
@@ -19,6 +19,11 @@ linters-settings:
- pkg: k8s.io/apimachinery/pkg/apis/meta/v1
alias: metav1

- pkg: github.com/syself/hrobot-go
alias: hrobot
- pkg: github.com/syself/hrobot-go/models
alias: robotmodels

misspell:
locale: "US"

@@ -58,3 +63,4 @@ issues:
- path: internal/mocks
linters:
- unparam
- revive
13 changes: 12 additions & 1 deletion README.md
@@ -2,10 +2,12 @@

[![GitHub Actions status](https://github.com/hetznercloud/hcloud-cloud-controller-manager/workflows/Run%20tests/badge.svg)](https://github.com/hetznercloud/hcloud-cloud-controller-manager/actions)

The Hetzner Cloud controller manager integrates your Kubernetes cluster with the Hetzner Cloud API.
The Hetzner Cloud [cloud-controller-manager](#TODO) integrates your Kubernetes cluster with the Hetzner Cloud & Robot APIs.

## Features

> TODO Rework, use actual Controller Names or The functionality, Zones is outdated

* **instances interface**: adds the server type to the `node.kubernetes.io/instance-type` label, sets the external ipv4 and ipv6 addresses and deletes nodes from Kubernetes that were deleted from the Hetzner Cloud.
* **zones interface**: makes Kubernetes aware of the failure domain of the server by setting the `topology.kubernetes.io/region` and `topology.kubernetes.io/zone` labels on the node.
* **Private Networks**: allows you to use Hetzner Cloud Private Networks for your pods' traffic.
@@ -254,6 +256,15 @@ alias kgp="kubectl get pods"
alias kgs="kubectl get services"
```

The test suite is split into three parts:

- **General Part**: Sets up the test env & checks if the HCCM Pod is properly running
- Build Tag: `e2e`
- **Cloud Part**: Tests regular functionality against a Cloud-only environment
- Build Tag: `e2e && !robot`
- **Robot Part**: Tests Robot functionality against a Cloud+Robot environment
- Build Tag: `e2e && robot`
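
As a rough illustration of how these build tags select the parts, the sketch below evaluates the three constraint expressions with the standard library's `go/build/constraint` package. The helper `enabled` and the part names are hypothetical and not part of this repository. The sketch only assumes that each part's test files carry the `//go:build` line listed above.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// enabled reports whether a file carrying the given //go:build line would be
// compiled when `go test` is invoked with the given -tags.
// Hypothetical helper for illustration only.
func enabled(line string, tags ...string) bool {
	expr, err := constraint.Parse(line)
	if err != nil {
		panic(err)
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] })
}

func main() {
	// Constraint lines taken from the list above; the part names are labels only.
	parts := []struct{ name, line string }{
		{"general", "//go:build e2e"},
		{"cloud", "//go:build e2e && !robot"},
		{"robot", "//go:build e2e && robot"},
	}
	for _, p := range parts {
		fmt.Printf("%-8s -tags e2e: %-5v  -tags e2e,robot: %v\n",
			p.name, enabled(p.line, "e2e"), enabled(p.line, "e2e", "robot"))
	}
}
```

In other words, `go test ./tests/e2e -tags e2e` should compile the general and Cloud parts, while `-tags e2e,robot` should compile the general and Robot parts.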

## Local test setup
This repository provides a [skaffold](https://skaffold.dev/) configuration to easily deploy and debug this controller on demand.

14 changes: 14 additions & 0 deletions chart/values.yaml
@@ -30,6 +30,20 @@ env:
name: hcloud
key: token

# TODO: get field names from existing hetzner-ccm deployment
ROBOT_USER_NAME:
valueFrom:
secretKeyRef:
name: hcloud
key: robot-user
optional: true
ROBOT_PASSWORD:
valueFrom:
secretKeyRef:
name: hcloud
key: robot-password
optional: true

image:
repository: hetznercloud/hcloud-cloud-controller-manager
tag: '{{ $.Chart.Version }}'
12 changes: 12 additions & 0 deletions deploy/ccm-networks.yaml
@@ -75,6 +75,18 @@ spec:
secretKeyRef:
key: token
name: hcloud
- name: ROBOT_PASSWORD
valueFrom:
secretKeyRef:
key: robot-password
name: hcloud
optional: true
- name: ROBOT_USER_NAME
valueFrom:
secretKeyRef:
key: robot-user
name: hcloud
optional: true
- name: HCLOUD_NETWORK
valueFrom:
secretKeyRef:
12 changes: 12 additions & 0 deletions deploy/ccm.yaml
@@ -72,6 +72,18 @@ spec:
secretKeyRef:
key: token
name: hcloud
- name: ROBOT_PASSWORD
valueFrom:
secretKeyRef:
key: robot-password
name: hcloud
optional: true
- name: ROBOT_USER_NAME
valueFrom:
secretKeyRef:
key: robot-user
name: hcloud
optional: true
image: hetznercloud/hcloud-cloud-controller-manager:v1.18.0 # x-release-please-version
ports:
- name: metrics
48 changes: 48 additions & 0 deletions docs/robot.md
@@ -0,0 +1,48 @@
# Clusters with Robot Servers

## Features

Most of the features we support for Cloud servers are also supported for Robot servers:

### Node Controller

The Node controller adds some information about the server to the Node object. This includes:

- `TODO Annotations`

### Node Lifecycle Controller

The Node Lifecycle Controller is responsible for updating the shutdown status of Nodes & deleting the Kubernetes Node object if the corresponding server is removed.

Both are generally supported. The shutdown status can only be detected if the Robot Server supports this.

### Service Controller (Load Balancers)

The service controller watches Services with `type: LoadBalancer` and creates Cloud Load Balancers for them. By default, all Kubernetes Nodes including Robot servers are added as targets to the Load Balancer. Check out the [Load Balancer Documentation](./load_balancers.md) for more details.

### Unsupported

#### Routes & Private Networks

Routing Pod CIDRs through (Cloud) Networks & (Robot) vSwitches is currently not supported. You will need to use your own CNI for this.

If you are interested in this feature, we are looking for contributors to help design & implement it.

## Requirements

### Identifying the correct Server

When a new Node joins the cluster, we first need to figure out which Robot (or Cloud) Server matches this node. We primarily try to match this through the Node Name & the Name of the server in Robot. If you use Kubeadm, the Node Name by default is the Hostname of the server.

_This means that by default, your **Hostname** needs to be the same as the **name of the server in Robot**_. If the two do not match, we cannot associate the Node with its server. Once we have made this connection, we save the Robot Server Number to the field `spec.providerID` on the Node, and use this identifier for any further processing.
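
The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the `robotServer` type, the `providerIDForNode` helper, the sample server number and the exact string comparison are all assumptions. Only the `hrobot://` format is taken from this document.

```go
package main

import "fmt"

// robotServer is a hypothetical, trimmed-down view of a server in Robot.
type robotServer struct {
	Number int
	Name   string
}

// providerIDForNode looks for a Robot server whose name equals the Node name
// and returns the provider ID that would be stored on the Node.
// Assumption: names are compared exactly; the real matching rules may differ.
func providerIDForNode(nodeName string, servers []robotServer) (string, bool) {
	for _, s := range servers {
		if s.Name == nodeName {
			return fmt.Sprintf("hrobot://%d", s.Number), true
		}
	}
	return "", false
}

func main() {
	// Sample data, made up for illustration.
	servers := []robotServer{{Number: 1234567, Name: "robot-worker-1"}}

	for _, node := range []string{"robot-worker-1", "renamed-host"} {
		id, ok := providerIDForNode(node, servers)
		fmt.Printf("node=%q matched=%v providerID=%q\n", node, ok, id)
	}
}
```

A Node whose name has no counterpart in Robot yields no provider ID in this sketch, which mirrors why the Hostname and the server name need to agree.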

If you absolutely need to use different names in Robot & Hostname, you can also configure the Provider ID yourself. With Kubeadm you can set the flag `TODO` to specify it manually. The value must follow the format `hrobot://$SERVER_NUMBER`. If this format is not followed exactly, we cannot process the Node.
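
To make the expected format concrete, here is a small validation sketch. The function name `parseRobotProviderID` and its strictness (no sign, no whitespace, digits only) are assumptions for illustration. The controller's real parsing may accept or reject different inputs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// robotPrefix is the scheme used for Robot servers in this document.
const robotPrefix = "hrobot://"

// parseRobotProviderID extracts the server number from a provider ID of the
// form hrobot://$SERVER_NUMBER. Hypothetical helper, not the controller's API.
func parseRobotProviderID(providerID string) (uint64, error) {
	if !strings.HasPrefix(providerID, robotPrefix) {
		return 0, fmt.Errorf("provider ID %q does not start with %q", providerID, robotPrefix)
	}
	// ParseUint rejects signs, whitespace and empty strings.
	number, err := strconv.ParseUint(strings.TrimPrefix(providerID, robotPrefix), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("provider ID %q has no valid server number: %w", providerID, err)
	}
	return number, nil
}

func main() {
	// Sample inputs, made up for illustration.
	for _, id := range []string{"hrobot://1234567", "hrobot://worker-1", "1234567"} {
		number, err := parseRobotProviderID(id)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("server number:", number)
	}
}
```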

## Config Options

##

## Migrating from syself/hetzner-cloud-controller-manager

If you have previously used the Hetzner Cloud Controller Manager by Syself, you can migrate to hcloud-cloud-controller-manager. We have tried to keep the configuration & features mostly the same and backwards compatible, but you need to make the following changes:

1 change: 1 addition & 0 deletions go.mod
@@ -7,6 +7,7 @@ require (
github.com/prometheus/client_golang v1.17.0
github.com/spf13/pflag v1.0.5
github.com/stretchr/testify v1.8.4
github.com/syself/hrobot-go v0.2.5
k8s.io/api v0.28.3
k8s.io/apimachinery v0.28.3
k8s.io/client-go v0.28.3
2 changes: 2 additions & 0 deletions go.sum
@@ -301,6 +301,8 @@ github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
github.com/syself/hrobot-go v0.2.5 h1:Zs7GDFRd6fDn4YHYE9e5CGtRm6KYmMZwMMnm7OC/09g=
github.com/syself/hrobot-go v0.2.5/go.mod h1:Oy47yZs+fJKcSh38S3OiNJdY34MXb0pkk796UnpYBnc=
github.com/tmc/grpc-websocket-proxy v0.0.0-20220101234140-673ab2c3ae75 h1:6fotK7otjonDflCTK0BCfls4SPy3NcCVb5dqqmbRknE=
github.com/tmc/grpc-websocket-proxy v0.0.0-20220101234140-673ab2c3ae75/go.mod h1:KO6IkyS8Y3j8OdNO85qEYBsRPuteD+YciPomcXdrMnk=
github.com/xiang90/probing v0.0.0-20190116061207-43a291ad63a2 h1:eY9dn8+vbi4tKz5Qo6v2eYzo7kUS51QINcR5jNpbZS8=
14 changes: 13 additions & 1 deletion hack/dev-up.sh
@@ -164,16 +164,28 @@ if [[ -n "${DEBUG:-}" ]]; then set -x; fi
# Create HCLOUD_TOKEN Secret for hcloud-cloud-controller-manager.
( trap error ERR
if ! kubectl -n kube-system get secret hcloud >/dev/null 2>&1; then
kubectl -n kube-system create secret generic hcloud --from-literal="token=$HCLOUD_TOKEN" --from-literal="network=$scope_name"
data=(
--from-literal="token=$HCLOUD_TOKEN"
--from-literal="network=$scope_name"
)
# Only add Robot credentials when a non-empty user name is provided.
if [[ -n "${ROBOT_USER_NAME:-}" ]]; then
data+=(
--from-literal="robot-user=$ROBOT_USER_NAME"
--from-literal="robot-password=$ROBOT_PASSWORD"
)
fi
kubectl -n kube-system create secret generic hcloud "${data[@]}"
fi) &
wait
) &
wait
echo "Success - cluster fully initialized and ready, why not see for yourself?"
echo '$ kubectl get nodes'
kubectl get nodes
export CONTROL_IP=$(hcloud server ip "$scope_name-1")
} >&2

echo "export KUBECONFIG=$KUBECONFIG"
$SCRIPT_DIR/registry-port-forward.sh
echo "export SKAFFOLD_DEFAULT_REPO=localhost:30666"
echo "export CONTROL_IP=$CONTROL_IP"
7 changes: 7 additions & 0 deletions hack/robot-e2e/ansible.cfg
@@ -0,0 +1,7 @@
[defaults]
inventory = ${PWD}/inventory.yml
forks = 10
host_key_checking = False

[ssh_connection]
# pipelining = True
6 changes: 6 additions & 0 deletions hack/robot-e2e/autosetup.j2
@@ -0,0 +1,6 @@
HOSTNAME {{ server_name }}

DRIVE1 /dev/sda
PART / ext4 all

IMAGE /root/.oldroot/nfs/images/Ubuntu-2204-jammy-amd64-base.tar.gz
96 changes: 96 additions & 0 deletions hack/robot-e2e/e2e-setup-robot-server.yml
@@ -0,0 +1,96 @@
---
- name: Prepare Reinstall
hosts: localhost
connection: local
gather_facts: false

vars:
scope: dev
# Additional SSH keys to add to the server for debugging. Must already exist in Robot.
authorized_keys: []

module_defaults:
group/community.hrobot.robot:
hetzner_user: "{{ lookup('ansible.builtin.env', 'ROBOT_USER_NAME') }}"
hetzner_password: "{{ lookup('ansible.builtin.env', 'ROBOT_PASSWORD') }}"

tasks:
- name: Get Server Info
community.hrobot.server_info:
server_number: "{{ server_number }}"
register: server_info

- name: Set Server Facts
ansible.builtin.set_fact: server_ip="{{ server_info.servers[0].server_ip }}" server_name="{{ server_info.servers[0].server_name }}"

- name: Create SSH Key
community.hrobot.ssh_key:
name: "hccm-{{ scope }}"
public_key: "{{ lookup('file', '../.ssh-' ~ scope ~ '.pub') }}"
state: present
register: ssh_key

- name: Enable Rescue System
community.hrobot.boot:
server_number: "{{ server_number }}"
rescue:
authorized_keys: "{{ authorized_keys + [ ssh_key.fingerprint ] }}"
os: linux

- name: Reset Server (to get to Rescue System)
community.hrobot.reset:
server_number: "{{ server_number }}"
reset_type: hardware # only type that does not require a separate reset for starting again

- name: Wait for SSH
ansible.builtin.wait_for:
host: "{{ server_ip }}"
port: 22
search_regex: SSH

- name: Install OS to Server
hosts: all
gather_facts: false
tasks:
- name: Write autosetup
ansible.builtin.template:
src: autosetup.j2
dest: /autosetup
vars:
server_name: "{{ hostvars['localhost']['server_name'] }}"

- name: installimage
# -t => Take over rescue system SSH public keys
ansible.builtin.command: /root/.oldroot/nfs/install/installimage -t yes

- name: Reboot
ansible.builtin.reboot:

- name: Create k3s directory
ansible.builtin.file:
path: /etc/rancher/k3s
state: directory

- name: Prepare Local Registry
ansible.builtin.copy:
src: ../k3s-registries.yaml
dest: /etc/rancher/k3s/registries.yaml

- name: Join Kubernetes Cluster
hosts: localhost
connection: local
gather_facts: false
vars:
control_ip: "{{ lookup('ansible.builtin.env', 'CONTROL_IP') }}"
k3s_channel: stable
scope: dev

tasks:
- name: k3sup
ansible.builtin.command: >-
k3sup join
--server-ip={{ control_ip | ansible.builtin.mandatory }}
--ip={{ server_ip }}
--k3s-channel={{ k3s_channel }}
--k3s-extra-args="--kubelet-arg cloud-provider=external --node-label instance.hetzner.cloud/is-root-server=true"
--ssh-key ../.ssh-{{ scope }}