feat: Update cloud-init customization #11

dlipovetsky · 2023-10-12T21:14:41Z

Description

Changes relative to upstream:

Add explanatory comments
Do not use stderr output of preKubeadmCommands to indicate an error with bootstrapping

Changes relative to our fork:

Do not enable IPv6
Do not remove cloud-init logs and seed
Do not disable VMware customization
Do not disable network configuration
Do not truncate cloud-init-output.log
Do not report status of HTTP proxy configuration
Do not configure cloud-init to remove SSH keys on first boot
Remove commands that are already executed as a result of being defined in preKubeadmCommands

supershal · 2023-10-12T23:00:29Z

controllers/cluster_scripts/cloud_init.tmpl

 {{ if .ControlPlane }}
- '[ ! -f /run/kubeadm/konvoy-set-kube-proxy-configuration.sh] && sudo reboot'


Were you able to boot the VM and create a cluster after removing this file? I remember that the preKubeadmCommands were failing if I removed them. I will test it later to confirm.

Good question!

You're right that any preKubeadmCommand that requires an ordinary file in /run will fail after a reboot, because ordinary files in /run do not persist across reboots. We recently (in https://github.com/mesosphere/konvoy2/pull/2337) moved all patch scripts from /run to /etc for this reason.

Your question made me wonder about the two reboot calls left in this template:

cluster-api-provider-cloud-director/controllers/cluster_scripts/cloud_init.tmpl

Lines 72 to 80 in 66ecf82

{{ if .ControlPlane }}

- '[ ! -f /root/control_plane.sh ] && sudo reboot'

- '[ ! -f /run/kubeadm/kubeadm.yaml ] && sudo reboot'

- bash /root/control_plane.sh

{{ else }}

- '[ ! -f /root/node.sh ] && sudo reboot'

- '[ ! -f /run/kubeadm/kubeadm-join-config.yaml ] && sudo reboot'

- bash /root/node.sh

{{ end }}

It seems harmless to reboot if the kubeadm config (/run/kubeadm/kubeadm.yaml or /run/kubeadm/kubeadm-join-config.yaml) is not there (yet?).

But if we reboot because the bootstrap script (/root/control_plane.sh or /root/node.sh) is missing, and the kubeadm config happens to already be present, we will lose the kubeadm config after the reboot, leading to further reboots, without end.

At this time (66ecf82), I can successfully reboot either a control plane machine or a worker machine.

I think it may be better to remove the remaining reboot calls. I will experiment.

I've added a comment that explains why the reboot call is necessary. I've also moved these checks out to their own script, and use a separate log file to keep track.

This is what the log looks like:

# cat /var/log/capvcd/replace-userdata-files.log
2023-10-17 22:07:37 Checking for kubeadm configuration file
2023-10-17 22:07:37 kubeadm configuration file not found, cleaning cloud-init cache and rebooting
2023-10-17 22:08:12 Checking for kubeadm configuration file
2023-10-17 22:08:12 kubeadm configuration file found, exiting
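For readers following along, here is a minimal sketch of a check that would produce log lines like the ones above. It is not the script merged in this PR. The function names, the default kubeadm config path, and the use of `cloud-init clean --reboot` are assumptions for illustration; the reboot command is overridable so the sketch can be dry-run without restarting a machine.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the script from this PR. Paths, function names, and
# the reboot command are assumptions chosen to match the log excerpt above.

LOG_FILE="${LOG_FILE:-/var/log/capvcd/replace-userdata-files.log}"
KUBEADM_CONFIG="${KUBEADM_CONFIG:-/run/kubeadm/kubeadm.yaml}"
# Assumption: clean the cloud-init cache (keeping logs and seed) and reboot.
# Override REBOOT_CMD to dry-run the check.
REBOOT_CMD="${REBOOT_CMD:-cloud-init clean --reboot}"

# Append a timestamped line to the dedicated log file.
log() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Return 0 if the kubeadm config is present; otherwise run the reboot command,
# so that cloud-init processes the user-data again as if on first boot.
check_userdata() {
  mkdir -p "$(dirname "$LOG_FILE")"
  log "Checking for kubeadm configuration file"
  if [ -f "$KUBEADM_CONFIG" ]; then
    log "kubeadm configuration file found, exiting"
    return 0
  fi
  log "kubeadm configuration file not found, cleaning cloud-init cache and rebooting"
  # Intentionally unquoted so the command string splits into words.
  $REBOOT_CMD
}
```

To try it without rebooting, set `REBOOT_CMD=true` and point `KUBEADM_CONFIG` and `LOG_FILE` at temporary paths before calling `check_userdata`.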

Pretty clever. Just restating the logic: until the user-data is available, the VM will reboot and cloud-init will run as if it is the first boot. Once the user-data is available, bootstrap.sh will run kubeadm init/join.

until the user-data is available, the VM will reboot and cloud-init will run as if it is the first boot. Once the user-data is available, bootstrap.sh will run kubeadm init/join.

Correct. The upstream cloud-init also did this, but used in-line commands, instead of a script, and the reason behind everything wasn't given.

* Use shell script to clean cloud-init cache and reboot.
* Fix error handling of bootstrap script. Do not interpret stderr output as an indicator of failure. Do not rely on trap and errexit, because it does not work for command lists.
* Include last lines of output for error context.
* Ensure we have an IPv4 address for localhost.
* Remove unnecessary cloud-init configuration to preserve SSH host keys.
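The error-handling points in the list above can be illustrated with a short sketch. This is not the bootstrap script from this PR; `run_with_context` is a hypothetical helper name, and the choice of five trailing lines is arbitrary. The first command shows why errexit cannot be relied on for command lists; the helper shows the alternative of checking the exit status explicitly, ignoring whether anything was written to stderr, and printing the last lines of output on failure.

```shell
#!/bin/sh
# Illustrative sketch only; run_with_context is a hypothetical helper, not the
# code merged in this PR.

# 1. errexit is ignored when a command fails inside an && list (unless it is
#    the last command of the list), so the shell keeps running here and
#    prints the final message.
sh -c 'set -e; false && echo "unreachable"; echo "still running after failed list"'

# 2. Decide success or failure from the exit status alone. Output on stderr
#    is captured with stdout but never treated as an error by itself.
# usage: run_with_context <log-file> <command> [args...]
run_with_context() {
  log_file="$1"
  shift
  status=0
  "$@" > "$log_file" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status; last lines of output:"
    tail -n 5 "$log_file"
  fi
  return "$status"
}
```

A command that only warns on stderr but exits 0 is treated as a success, while a non-zero exit status is reported together with the tail of the captured output.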

supershal

Thank you for testing this out.

dkoshkin

Thank you for all the comments and really digging deep into this! Great changes and so much simpler to understand.

* feat: Update cloud-init customization

  Changes relative to upstream:

  * Use shell script to clean cloud-init cache and reboot.
  * Fix error handling of bootstrap script. Do not interpret stderr output as an indicator of failure. Do not rely on trap and errexit, because it does not work for command lists.
  * Include last lines of output for error context.
  * Ensure we have an IPv4 address for localhost.
  * Remove unnecessary cloud-init configuration to preserve SSH host keys.

  Changes relative to our fork:

  * Do not remove cloud-init logs and seed on reboot
  * Do not truncate cloud-init-output.log on reboot
  * Do not report status of HTTP proxy configuration
  * Remove redundant commands (already executed as a result of being defined in `preKubeadmCommands`)
  * Do not disable VMware customization
  * Do not disable network configuration

  Signed-off-by: Daniel Lipovetsky <[email protected]>

dlipovetsky force-pushed the dlipovetsky/cloud-init branch 2 times, most recently from e024eb1 to 80126df Compare October 12, 2023 21:18

dlipovetsky force-pushed the dlipovetsky/cloud-init branch from 80126df to 66ecf82 Compare October 12, 2023 21:21

dlipovetsky requested review from dkoshkin and supershal October 12, 2023 21:21

supershal reviewed Oct 12, 2023

supershal reviewed Oct 12, 2023

dlipovetsky force-pushed the dlipovetsky/cloud-init branch from 6c7753d to d8316d1 Compare October 17, 2023 22:53

supershal approved these changes Oct 17, 2023

dkoshkin approved these changes Oct 18, 2023

dlipovetsky merged commit ac3388b into d2iq/release-1.1.0-1 Oct 18, 2023
1 check passed

dlipovetsky mentioned this pull request Mar 12, 2024

fix: Ensure localhost maps to IPv4 address #14

Merged
