
[Feature] Self sufficient/managed bundle image #638

Open · praveenkumar opened this issue Jan 6, 2023 · 15 comments

@praveenkumar (Member) commented Jan 6, 2023

As of now, once we start the VM using the bundle image, crc performs the following operations:

  • Update the ssh key with a newly generated ssh key
  • Set a password for the core user for emergency login, if configured by the user
  • Disk resize (growing the partition)
  • Stop the ntp service, based on a config
  • Add user-provided nameservers to the VM
  • Mount shared dirs if enabled
  • Change permissions of the root podman socket to 777
  • Set up the crc-dnsmasq configuration and start the crc-dnsmasq service
  • Start the kubelet service
  • Renew certs if expired
  • Configure the proxy for the OCP cluster
  • Ensure the routes-controller pod is started/running
  • Update the user pull secret
  • Update the new ssh key as part of machine config
  • Update the kubeadmin password if provided by the user, otherwise generate a random one
  • Update the cluster ID
  • Enable monitoring if requested
  • Wait for the cluster to be stable

For local use there are some more steps, like checking cert validity, letting the user know about it, and then waiting until the certs are recovered. But for a cloud image it might be better to have a single bash script which can perform all those actions, and we can create a unit file to run that script. That way, once a cloud image is deployed and that unit is enabled, the end user will get a running cluster directly.
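
A minimal sketch of what such a unit could look like; the script path /usr/local/bin/crc-provision.sh and the unit name are hypothetical, nothing with these names exists in the bundle yet:

$ cat /etc/systemd/system/crc-provision.service
[Unit]
Description=CRC self-sufficient bundle provisioning
After=network-online.target
Wants=network-online.target

[Service]
# oneshot + RemainAfterExit so the script runs once per boot and the unit stays "active"
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/crc-provision.sh
StandardOutput=journal

[Install]
WantedBy=multi-user.target

The script itself would walk through the steps from the list above in order, logging to the journal.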

@cfergeau (Contributor) commented Jan 6, 2023

This does not necessarily have to be a bash script; it could also be something written in golang/C/rust/... if that's deemed more convenient than bash, or python/... if the interpreters are already installed on the image (but I don't think they are).

@praveenkumar (Member, Author) commented

A bash script would be much simpler to do and can be part of the same repo; if we want to do the same with golang/C/rust then it should be a different repo, producing a binary which needs to be included during bundle creation. I would prefer to do it in bash so there is no dependency and it can be done soonish.

@cfergeau (Contributor) commented Jan 6, 2023

This is my point: this is a choice which is being made, not a hard requirement we cannot change. I'm not arguing against bash; I'm only pointing out that we have alternatives if (hypothetically) we realize at some point that bash is not a great fit for what we want to do.

@gbraad (Collaborator) commented Jan 6, 2023

Consider how input can be given, as cloud-init will be used by some cloud providers
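
As a sketch of the cloud-init angle, the user data could drop a config file that the provisioning script reads on first boot. The path and keys below are hypothetical, not an existing format:

#cloud-config
write_files:
  - path: /opt/crc/provisioner.conf
    permissions: "0600"
    content: |
      # hypothetical keys consumed by the provisioning script
      ssh_pub_key=ssh-ed25519 AAAA... user@host
      kubeadmin_password=changeme
      enable_monitoring=false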

@adrianriobo (Contributor) commented

Just after talking to @praveenkumar, adding a note on one key benefit of putting the logic in the bundle: it makes versioning support self-contained. If there are changes in a new OCP version that require the script to be adapted, managing that externally could be tedious, so the script should check the version and apply one logic or the other.

In other words, adding it helps long-term maintenance.

@danpawlik commented

@praveenkumar hey, what about moving dnsmasq from a dedicated service to the NetworkManager dnsmasq plugin?

@praveenkumar (Member, Author) commented

> @praveenkumar hey, what about moving dnsmasq from a dedicated service to the NetworkManager dnsmasq plugin?

We are now using the dnsmasq service directly instead of running it in a container. If we can have the following config as part of the dnsmasq plugin then sure, it would be better to have it with NM.

Here 192.168.130.11 is the VM IP address.

$ cat /etc/dnsmasq.d/crc-dnsmasq.conf 
listen-address=192.168.130.11
expand-hosts
log-queries
local=/crc.testing/
domain=crc.testing
address=/apps-crc.testing/192.168.130.11
address=/api.crc.testing/192.168.130.11
address=/api-int.crc.testing/192.168.130.11
address=/crc.crc.testing/192.168.126.11
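
For reference, a minimal sketch of what this could look like with the NetworkManager dnsmasq plugin, using the same values as above: NM is switched to its dnsmasq backend via dns=dnsmasq, and NM's dnsmasq instance reads extra snippets from /etc/NetworkManager/dnsmasq.d/. Note that NM's dnsmasq instance listens on 127.0.0.1 by default, so the listen-address line above would not carry over as-is:

$ cat /etc/NetworkManager/conf.d/crc.conf
[main]
dns=dnsmasq

$ cat /etc/NetworkManager/dnsmasq.d/crc.conf
expand-hosts
log-queries
local=/crc.testing/
domain=crc.testing
address=/apps-crc.testing/192.168.130.11
address=/api.crc.testing/192.168.130.11
address=/api-int.crc.testing/192.168.130.11
address=/crc.crc.testing/192.168.126.11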

@praveenkumar praveenkumar changed the title Self sufficient/managed bundle image [Feature] Self sufficient/managed bundle image Sep 10, 2024
@anjannath (Member) commented

> Consider how input can be given, as cloud-init will be used by some cloud providers

One way could be to make the crc provisioner bash script/program support a config file and run it after cloud-init runs: once cloud-init finishes, it creates the config file the crc provisioner tool expects, which provides the needed inputs.
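
A hedged sketch of that ordering, reusing the hypothetical script and config paths from the earlier sketches; cloud-final.service is cloud-init's last boot stage:

[Unit]
Description=CRC provisioner, runs after cloud-init has written its config
After=cloud-final.service
Wants=cloud-final.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/crc-provision.sh /opt/crc/provisioner.conf

[Install]
WantedBy=multi-user.target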

@anjannath anjannath moved this from Todo to Work In Progress in Project planning: crc Oct 9, 2024
@anjannath (Member) commented Oct 23, 2024

From the list in #638 (comment) we can categorize the various post-bundle-start tasks into two categories: tasks that depend on some kind of external data and those that don't (the external-data tasks are further split below into critical and non-critical).

Tasks that don't depend on any external data/input:

  1. growing the filesystem (separate service for openshift and microshift as disk layout differs)
  2. creating the dnsmasq config (based on network mode)
  3. start routes controller pod (based on network mode)
  4. make root podman socket accessible (move to snc)
  5. approve CSRs and wait for cert renewal in case of expired certs (see the sketch after these lists)
  6. set random cluster ID
  7. wait for cluster stable
  8. rotate kubeadmin password

Tasks that need external data/input:

  1. SSH key rotation (needs the pub key)
  2. shared dir mounting (needs to know the host OS: Windows, macOS or Linux; for Windows it also needs the password)
  3. adding nameservers to the vm (needs the nameservers)
  4. updating the resolv.conf file on the instance (needs the host resolv.conf values)
  5. adding the pull secret to the cluster (needs the pull secret)
  6. adding the proxy configuration to the cluster (needs values for http, https, no proxy and the proxy CA)
  7. ensuring the pull secret is present in the cluster as a secret/cm (needs the pull secret)
  8. setting the user-provided kubeadmin password (needs the kubeadmin password)

Tasks that need external data/input but are not critical:

  1. setting a password for the core user (needs the password)
  2. setting ntp service off/on (needs the toggle value: on or off)
  3. setting vm clock to host clock (needs host time)
  4. enabling monitoring (needs the toggle value: enable or disable)
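
As an illustration of the first category, here is a hedged sketch of task 5 from the first list (approving CSRs while waiting for cert renewal). It assumes oc is available inside the VM and a kubeconfig at an assumed path:

#!/bin/bash
# Hypothetical sketch, not the actual implementation: approve pending CSRs
# in a loop until none remain, so cert renewal can complete.
export KUBECONFIG=/opt/kubeconfig   # assumed location inside the VM
while oc get csr 2>/dev/null | grep -q Pending; do
    # approve every CSR by name; re-approving existing ones is harmless
    oc get csr -o name | xargs --no-run-if-empty oc adm certificate approve
    sleep 10
done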

@praveenkumar (Member, Author) commented

So some of the tasks which need external data might not be so critical (i.e., not blockers):

> setting a password for the core user (needs the password)

This is for debugging purposes in case the ssh connection is lost.

> setting ntp service off/on (needs the toggle value: on or off)

This is also for testing cert rotation.

> setting vm clock to host clock (needs host time)

I think this is also for testing cert rotation? Or is it something else?

> enabling monitoring (needs the toggle value: enable or disable)

This is something the user can do after the cluster is running.

@cfergeau (Contributor) commented

Being able to directly inject the generated ssh key would be useful for cfergeau/macadam#17

@anjannath (Member) commented Nov 21, 2024

> Being able to directly inject the generated ssh key would be useful for cfergeau/macadam#17

If we get ignition support then we could use the ignition config to inject the ssh key on "first" boot. The current approach of using systemd units to configure the machine depends on the presence of certain files at hard-coded paths, and it's not working: the units start in parallel and there's a race between the files being present in the expected location and the unit starting.

If we have ignition support (i.e. re-running ignition when the bundle is started by crc/crc-cloud) then this race condition would also be fixed, since ignition runs very early, and we'd also have a way to inject the SSH key.

(We could also work around the unit-start issue by having loops in the scripts that the units start, or systemd timers to re-trigger the units later; if we go this route then we can add another systemd unit that adds the SSH pub key the same way. See the sketch below.)
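
As a sketch of the loop workaround mentioned above, the script started by a unit could poll for its input file instead of relying on start ordering; the path and timeout here are assumptions:

#!/bin/bash
# Wait up to 5 minutes for the input file to appear before proceeding.
for _ in $(seq 1 60); do
    [ -f /opt/crc/pull-secret ] && break
    sleep 5
done
[ -f /opt/crc/pull-secret ] || { echo "pull-secret never appeared" >&2; exit 1; }
# ...continue with the actual provisioning step...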

@cfergeau (Contributor) commented

> it's not working: the units start in parallel and there's a race between the files being present in the expected location and the unit starting

Does https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#ConditionPathExists= help with the race you are seeing?

@anjannath (Member) commented

> Does https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#ConditionPathExists= help with the race you are seeing?

No, the unit still gets skipped. I tried to use ConditionPathExists in the following unit for adding the pull secret.

From crc we scp the file before starting kubelet.service (since this unit should start after kubelet.service), but it seems systemd decides whether the service is to be skipped long before it's actually started, and the unit is marked skipped due to the failing Condition...

[Unit]
Description=CRC Unit for adding pull secret to cluster
After=kubelet.service
Requires=kubelet.service
ConditionPathExists=/opt/crc/pull-secret

[Service]
Type=oneshot
ExecStart=/usr/local/bin/ocp-pullsecret.sh
StandardOutput=journal

[Install]
WantedBy=multi-user.target
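
Not something discussed in the thread, but a systemd .path unit might be another way around this: it watches for the file via inotify and activates the matching service only once the file actually exists, sidestepping the early Condition evaluation. A sketch reusing the same paths as the unit above:

$ cat /etc/systemd/system/ocp-pullsecret.path
[Unit]
Description=Watch for the CRC pull secret file

[Path]
# Activates ocp-pullsecret.service as soon as the file appears
PathExists=/opt/crc/pull-secret
Unit=ocp-pullsecret.service

[Install]
WantedBy=multi-user.target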
