Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add resources and controllers for bare metal infra provider #707

Merged

Conversation

utkuozdemir
Copy link
Member

@utkuozdemir utkuozdemir commented Oct 25, 2024

Introduce the concept of "static" infra providers, e.g., bare-metal infra provider, which manage a static set of machines contrary to the "regular" infra providers.

Add the following resources:

  • infra.Machine: similar to MachineRequest, lives in the infra-provider namespace, serving as the input of the owning static provider. It is created in the MachineController if there is a SideroLink connection with the static provider ID. Regular flow of Machine creation is blocked, until this infra.Machine is accepted.

  • infra.MachineStatus: similar to MachineRequestStatus, lives in the infra-provider namespace, serving as the output of the owning static provider. Its lifecycle must be bound to the corresponding infra.Machine.

  • infra.MachineState: a resource that is supposed to be shared by Omni and bare-metal provider bi-directionally - they both can read from and write to it. It is currently used to mark the machine as installed when we observe an installation (through SequenceEvents in the event sink), and to mark it as non-installed after we wipe it in the provider.

  • omni.InfraMachineConfig: a user-managed resource to mark the infra.Machines as accepted or set their desired power state. The acceptance information is then propagated to the infra.Machine resource. A machine which was already accepted cannot be unaccepted (checked by a validation), and this resource can only be removed when the siderolink.Link for the matching infra machine is removed.

@utkuozdemir utkuozdemir force-pushed the bare-metal-provider-connection branch 3 times, most recently from 880d275 to dd2e6ad Compare October 29, 2024 21:35
@utkuozdemir utkuozdemir force-pushed the bare-metal-provider-connection branch 3 times, most recently from d5fb13e to a14b056 Compare November 8, 2024 23:52
@utkuozdemir utkuozdemir force-pushed the bare-metal-provider-connection branch 5 times, most recently from b4b296a to e02a76b Compare November 13, 2024 13:51
@utkuozdemir utkuozdemir changed the title WIP: feat: bare metal stuff feat: add resources and controllers for bare metal infra provider Nov 13, 2024
@utkuozdemir utkuozdemir marked this pull request as ready for review November 13, 2024 13:53
@utkuozdemir utkuozdemir force-pushed the bare-metal-provider-connection branch 5 times, most recently from c2bb7f8 to fd438ee Compare November 20, 2024 09:46
@utkuozdemir utkuozdemir force-pushed the bare-metal-provider-connection branch 6 times, most recently from c753e9a to 8a6f925 Compare November 25, 2024 22:14
Comment on lines +101 to +106
if _, isStaticProvider := providerStatus.Metadata().Labels().Get(omni.LabelIsStaticInfraProvider); isStaticProvider { // static infra provider flow, e.g., bare-metal
return h.handleStaticInfraProvider(ctx, r, machine)
}

// provisioning (regular) infra provider flow, e.g., kubevirt
return h.handleProvisioningInfraProvider(ctx, r, link, machine)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than this branching here, the logic in this file is totally unchanged - just reworked for readability.

@@ -249,7 +249,7 @@ func (ctrl *MachineStatusController) reconcileRunning(ctx context.Context, r con

spec.Connected = connected

helpers.CopyLabels(machine, m, omni.LabelMachineRequest, omni.LabelMachineRequestSet, omni.LabelNoManualAllocation)
helpers.CopyLabels(machine, m, omni.LabelMachineRequest, omni.LabelMachineRequestSet, omni.LabelNoManualAllocation, omni.LabelIsManagedByStaticInfraProvider)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagate this label from Machine to MachineStatus. We then check this label on MachineStatus in ClusterMachineConfigStatusController to find out if the machine is being prepared by the static infra provider (still not ready to be used, e.g., being wiped), and if it is, skip applying config to it.

// handleProvisioningInfraProvider checks if the machine managed by a provisioning infra provider has the machine request and machine request set labels set.
//
// It then matches the machine request set owner to the machine provision controller, and if it matches, sets the no manual allocation label on the machine.
func (h *machineControllerHelper) handleProvisioningInfraProvider(ctx context.Context, r controller.Reader, link *siderolink.Link, machine *omni.Machine) error {
Copy link
Member Author

@utkuozdemir utkuozdemir Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better naming suggestions are welcome - my thought process here was: we have 2 distinct types of infra providers: "static", e.g., bare-metal, and "provisioning", e.g. kubevirt, aws etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to not be confusing: I simply this from machinestatus.Handler to be machineevent.Handler, as it now also handles SequenceEvents. GitHub couldn't figure out that it was moved and also modified afterwards.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +75 to +76
case *machineapi.SequenceEvent:
return handler.handleSequenceEvent(ctx, event, machineID)
Copy link
Member Author

@utkuozdemir utkuozdemir Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this (this case + the handleSequenceEvent function), otherwise no changes.

go.mod Outdated
@@ -3,6 +3,9 @@ module github.com/siderolabs/omni
go 1.23.3

replace (
// todo: remove when merged
github.com/siderolabs/omni-infra-provider-bare-metal v0.0.0-20241014092637-c3e892b87244 => github.com/utkuozdemir/sidero-omni-infra-provider-bare-metal v0.0.0-20241021213554-cc93e594a0d2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it okay that we create an import cycle, as the provider imports omni/client at the very least?

I know it's going to cause lots of fun when you do incompatible changes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, and actually it was unused. The initial idea and the reason was to import the GRPC server API types that the provider serves, so that Omni could send commands to the machines (e.g. Reboot) over the reverse GRPC tunnel, Omni->Provider->Machine.

I initially did that, but then removed to keep the scope smaller in this PR.

So I now removed this import, and also decided to remove the whole reverse tunnel logic. I'll keep that in a separate Draft PR for now, as I'm having second thoughts on whether it is necessary at all. It would only be used for reboot for the time being, and imo it is perfectly doable over the resources at hand as well (similar to wipeID, something like rebootID).

So I'll discuss this with @Unix4ever when he is back :)

Copy link
Member

@smira smira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say that I understand all of changes, but I don't see anything that looks bad

Introduce the concept of "static" infra providers, e.g., bare-metal infra provider, which manage a static set of machines contrary to the "regular" infra providers.

Add the following resources:

- `infra.Machine`: similar to `MachineRequest`, lives in the `infra-provider` namespace, serving as the input of the owning static provider. It is created in the `MachineController` if there is a SideroLink connection with the static provider ID. Regular flow of `Machine` creation is blocked, until this `infra.Machine` is accepted.

- `infra.MachineStatus`: similar to `MachineRequestStatus`, lives in the `infra-provider` namespace, serving as the output of the owning static provider. Its lifecycle must be bound to the corresponding `infra.Machine`.

- `infra.MachineState`: a resource that is supposed to be shared by Omni and bare-metal provider bi-directionally - they both can read from and write to it. It is currently used to mark the machine as installed when we observe an installation (through `SequenceEvent`s in the event sink), and to mark it as non-installed after we wipe it in the provider.

- `omni.InfraMachineConfig`: a user-managed resource to mark the `infra.Machine`s as accepted or set their desired power state. The acceptance information is then propagated to the `infra.Machine` resource. A machine which was already accepted cannot be unaccepted (checked by a validation), and this resource can only be removed when the `siderolink.Link` for the matching infra machine is removed.

Signed-off-by: Utku Ozdemir <[email protected]>
@utkuozdemir
Copy link
Member Author

/m

@talos-bot talos-bot merged commit 5a26d4c into siderolabs:main Nov 26, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants