Bare Metal Infrastructure Provider Phase 1 #660

Open
smira opened this issue Oct 2, 2024 · 0 comments

Prerequisites

Support only Talos >= 1.8.0 ✅ (1.9.0)

Phase 0.1

Implement metal agent as a mini-Talos with a fat base image:

  • include all firmware extensions and anything hardware-related, so we can always boot & detect hardware ✅
  • figure out a way either to copy or (better!) share some resources and controllers between an agent and Talos ✅ (infra provider does it already)
  • SideroLink ✅
  • hardware info, including disks and network links ✅
  • maintenance apid running only over SideroLink ✅

Idea: add a build tag to Talos that removes everything not required for an agent, and build the Talos initramfs image with that tag. Layer the agent on top of that, for example as an extension service. ✅ (done as a mode)

Boot all (test) QEMU VMs via PXE, run a minimal agent (reports back to the infra provider and to Omni). If the machine is allocated, PXE boot it from the Image Factory with proper schematic & Talos version. ✅

  • PXE server ✅
  • Proxy DHCP (optional) ✅
  • Metadata server (returning initial talos.config= contents) ✅
  • Provider API for the agent to report back to (borrow ideas/implementation from Sidero Metal) ✅
  • Inject proper labels/join token to associate the machine with the provider ✅ (❓: which labels exactly, other than labels set as provider command line args)

Flow:

  • initially provider doesn't know about any machines ✅
  • a machine comes to the PXE endpoint, it's unknown, so provider boots an agent ✅
  • agent reports back to the provider: "there's a machine UUID x" ✅
  • now provider knows about the machine, and next PXE attempt will boot Talos ✅
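
The flow above can be sketched as a small in-memory model. This is only an illustration: the `Provider` class, its method names, and the `"agent"`/`"talos"` return values are hypothetical, not the provider's actual API, and allocation and acceptance checks from later phases are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Toy stand-in for the provider's machine inventory (hypothetical)."""

    known: set = field(default_factory=set)  # UUIDs reported by an agent

    def agent_report(self, uuid: str) -> None:
        # The agent reports back: "there's a machine UUID x".
        self.known.add(uuid)

    def pxe_target(self, uuid: str) -> str:
        # Unknown machines boot the agent; known machines boot Talos.
        return "talos" if uuid in self.known else "agent"


provider = Provider()
first_boot = provider.pxe_target("uuid-x")   # machine is unknown
provider.agent_report("uuid-x")              # agent registers the machine
second_boot = provider.pxe_target("uuid-x")  # machine is now known
```

Here `first_boot` is `"agent"` and `second_boot` is `"talos"`, matching the four steps listed above.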

Phase 0.2

Power management:

  • agent should discover/provision and report IPMI credentials ✅
  • provider should reconcile power state of the machine based on what Omni says ⌛ (actual power on/off not yet implemented)
  • in QEMU/test environment we have a mock power API we can leverage for testing ✅ (use QEMU power API of cluster create API)

Omni:

  • default power state (based on allocated/not-allocated status) ⌛ (no resource to override global config yet)
  • user-provided overrides for power state, with UI & API to manage the power state logic: API ✅, UI ⌛
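
A minimal sketch of how the default power state, a user override, and the reconcile step could fit together. The function names and the `"on"`/`"off"` values are hypothetical, and the default used here (allocated means on, not allocated means off) is an assumption; the issue only says the default depends on the allocation status.

```python
from typing import Optional

VALID_STATES = ("on", "off")


def desired_power(allocated: bool, override: Optional[str] = None) -> str:
    """Return the power state Omni would ask for (assumed policy)."""
    if override is not None:
        if override not in VALID_STATES:
            raise ValueError(f"invalid power state override: {override!r}")
        return override  # a user-provided override wins over the default
    return "on" if allocated else "off"


def reconcile(current: str, desired: str) -> Optional[str]:
    """Return the action the provider should take, or None if in sync."""
    if current == desired:
        return None
    return "power_on" if desired == "on" else "power_off"
```

For example, `reconcile("off", desired_power(allocated=True))` yields `"power_on"`, while an unallocated machine that is already off needs no action.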

Phase 0.3

Acceptance flow (configurable, with an option to auto-accept): the machine appears in the Machines view, but no actions are performed on it (e.g. it is not wiped, it can't be added to a cluster, and no IPMI setup or power management is done). Logic ✅, UI ⌛

Omni provides some UI to accept machines, show not accepted machines, etc. ✅

Provider knows about the acceptance status - if machine is accepted, provider can start some additional actions (in the next phases). ✅

If the machine is not accepted, the agent should "hang" until it receives a signal that the machine was either accepted or rejected. It provisions IPMI creds only once the machine is accepted. ✅
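
The agent-side behaviour can be sketched as follows. This is an illustration under assumptions: the status strings, the `statuses` iterable (standing in for whatever signal the provider sends), and the `provision_ipmi` callback are all hypothetical names.

```python
from typing import Callable, Iterable


def wait_for_decision(statuses: Iterable[str]) -> str:
    """Consume status updates until the machine is accepted or rejected."""
    for status in statuses:
        if status in ("accepted", "rejected"):
            return status
        # Any other status (e.g. "pending") keeps the agent waiting.
    raise RuntimeError("status stream ended without a decision")


def run_agent(statuses: Iterable[str], provision_ipmi: Callable[[], None]) -> str:
    decision = wait_for_decision(statuses)
    if decision == "accepted":
        provision_ipmi()  # IPMI creds are provisioned only after acceptance
    return decision
```

With `["pending", "pending", "accepted"]` the callback runs exactly once; with `["pending", "rejected"]` it never runs.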

Phase 0.4

Hardware reboot support. ⌛ (reboot API is there on agent and on power mgmt API, but not yet implemented on the provider. need to decide on the best way)

Phase 0.5

Disk wipe - initial after acceptance, and disk wipe after the machine is removed from the cluster. ✅

Omni: change the "reset" flow in Omni to use the provider's wipe capability: the machine is force-rebooted over IPMI (or equivalent) and forced to PXE boot, the agent boots up and wipes the disks, and the machine becomes available again. ✅ (note: we do a 2-step reset: first the Omni reset, then the agent reset, i.e., the wipe)

Phase 0.6

Redfish support. ⌛ Not implemented; hardware is ready to start implementing/testing it.
❓ Open questions:

  • if a machine also supports IPMI, what's the point of Redfish?
  • if a machine reports that it supports both, which one does the provider use (maybe a flag to the provider for order of preference)?
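
One way the "order of preference" idea could look, purely as a sketch: no such flag exists yet according to the issue, and the function name, the protocol strings, and the default order are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def pick_bmc_protocol(
    supported: Iterable[str],
    preference: Sequence[str] = ("redfish", "ipmi"),  # assumed default order
) -> Optional[str]:
    """Return the first preferred protocol the machine supports, if any."""
    supported_set = set(supported)
    for protocol in preference:
        if protocol in supported_set:
            return protocol
    return None
```

A machine reporting both protocols would get `"redfish"` with the default order and `"ipmi"` with `preference=("ipmi", "redfish")`.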

Phase 0.7

Provider-specific configurable labels for the joining machines (e.g. dc=nyc). ⌛

  • The CLI args on the provider for adding extra labels are there.
  • They are not reconciled at the moment: existing machines are not updated to get them. Is that OK?
  • A user configuration/override option is not provided yet. Should it be?
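
A small sketch of label handling, assuming labels arrive as `key=value` strings like the `dc=nyc` example. The function names are hypothetical, and `missing_labels` only illustrates what a reconcile step for existing machines would have to compute; it is not something the provider does today.

```python
from typing import Dict, Iterable


def parse_labels(args: Iterable[str]) -> Dict[str, str]:
    """Parse key=value strings (e.g. "dc=nyc") into a label map."""
    labels: Dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        labels[key] = value
    return labels


def missing_labels(machine: Dict[str, str], provider: Dict[str, str]) -> Dict[str, str]:
    """Labels an existing machine lacks or has with a different value."""
    return {k: v for k, v in provider.items() if machine.get(k) != v}
```

For a machine that joined before `rack=r1` was added, `missing_labels({"dc": "nyc"}, parse_labels(["dc=nyc", "rack=r1"]))` returns only the `rack` label.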

Phase 0.8

Discovering hardware in the agent (e.g. a bnx2 NIC) and automatically building the initial set of system extensions to use, e.g. bnx2-firmware. 🔴 not done
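
Since this phase is not done, the following is only a sketch of the idea: map drivers detected by the agent to extension names. The lookup table and function name are hypothetical; the single entry comes from the example above, and real extension names may differ.

```python
from typing import Iterable, List

# Hypothetical lookup table; only the bnx2 entry is taken from the issue text.
DRIVER_TO_EXTENSION = {
    "bnx2": "bnx2-firmware",
}


def extensions_for(drivers: Iterable[str]) -> List[str]:
    """Return a sorted, de-duplicated list of extensions for detected drivers."""
    found = {DRIVER_TO_EXTENSION[d] for d in drivers if d in DRIVER_TO_EXTENSION}
    return sorted(found)
```

Drivers without an entry in the table are ignored, so `extensions_for(["bnx2", "e1000e"])` yields just the bnx2 entry.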

Phase 0.9

Support for kexec when transitioning from the agent to Talos. 🔴 not done

Example:

  • machine is discovered, agent boots up ✅
  • machine is accepted, agent wipes the disks, provisions IPMI creds, but has some timeout before it gets powered off/rebooted 🔴 not done
  • if the machine is allocated within that timeout, instead of full reboot, download next Talos kernel args, initramfs, kernel image and kexec into it 🔴 not done
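
The decision in this example could be sketched as below. Nothing here is implemented according to the issue; the function name, the action strings, and the idea of measuring time since the wipe finished are assumptions made for illustration.

```python
def next_boot_action(allocated: bool, elapsed: float, grace_period: float) -> str:
    """Decide what to do with an accepted, wiped machine running the agent.

    elapsed and grace_period are in seconds since the wipe finished (assumed).
    """
    within_grace = elapsed <= grace_period
    if allocated:
        # Allocated in time: fetch kernel args, initramfs, kernel and kexec.
        # Allocated too late: fall back to a full reboot via PXE.
        return "kexec" if within_grace else "reboot"
    # Not allocated: keep waiting until the timeout, then power off.
    return "wait" if within_grace else "power_off"
```

The design point is that kexec is only an optimization inside the timeout window; outside it, the regular PXE path still applies.
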
@smira changed the title from "Bare Metal Infrastructure Provide Phase 1" to "Bare Metal Infrastructure Provider Phase 1" on Oct 2, 2024
utkuozdemir added a commit to utkuozdemir/sidero-omni-infra-provider-bare-metal that referenced this issue Oct 14, 2024
Add initial implementation of the Talos agent mode service.

Related to siderolabs/omni#660.

Signed-off-by: Utku Ozdemir <[email protected]>
utkuozdemir added a commit to utkuozdemir/sidero-omni-infra-provider-bare-metal that referenced this issue Nov 11, 2024
Add initial implementation of the bare-metal infra provider.

Related to siderolabs/omni#660.

Signed-off-by: Utku Ozdemir <[email protected]>