Add the proposal of the new feature - switch hub
Signed-off-by: xuezhaojun <[email protected]>
xuezhaojun committed Oct 27, 2023
1 parent 40f519c commit 96b43b4
Showing 3 changed files with 132 additions and 0 deletions.
120 changes: 120 additions & 0 deletions enhancements/sig-architecture/100-relocatable-klusterlet/README.md
@@ -0,0 +1,120 @@
# Switch hub

## Release Signoff Checklist

- [ ] Enhancement is `implemented`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [website](https://github.com/open-cluster-management-io/open-cluster-management-io.github.io/)

## Summary
This proposal aims to provide a mechanism to enable agents to smoothly switch hubs at scale with fewer manual operations.

## Motivation
Currently, an agent can only point to one hub, requiring manual configuration and operations to switch agents to another hub.

To address this, we will provide a simple mechanism for agents to determine when they should switch to another hub. This mechanism will enable agents to complete the switching process, eliminating the need for manual operations.

This mechanism will be beneficial for various scenarios, including upgrading, disaster recovery, improving availability, etc.

## Goals
- To provide an interface for the hubs of an agent to designate one of them as the **leader hub** for the agent to connect to at any given time.
- To implement an automatic hub-switching process for agents.

## Non-Goals
- To ensure that bootstrapkubeconfig candidates are valid **during the candidate selection phase**.
- To determine which hub is the leader at any given time. This depends on your specific scenario.
  - For instance, in the case of backup and restore, let's say we want to back up from hub1 and restore to hub2. During the backup and restore process, hub1 should remain the leader hub. However, once the restore process is complete, hub2 will become the leader hub.
- To synchronize customer workloads and configurations between hubs. Users must ensure that the resources of the hubs are synchronized.
  - If the resources are not synchronized, the old customer workload may be wiped out after the switch.

## Use cases

### Story1 - Backup and restore
As a user, I want to switch agents to the new (restored) hub **without manual operations**.

### Story2 - Rolling upgrade
As a user, I want to switch half of the agents to the upgraded hub to test the functionality of the upgraded components.

### Story3 - High availability architecture
As a user, I want to configure 2 hubs in active-active mode to improve management availability. In this setup, if one hub goes down, the other hub will continue to provide management capabilities.

## Risk and Mitigation
N/A

## Design Details

### Terms

#### Bootstrapkubeconfig candidates
A set of secrets on the agent, each containing the bootstrapkubeconfig of one hub. A bootstrapkubeconfig candidate is labeled with `cluster.open-cluster-management.io/bootstrap-kubeconfig-candidate`.
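
As a rough illustration of how candidates could be picked out by label, the sketch below filters plain dictionaries shaped like Secret manifests. The label key comes from this proposal; the function name `list_candidates`, the sample secrets, and the choice to match on the presence of the label key regardless of its value are assumptions made for the example.

```python
# Label key taken from the proposal; the secret dicts below are made-up sample data.
CANDIDATE_LABEL = "cluster.open-cluster-management.io/bootstrap-kubeconfig-candidate"


def list_candidates(secrets):
    """Return the names of secrets carrying the candidate label, sorted for stable output."""
    names = []
    for secret in secrets:
        metadata = secret.get("metadata", {})
        labels = metadata.get("labels") or {}
        # Assumption: only the presence of the label key matters, not its value.
        if CANDIDATE_LABEL in labels:
            names.append(metadata["name"])
    return sorted(names)


secrets = [
    {"metadata": {"name": "bootstrap-hub-kubeconfig-hub1", "labels": {CANDIDATE_LABEL: ""}}},
    {"metadata": {"name": "bootstrap-hub-kubeconfig-hub2", "labels": {CANDIDATE_LABEL: ""}}},
    {"metadata": {"name": "unrelated-secret"}},  # no labels, so not a candidate
]

candidates = list_candidates(secrets)
print(candidates)
```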

#### Bootstrapkubeconfig Leader

We add a new annotation `cluster.open-cluster-management.io/bootstrap-kubeconfig-leader` to the ManagedCluster CR. The value of this annotation is the secret name of one of the bootstrapkubeconfig candidates on the agent.

Once the agent registers with a hub, it can read which candidate it should use at that point in time. If the leader differs from the candidate currently in use, the agent switches to the leader.
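
The comparison described above can be sketched as a small pure function. The annotation key is taken from this proposal. The function name `candidate_to_use`, and the behavior when the annotation is missing or names a secret that is not a known candidate (stay on the current candidate), are assumptions, since the proposal does not specify those cases.

```python
LEADER_ANNOTATION = "cluster.open-cluster-management.io/bootstrap-kubeconfig-leader"


def candidate_to_use(current, annotations, candidates):
    """Return the secret name the agent should use after reading the ManagedCluster annotations."""
    leader = (annotations or {}).get(LEADER_ANNOTATION)
    if leader is None:
        # Assumption: with no leader annotation, keep the current candidate.
        return current
    if leader not in candidates:
        # Assumption: ignore a leader that does not match any known candidate.
        return current
    # Either the leader equals the current candidate (no switch) or the agent switches to it.
    return leader


candidates = {"bootstrap-hub-kubeconfig-hub1", "bootstrap-hub-kubeconfig-hub2"}
annotations = {LEADER_ANNOTATION: "bootstrap-hub-kubeconfig-hub1"}

# An agent currently using the hub2 candidate is told to move to the hub1 candidate.
print(candidate_to_use("bootstrap-hub-kubeconfig-hub2", annotations, candidates))
```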


### Workflow

Let's look at an example. Assume we have 2 hubs, `hub1` and `hub2`, and a managed cluster named `cluster1`. We want the leader hub to be `hub1`.

First, we create a ManagedCluster CR on each hub with the annotation `cluster.open-cluster-management.io/bootstrap-kubeconfig-leader`. The value is the name of the bootstrapkubeconfig candidate for `hub1`. On `hub1`:

```yaml
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: cluster1
  annotations:
    cluster.open-cluster-management.io/bootstrap-kubeconfig-leader: "bootstrap-hub-kubeconfig-hub1"
```
On `hub2`, the annotation has the same value, indicating that `hub2` is not the leader hub for `cluster1`:
```yaml
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: cluster1
  annotations:
    cluster.open-cluster-management.io/bootstrap-kubeconfig-leader: "bootstrap-hub-kubeconfig-hub1"
```

Then, we store the bootstrap kubeconfigs of the 2 hubs as secrets in the `open-cluster-management-agent` namespace on the agent side and follow the regular registration process.

```yaml
kind: Secret
apiVersion: v1
metadata:
  name: bootstrap-hub-kubeconfig-hub1
  namespace: open-cluster-management-agent
  labels:
    cluster.open-cluster-management.io/bootstrap-kubeconfig-candidate: ""
data:
  kubeconfig: YXBpVmVyc2lvbjogdjEK...
```

```yaml
kind: Secret
apiVersion: v1
metadata:
  name: bootstrap-hub-kubeconfig-hub2
  namespace: open-cluster-management-agent
  labels:
    cluster.open-cluster-management.io/bootstrap-kubeconfig-candidate: ""
data:
  kubeconfig: YXBpVmVyc2lvbjogdjEK...
```
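
Values under `data` in a Kubernetes Secret are base64-encoded. The sketch below builds a candidate secret as a plain dictionary to show how the `kubeconfig` field could be produced. The helper name `make_candidate_secret` is hypothetical, the `bootstrap-hub-kubeconfig-<hub>` naming simply follows the examples above, and the one-line kubeconfig is a placeholder, not a usable kubeconfig.

```python
import base64

CANDIDATE_LABEL = "cluster.open-cluster-management.io/bootstrap-kubeconfig-candidate"


def make_candidate_secret(hub_name, kubeconfig_text):
    """Build a Secret manifest (as a dict) holding one hub's bootstrap kubeconfig."""
    # Secret `data` values must be base64-encoded strings.
    encoded = base64.b64encode(kubeconfig_text.encode("utf-8")).decode("ascii")
    return {
        "kind": "Secret",
        "apiVersion": "v1",
        "metadata": {
            # Assumption: the name follows the pattern used in the examples above.
            "name": f"bootstrap-hub-kubeconfig-{hub_name}",
            "namespace": "open-cluster-management-agent",
            "labels": {CANDIDATE_LABEL: ""},
        },
        "data": {"kubeconfig": encoded},
    }


# Placeholder content only; a real kubeconfig has clusters, users, and contexts.
secret = make_candidate_secret("hub1", "apiVersion: v1\n")
print(secret["data"]["kubeconfig"])  # prints YXBpVmVyc2lvbjogdjEK
```

The printed value matches the visible prefix of the elided `kubeconfig` field in the manifests above, which decodes to `apiVersion: v1`.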

At the start of the registration process, the klusterlet randomly picks a bootstrap kubeconfig candidate and uses it to register with the corresponding hub.

After the registration is complete, the klusterlet reads the `cluster.open-cluster-management.io/bootstrap-kubeconfig-leader` annotation on the ManagedCluster CR. If the value of the annotation is the same as the secret name of the bootstrap kubeconfig candidate currently in use, the klusterlet does nothing. Otherwise, the klusterlet uses the secret specified by the annotation to register with another hub.
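
The register-then-check behavior can be simulated with a toy model, shown below. This is not the klusterlet implementation: hubs are modeled as dictionaries of ManagedCluster annotations, registration is assumed to always succeed, and the function name `settle`, the loop bound, and the handling of a missing or unknown leader are assumptions made for the sketch.

```python
import random

LEADER_ANNOTATION = "cluster.open-cluster-management.io/bootstrap-kubeconfig-leader"


def settle(candidates, hub_annotations, rng):
    """Simulate candidate selection and return the secret names used, in order.

    candidates: secret name -> hub name
    hub_annotations: hub name -> annotations of the ManagedCluster CR on that hub
    """
    # Start from a randomly chosen candidate, as the proposal describes.
    current = rng.choice(sorted(candidates))
    used = [current]
    # Bound the loop so hubs that disagree with each other cannot make the sketch spin forever.
    for _ in range(len(candidates)):
        leader = hub_annotations[candidates[current]].get(LEADER_ANNOTATION)
        # Assumption: stay put when the leader is missing, unknown, or already in use.
        if leader is None or leader == current or leader not in candidates:
            break
        current = leader
        used.append(current)
    return used


candidates = {
    "bootstrap-hub-kubeconfig-hub1": "hub1",
    "bootstrap-hub-kubeconfig-hub2": "hub2",
}
# Both hubs name the hub1 candidate as leader, as in the example above.
hub_annotations = {
    "hub1": {LEADER_ANNOTATION: "bootstrap-hub-kubeconfig-hub1"},
    "hub2": {LEADER_ANNOTATION: "bootstrap-hub-kubeconfig-hub1"},
}

path = settle(candidates, hub_annotations, random.Random())
print(path)  # ends with the hub1 candidate whichever candidate was picked first
```

Because both hubs agree on the leader, the agent ends up on `hub1` after at most one switch regardless of the initial random pick.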

The following diagram shows the complete status transition of the klusterlet:
[![status transition diagram](./status-transition-diagram.png)](./status-transition-diagram.png)

A `leader change` is triggered in 2 cases:
1. The value of the `leader` annotation on the hub changes.
2. The content of the bootstrapkubeconfig candidate currently in use changes.
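
One way to detect the two triggers above is to compare successive observations of the annotation value and of a digest of the current candidate's content. The proposal does not say how changes are detected, so the hashing approach and the names `observe` and `leader_change_reasons` are assumptions made for illustration.

```python
import hashlib


def observe(leader_annotation, current_kubeconfig):
    """Capture the two inputs that can trigger a leader change.

    leader_annotation: value of the leader annotation read from the hub (or None)
    current_kubeconfig: raw bytes of the candidate secret currently in use
    """
    # Assumption: a SHA-256 digest stands in for "the content of the candidate".
    return (leader_annotation, hashlib.sha256(current_kubeconfig).hexdigest())


def leader_change_reasons(previous, latest):
    """Return the list of triggers that fired between two observations."""
    reasons = []
    if previous[0] != latest[0]:
        reasons.append("leader annotation changed")
    if previous[1] != latest[1]:
        reasons.append("current candidate content changed")
    return reasons


before = observe("bootstrap-hub-kubeconfig-hub1", b"kubeconfig-v1")
after_annotation = observe("bootstrap-hub-kubeconfig-hub2", b"kubeconfig-v1")
after_content = observe("bootstrap-hub-kubeconfig-hub1", b"kubeconfig-v2")

print(leader_change_reasons(before, after_annotation))
print(leader_change_reasons(before, after_content))
```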
@@ -0,0 +1,12 @@
title: relocatable klusterlet
authors:
- "@xuezhaojun"
reviewers:
- "@qiujian16"
- "@deads2k"
approvers:
- "@qiujian16"
- "@deads2k"
creation-date: 2023-10-30
last-updated: 2023-10-30
status: provisional