
The MachinePool remains in the "ScalingUp" phase instead of transitioning to "Running" #10541

Open
nitinthe0072000 opened this issue Apr 30, 2024 · 6 comments
Labels: area/machinepool, kind/bug, lifecycle/stale, needs-triage, priority/backlog

nitinthe0072000 commented Apr 30, 2024

What steps did you take and what happened?

We attempted to create a Kubernetes cluster on AWS using kubeadm as the bootstrap provider and CAPA as the infrastructure provider. The control plane deployed successfully on EC2. However, we hit an issue with the MachinePool (worker nodes): it remains in the "ScalingUp" phase instead of transitioning to "Running", even after the node joins the control plane and is in the "Ready" state.

There is also a second issue: the worker node that joins the control plane does reach the Ready state, but it always keeps the following taint: node.cluster.x-k8s.io/uninitialized:NoSchedule.
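
A check along these lines shows the taint on the workload cluster (the node name and kubeconfig path below are placeholders):

# Show the taints and the providerID reported by the kubelet for the worker node
kubectl --kubeconfig workload.kubeconfig get node <worker-node-name> \
  -o jsonpath='{.spec.taints}{"\n"}{.spec.providerID}{"\n"}'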

NAME                                 DATA   AGE
configmap/aws-vpc-cni-driver-addon   1      29m

NAME                                       CLUSTERCLASS   PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/kubeadm-cluster                  Provisioned   29m   

NAME                                                         CLUSTER           READY   VPC                     BASTION IP
awscluster.infrastructure.cluster.x-k8s.io/kubeadm-cluster   kubeadm-cluster   true    vpc-01f450e7d16ae68c1   

NAME                                                                              CLUSTER           INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/kubeadm-cluster-control-plane   kubeadm-cluster   true          true                   1          1       1         0             29m   v1.28.7

NAME                                                                               AGE
awsmachinetemplate.infrastructure.cluster.x-k8s.io/kubeadm-cluster-control-plane   29m

NAME                                                CLUSTER           REPLICAS   PHASE       AGE   VERSION
machinepool.cluster.x-k8s.io/kubeadm-cluster-mp-0   kubeadm-cluster   1          ScalingUp   29m   v1.28.7

NAME                                                                  READY   REPLICAS   MINSIZE   MAXSIZE   LAUNCHTEMPLATE ID
awsmachinepool.infrastructure.cluster.x-k8s.io/kubeadm-cluster-mp-0   true    1          1         10        lt-0228e7376fb7fd6d5

NAME                                                            CLUSTER           AGE
kubeadmconfig.bootstrap.cluster.x-k8s.io/kubeadm-cluster-mp-0   kubeadm-cluster   29m

NAME                                                 AGE
clusterresourceset.addons.cluster.x-k8s.io/crs-cni   29m

From the logs of the CAPI controller we see the following:

I0430 09:52:23.610332       1 machinepool_controller_noderef.go:168] "No ProviderID detected, skipping" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster" providerIDList=1 providerID=""
I0430 09:52:23.610393       1 machinepool_controller_noderef.go:168] "No ProviderID detected, skipping" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster" providerIDList=1 providerID=""
I0430 09:52:23.610414       1 machinepool_controller_noderef.go:87] "Cannot assign NodeRefs to MachinePool, no matching Nodes" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster"
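
Our reading of these log lines: the MachinePool has one entry in spec.providerIDList, but the workload-cluster Nodes appear to report an empty .spec.providerID (providerID="" in the log), so the controller cannot match any Node to the pool and never assigns a NodeRef. A rough way to compare both sides (the kubeconfig path is a placeholder):

# providerIDList as recorded on the MachinePool (management cluster)
kubectl get machinepool kubeadm-cluster-mp-0 -n default \
  -o jsonpath='{.spec.providerIDList}{"\n"}'

# providerID reported by each Node in the workload cluster
kubectl --kubeconfig workload.kubeconfig get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'

If the worker node's providerID is empty or does not match the aws:///... entry shown below, the NodeRef assignment fails and the pool stays in ScalingUp.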

The kubectl describe output for the MachinePool:

Name:         kubeadm-cluster-mp-0
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=kubeadm-cluster
              nodepool=nodepool-0
Annotations:  <none>
API Version:  cluster.x-k8s.io/v1beta1
Kind:         MachinePool
Metadata:
  Creation Timestamp:  2024-04-30T09:44:52Z
  Finalizers:
    machinepool.cluster.x-k8s.io
  Generation:  3
  Owner References:
    API Version:     cluster.x-k8s.io/v1beta1
    Kind:            Cluster
    Name:            kubeadm-cluster
    UID:             12d5fe0d-55c4-454e-ae17-1f4c790ffdd1
  Resource Version:  1398339
  UID:               b7662d30-c459-4a7c-a859-a8b96e0c3bfa
Spec:
  Cluster Name:       kubeadm-cluster
  Min Ready Seconds:  0
  Provider ID List:
    aws:///eu-west-3a/i-03fb955e66b456xxx
  Replicas:  1
  Template:
    Metadata:
      Labels:
        Nodepool:  nodepool-0
    Spec:
      Bootstrap:
        Config Ref:
          API Version:     bootstrap.cluster.x-k8s.io/v1beta1
          Kind:            KubeadmConfig
          Name:            kubeadm-cluster-mp-0
          Namespace:       default
        Data Secret Name:  kubeadm-cluster-mp-0
      Cluster Name:        kubeadm-cluster
      Infrastructure Ref:
        API Version:  infrastructure.cluster.x-k8s.io/v1beta2
        Kind:         AWSMachinePool
        Name:         kubeadm-cluster-mp-0
        Namespace:    default
      Version:        v1.28.7
Status:
  Bootstrap Ready:  true
  Conditions:
    Last Transition Time:  2024-04-30T09:50:48Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-04-30T09:50:46Z
    Status:                True
    Type:                  BootstrapReady
    Last Transition Time:  2024-04-30T09:50:48Z
    Status:                True
    Type:                  InfrastructureReady
    Last Transition Time:  2024-04-30T09:44:52Z
    Status:                True
    Type:                  ReplicasReady
  Infrastructure Ready:    true
  Observed Generation:     3
  Phase:                   ScalingUp
  Replicas:                1
  Unavailable Replicas:    1
Events:                    <none>

The Cluster API manifests:

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: kubeadm-cluster
  labels:
    cni: external
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: kubeadm-cluster
  controlPlaneRef:
    kind: KubeadmControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    name: kubeadm-cluster-control-plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: kubeadm-cluster
spec:
  region: eu-west-3
  sshKeyName: capi-server
---
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
metadata:
  name: kubeadm-cluster-control-plane
spec:
  replicas: 1
  version: 1.28.7
  machineTemplate:
    infrastructureRef:
      kind: AWSMachineTemplate
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      name: kubeadm-cluster-control-plane
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
    clusterConfiguration:
      apiServer:
        extraArgs:
          authorization-mode: Node,RBAC
      etcd:
        local:
          dataDir: /var/lib/etcd
      kubernetesVersion: 1.28.7
      networking:
        dnsDomain: cluster.local
        serviceSubnet: 10.96.0.0/12
    joinConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
---
kind: AWSMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
metadata:
  name: kubeadm-cluster-control-plane
spec:
  template:
    spec:
      instanceType: t3a.large
      iamInstanceProfile: "control-plane.cluster-api-provider-aws.sigs.k8s.io"
      sshKeyName: capi-server
      ami:
       id: ami-05a64c4151d99b765
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
  labels:
    nodepool: nodepool-0  
spec:
  clusterName: kubeadm-cluster
  replicas: 1
  template:
    metadata:
      labels:
        nodepool: nodepool-0
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: kubeadm-cluster-mp-0
      clusterName: kubeadm-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachinePool
        name: kubeadm-cluster-mp-0
      version: 1.28.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
metadata:
  labels:
    nodepool: nodepool-0
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  minSize: 1
  maxSize: 10
  availabilityZones:
    - eu-west-3a
  awsLaunchTemplate:
    iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
    instanceType: t3a.large
    sshKeyName: capi-server
    ami:
      id: ami-05a64c4151d99b765
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  joinConfiguration:
    nodeRegistration:
      name: '{{ ds.meta_data.local_hostname }}'
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: crs-cni
spec:
  clusterSelector:
    matchLabels:
      cni: external
  resources:
  - kind: ConfigMap
    name: aws-vpc-cni-driver-addon
  strategy: ApplyOnce

What did you expect to happen?

The MachinePool phase should transition to "Running" once the worker node becomes Ready. The node.cluster.x-k8s.io/uninitialized:NoSchedule taint should also be removed from the worker node.
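
As far as we understand, the node.cluster.x-k8s.io/uninitialized:NoSchedule taint is only removed by Cluster API after the MachinePool controller matches the Node to the pool via its providerID, which is exactly the step failing in the logs above. As a temporary workaround only (not a fix), the taint can be removed by hand on the workload cluster (node name and kubeconfig path are placeholders):

# Temporary workaround: remove the uninitialized taint manually
kubectl --kubeconfig workload.kubeconfig taint node <worker-node-name> \
  node.cluster.x-k8s.io/uninitialized:NoSchedule-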

Cluster API version

CAPI Version : 1.7.1
CAPA Version : 2.4.2

Kubernetes version

Kubernetes : v1.28.7

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 30, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

CAPI contributors will take a look as soon as possible, apply one of the triage/* labels and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 30, 2024
@fabriziopandini
Member

cc @willie-yao @Jont828 @mboersma to take a first look and assign priority

@mboersma
Contributor

mboersma commented May 2, 2024

/priority important-soon

This could be a bug in CAPI MachinePools, but we need to verify that it's not specific to AWS and try to find a fix either way.

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label May 2, 2024
@mboersma
Contributor

mboersma commented May 2, 2024

/assign

@sbueringer sbueringer added the area/machinepool Issues or PRs related to machinepools label Jul 10, 2024
@sbueringer
Member

Is this the same as: #9858?
(which seemed to be an issue in CAPA)

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 30, 2024
@fabriziopandini fabriziopandini added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 27, 2024