CAPI is taking too long removing taint node.cluster.x-k8s.io/uninitialized:NoSchedule from nodes #9858
This issue is currently awaiting triage. If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
In addition to this taking time, it also creates a requirement for the MC to be able to connect to the WC when new nodes are added. This means that if you're using an externally managed cluster-autoscaler (as in, running cluster-autoscaler in the workload cluster) and the MC is unavailable for any reason (network issue, upgrade, maintenance, etc.), new nodes added by cluster-autoscaler will not become ready until the MC can connect again. This breaks one of the main reasons for running cluster-autoscaler in the WC in the first place: not having to rely on the MC for scaling when needed.
Question: does it only take too long for MachinePool-based Machines that were created by external scaling? Some datapoints around the removal of the taint: it gets done at the
which gets called from
and before that it waits, e.g., for a ProviderID to be set on the Machine as well as on the node inside the workload cluster:
It would be good to first know why it takes so long. Could it be that, when scaled externally, it takes too long for CAPI to reconcile and finish the node? I think we watch the node objects in a cluster 🤔 Do you have datapoints on whether other work for the node finished? E.g. did a provider ID get set by the cloud provider running inside this cluster (as noted above)?
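To make that ordering concrete, here is a minimal sketch of the gating just described (not the actual CAPI code, just a simplified model): the taint can only be removed once a provider ID exists on both the Machine and the workload-cluster Node, so anything that delays the provider ID also delays taint removal.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
)

// uninitializedTaint mirrors the taint discussed in this issue; it is defined
// locally to keep the sketch self-contained.
var uninitializedTaint = corev1.Taint{
	Key:    "node.cluster.x-k8s.io/uninitialized",
	Effect: corev1.TaintEffectNoSchedule,
}

// reconcileNodeTaint is a simplified model of the ordering described above:
// nothing happens until a provider ID is set on both the Machine and the
// workload-cluster Node; only then are labels synced and the taint removed.
// It returns true if the taint was removed.
func reconcileNodeTaint(machineProviderID string, node *corev1.Node, desiredLabels map[string]string) bool {
	if machineProviderID == "" || node.Spec.ProviderID == "" {
		// Prerequisite not met: the cloud provider in the workload cluster
		// has not populated the provider ID yet, so the taint stays.
		return false
	}
	if machineProviderID != node.Spec.ProviderID {
		// This Node does not back this Machine.
		return false
	}
	// Sync the desired labels onto the Node before lifting the taint.
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	for k, v := range desiredLabels {
		node.Labels[k] = v
	}
	// Drop the uninitialized taint so workloads can be scheduled.
	kept := node.Spec.Taints[:0]
	for _, t := range node.Spec.Taints {
		if t.Key != uninitializedTaint.Key || t.Effect != uninitializedTaint.Effect {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept
	return true
}
```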
Regarding communication to the WC: there is a feature group around alternative communication patterns (I don't know how active it is currently): https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/community/20230111-alternative-communication-patterns.md
There is also a feature group for karpenter: https://hackmd.io/@elmiko/ryR2VXR0n
/triage needs-information
/close
@fabriziopandini: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'm not sure if it's better to reopen this or create a new issue, but we noticed similar issues with our CAPI (v1.7.3), CAPZ (v1.5.2 with a few patches in open PRs), and cluster-autoscaler with the clusterapi provider (v1.28.2 with a few patches in open PRs). We're observing this using MachinePools. Looking at metrics using the following prometheus query (replace
most have the taint for 30-60s. However, looking at the top50 by wrapping the query in
I've been investigating the code and some logs a bit to figure out why that might happen in some cases. I've not fully verified this yet, but I hope someone can confirm/reject my suspicion :)
Therefore: a prerequisite for label sync is the providerIDs, which get populated by
Looking at the CAPZ code related to readiness: this means that when we're scaling up or down, the InfraMP is not marked ready:
and in this case the nodes stay uninitialized until the InfraMP is ready again. Now my questions are:
@mweibel: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@mweibel reopening |
This issue is currently awaiting triage. If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
While it sounds reasonable to check and see if this can be improved, have you also been able to find out why it took that long for the underlying VMSS to be in state succeeded and for the replicas to match?
If I understand the CAPZ code correctly, it only adds ready MachinePoolMachines to the count in
It depends on the OS, on the availability of the machines we need, and on the number of machines. We run batch workloads and sometimes scale up from close to zero (or zero) to 100 machines within a short amount of time. If we, let's say, scale up from 0 to 10, and a machine takes 1-5 minutes to be ready (linux machines usually take <1 min, while windows machines average around 5 min):
The worst case above with the 3090s might have been some other issue - I've investigated a bit but will need to investigate some more to figure it out.
Does that additional context help make the issue clearer?
Makes sense; I think it's still worth finding the root cause, though. Changing the above-mentioned behaviour in CAPI would lead to a change of the contract documented here: https://main.cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-pool Especially this part:
If we want to change how this works, we also need to check whether the other providers supporting MachinePools are okay with it, so we don't break things. So getting to the questions:
I think CAPZ is doing fine according to the contract. Depending on which state CAPZ's MachinePool is in, CAPZ could consider setting the ready boolean to true earlier.
I think this is defined in the contract, so yes, CAPI should not continue unless the ready state is set to true; however, it's the Infra Provider's decision when it signals to CAPI that it can continue. Note: I'm not that deep into the machine pool controller and machine pools, so don't consider this a strong opinion. But changing the contract should IMHO be done very, very carefully. Also see the machine pool proposal, which has more detailed diagrams about it.
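For reference, the contract fields under discussion, paraphrased from the MachinePool contract documentation linked above, look roughly like the following Go sketch; the wrapper types are illustrative and not CAPZ's actual API types.

```go
package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// InfraMachinePool sketches the fields the MachinePool contract expects an
// infrastructure provider's object to expose. Field names follow the contract
// documentation; the surrounding types are illustrative only.
type InfraMachinePool struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              InfraMachinePoolSpec   `json:"spec,omitempty"`
	Status            InfraMachinePoolStatus `json:"status,omitempty"`
}

type InfraMachinePoolSpec struct {
	// providerIDList holds the cloud provider IDs of the instances backing
	// the pool; CAPI matches these against Node.spec.providerID to find and
	// initialize the corresponding workload-cluster Nodes.
	ProviderIDList []string `json:"providerIDList,omitempty"`
}

type InfraMachinePoolStatus struct {
	// ready signals that the infrastructure is fully provisioned. CAPI waits
	// for this, which is why a pool that reports ready=false while scaling
	// also delays taint removal for the new nodes.
	Ready bool `json:"ready"`
	// replicas is the most recently observed number of provisioned instances.
	Replicas int32 `json:"replicas"`
}
```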
cc @mboersma
In my opinion, with cabpk this requirement exists independently of the taint. When a kubeadmconfig is created, cabpk creates a bootstrap token secret within the wl cluster. It's unclear to me how this would work without access to the wl cluster.
@AverageMarcus can you please share how you were able to create new nodes without connectivity with cluster-autoscaler and cabpk before we introduced the taint? I'm also curious to know how the cluster-autoscaler is able to scale up a MD in the mgmt cluster if the mgmt cluster is not available (or are we assuming the connection is only broken in the other direction in that scenario?)
Shouldn't the Nodes show up in the workload cluster at some point, and shouldn't that trigger a reconcile of the Machine controller? If it doesn't, then that should be implemented; otherwise, it should be fixed if it's buggy. See:
EDIT: Sorry, I guess Christian was already saying roughly the same above.
Yes, the bootstrap token would still be a blocker if the Management Cluster is down for more than 15 minutes, which I think is too short a period, but that would be for another issue :) And yeah, I guess the issue is that the reconciliation loop can take a bit of time. Would it make sense to only have this taint on the instances created by Machine CRs, but not on the instances created via MachinePools?
I would first like to figure out why the current implementation, which should react immediately, doesn't work (#9858 (comment)). There's also this dependency:
The current situation is that it works, but there is a very big delay, and my main gripe is that I don't see why the taint is needed at all, as the instances are fully operational. I have no experience with MachineDeployments; we only use MachinePools.
FYI, I linked a related CAPZ issue here. I'm continuing to investigate this. @chrischdi FYI:
In our case we use the cluster-autoscaler in clusterapi provider mode.
I have the same feeling, but I also see why it needs to be there. We could probably avoid the taint if the MachinePool template doesn't specify labels - but I don't think this would be a good fix because it hides an underlying issue.
Yeah, I can see why it's useful, but I wonder if we could either make it optional or decrease the reconciliation loop time to make it much faster at detecting new nodes.
Until then I'm -1 on fixes like disabling the feature or reducing the sync period. Just to be clear: with our current implementation we should immediately reconcile after a Node shows up in the workload cluster (this means the taint should be gone within a few milliseconds).
How should it reconcile immediately when the nodes are created by cluster-autoscaler directly talking to the AWS API? Aren't we limited by the sync-period (10 minutes by default) when using cluster-autoscaler deployed in the workload cluster? |
As Stefan pointed out: the controller should watch the nodes in the cluster and get an event when the kubelet creates its node: #9858 (comment)
But that is only executed every "sync-period" when there are no changes to the CRs AFAIK |
A Kubernetes watch does not depend on the sync period. The sync period is just the interval in which the local list => broadcast is executed for all objects currently in the local store.
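As a quick way to see this, here is a minimal client-go sketch (the kubeconfig path is a placeholder): events for new Nodes arrive on the watch as soon as the API server records them, with no polling or sync period involved.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client against the workload cluster (kubeconfig path is just a placeholder).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/workload-cluster.kubeconfig")
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Open a watch on Node objects. ADDED/MODIFIED events are pushed by the
	// API server as they happen; no polling or sync period is involved.
	w, err := cs.CoreV1().Nodes().Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		if node, ok := ev.Object.(*corev1.Node); ok {
			fmt.Printf("%s node %s (providerID=%q)\n", ev.Type, node.Name, node.Spec.ProviderID)
		}
	}
}
```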
Will try to gather logs and events from controllers, but I think this watch is not really triggering the removal of the taint. |
That is exactly what I'm trying to say :). For some reason this watch does not seem to work, and the proper fix is to figure out why.
These events are not the same as the events pushed by the Kubernetes watch API. |
This is the registered event on the workload cluster
None of our controllers are watching on these event objects. They are using the watch API. Please see https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes |
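In controller-runtime terms, the pattern under discussion is a watch on Node objects whose events are mapped into reconcile requests for the owning resource (conceptually what watchClusterNodes / nodeToMachinePool do). Below is a rough sketch; the annotation-based mapping is hypothetical and only keeps the example short, while the real CAPI code resolves the owner via the node's provider ID and watches the remote workload cluster's cache.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// nodeToOwner maps a Node event to a reconcile request for the object that
// owns it. The annotation key here is hypothetical and only used to keep the
// sketch short; CAPI's nodeToMachinePool resolves the owner differently.
func nodeToOwner(_ context.Context, o client.Object) []reconcile.Request {
	node, ok := o.(*corev1.Node)
	if !ok {
		return nil
	}
	owner, found := node.GetAnnotations()["example.io/owner"]
	if !found {
		return nil
	}
	return []reconcile.Request{{
		NamespacedName: client.ObjectKey{Namespace: "default", Name: owner},
	}}
}

// setupNodeWatch wires the mapping into a controller so that a Node appearing
// in the cluster immediately enqueues a reconcile, independent of the resync
// period. CAPI does the equivalent against the workload cluster's cache.
func setupNodeWatch(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		Named("example-node-watcher").
		Watches(&corev1.Node{}, handler.EnqueueRequestsFromMapFunc(nodeToOwner)).
		Complete(r)
}
```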
+1 to investigate why existing watches on Node objects are not working as expected |
Have some additional information here:
New node created (ip-10-0-204-6.eu-west-2.compute.internal, aws:///eu-west-2c/i-088c9f73c6f2ae2dc) at Tue, 16 Jul 2024 12:51:29 +0200.
We can see multiple reconciliation loops:
Finally, at 12:56, it gets detected:
Based on what I can see, it seems like nodeRefs are not being updated soon enough and the code is following this path
I see NodeRefs are updated later in the code in
ProviderIDList is updated in
I also added some logging to the nodeToMachinePool function, which should be called from watchClusterNodes, and it's never called.
The current issue does not seem to be related to the CAPI controller but to the CAPA controller; this function is only run every 10 minutes and is the one that updates the ProviderIDList.
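To illustrate the difference (purely illustrative, not CAPA's actual code): a reconciler that is only driven by a periodic requeue will not notice instances the autoscaler added out-of-band until the next interval, whereas a watch-driven reconciler is woken up by the event itself.

```go
package example

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// pollingReconciler models the behaviour described above: external state
// (e.g. instances an autoscaler added directly via the cloud API) is only
// observed when Reconcile runs, and without a watch feeding it events the
// next run happens only after RequeueAfter elapses. With a 10-minute
// interval, ProviderIDList can lag by up to 10 minutes, and the new nodes
// keep the uninitialized taint for that long.
type pollingReconciler struct{}

func (r *pollingReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. List instances from the cloud API (omitted).
	// 2. Write the observed provider IDs into spec.providerIDList (omitted).
	// 3. Come back later; nothing external wakes us up earlier.
	return ctrl.Result{RequeueAfter: 10 * time.Minute}, nil
}
```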
Great job in triaging this issue!
Based on the evidence above, is it OK to close this issue (you should probably open a new one in CAPA)?
(I think it is a yes) |
@fabriziopandini: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This CAPA problem is basically the same issue: kubernetes-sigs/cluster-api-provider-aws#4618 |
What steps did you take and what happened?
From reading the docs, the bootstrap provider can optionally taint worker nodes at creation with node.cluster.x-k8s.io/uninitialized:NoSchedule. We noticed that CAPI is taking up to 5 minutes to remove the taint from nodes. Unfortunately this is marked as optional, but there is no flag to omit it; from observing the code, we cannot see any place where this is possible.
We don't think this is intended, especially when MachinePools are managed externally by cluster-autoscaler.
Additionally, there are other provider-specific controllers, like aws-cloud-controller-manager, which set a similar taint node.cloudprovider.kubernetes.io/uninitialized while waiting for nodes to become ready.
What did you expect to happen?
Ensure taint configuration in bootstrap provider is optional.
Cluster API version
1.4.7
Kubernetes version
No response
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
/area bootstrap