Describe the bug
In an unplanned node-down plus node-removal scenario (possibly with destructive force-deletion of pods and their metadata), all mount points of all PVCs on the node are cleaned up, including the Kubernetes device global path mount points. However, after the node is powered on again and added back to the Kubernetes cluster, when OBS pods with PVCs are initialized, kubelet on this node directly issues a NodePublishVolume CSI call that is inconsistent with the volumes' real status. It skips the required NodeStageVolume call, which must complete successfully first and which mounts the real device path onto each volume's Kubernetes device global path. As a result, the CSI volumes transition to, and get stuck in, the Failed status. This behavior violates the requirement of the CSI spec: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume
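One way a driver could detect the state described above (the device global path, i.e. the staging target path, is no longer mounted after the node comes back) is to look the path up in the kernel's mount table. The minimal Python sketch below parses text in the `/proc/<pid>/mountinfo` format, where the fifth field is the mount point and special characters are octal-escaped. The function names are illustrative assumptions, not code from any existing driver.

```python
import os
import re
from typing import Set

# The kernel escapes space, tab, newline and backslash in mountinfo as 3-digit octal.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field: str) -> str:
    """Undo the kernel's octal escaping (e.g. "\\040" for a space)."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mount_points(mountinfo_text: str) -> Set[str]:
    """Return every mount point listed in /proc/<pid>/mountinfo-formatted text."""
    mount_points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Field 5 (index 4) is the mount point relative to the process's root.
        mount_points.add(_unescape(fields[4]))
    return mount_points


def is_staging_path_mounted(staging_path: str, mountinfo_text: str) -> bool:
    """True if the staging target (device global) path appears as a mount point."""
    return os.path.normpath(staging_path) in parse_mount_points(mountinfo_text)
```

On a Linux node the input would come from reading `/proc/self/mountinfo`. Python's `os.path.ismount` is a simpler alternative, but it can miss bind mounts within the same filesystem.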
Until the Kubernetes community fixes this kubelet issue, I think we need to consider changing code on the CSI side to provide a workaround for the case caused by the kubelet issue.
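A CSI-side workaround could make NodePublishVolume defensive. Before bind-mounting into the pod's target path, it checks that the staging target path is actually mounted. If it is not, it either runs the driver's own (idempotent) staging logic again or fails with an error analogous to gRPC `FAILED_PRECONDITION`, instead of publishing an unusable volume. The Python sketch below models this with injected callables. `NodeServer`, `stage`, `bind_mount`, `restage_on_missing` and `FailedPrecondition` are hypothetical names, and a real driver would implement the handler against the CSI gRPC API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class FailedPrecondition(Exception):
    """Hypothetical stand-in for returning gRPC status FAILED_PRECONDITION."""


@dataclass
class NodeServer:
    # Hypothetical hooks; a real driver would call its own mount utilities here.
    is_mounted: Callable[[str], bool]        # path -> is it currently a mount point?
    stage: Callable[[str, str], None]        # (volume_id, staging_path): mount device
    bind_mount: Callable[[str, str], None]   # (source, target)
    restage_on_missing: bool = True
    log: List[str] = field(default_factory=list)

    def node_publish_volume(self, volume_id: str, staging_path: str, target_path: str) -> None:
        if not self.is_mounted(staging_path):
            # kubelet called NodePublishVolume without a live NodeStageVolume mount.
            if not self.restage_on_missing:
                raise FailedPrecondition(
                    f"volume {volume_id}: staging path {staging_path} is not mounted")
            self.log.append(f"re-staging {volume_id} at {staging_path}")
            self.stage(volume_id, staging_path)
            if not self.is_mounted(staging_path):
                raise FailedPrecondition(
                    f"volume {volume_id}: re-staging {staging_path} did not produce a mount")
        # Staging path is mounted: publish by bind-mounting it into the pod's path.
        self.bind_mount(staging_path, target_path)
        self.log.append(f"published {volume_id} to {target_path}")
```

Re-staging from inside NodePublishVolume is plausible because a NodePublishVolumeRequest already carries the volume ID, the staging target path and the volume capability. Whether that is enough to restage safely (for example, if staging depends on secrets that are only passed to NodeStageVolume) depends on the driver and is an assumption here.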
Environment
RKE2
To Reproduce
Power off the node directly (an unplanned shutdown), then remove the node from the cluster with forced deletion of its pods and their metadata.
Expected behavior
When pods with PVCs are initialized, kubelet should, for each volume, first issue the NodeStageVolume CSI call and then, only after NodeStageVolume has completed successfully, issue the NodePublishVolume CSI call.
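The expected ordering can be expressed as a per-volume check over a sequence of node calls. The sketch below is a hypothetical helper. It assumes the input is a list of `(rpc_name, volume_id)` pairs for successfully completed calls (for instance, extracted from driver logs), and it flags every NodePublishVolume that is not preceded by a NodeStageVolume for the same volume. The CSI spec only requires this ordering when the plugin advertises the `STAGE_UNSTAGE_VOLUME` node capability.

```python
from typing import Dict, List, Tuple


def find_order_violations(calls: List[Tuple[str, str]]) -> List[str]:
    """Return volume IDs whose NodePublishVolume had no prior successful NodeStageVolume.

    `calls` is assumed to contain only successful RPCs, in the order they completed.
    """
    staged: Dict[str, bool] = {}
    violations: List[str] = []
    for rpc, volume_id in calls:
        if rpc == "NodeStageVolume":
            staged[volume_id] = True
        elif rpc == "NodeUnstageVolume":
            staged[volume_id] = False  # staging must be redone before the next publish
        elif rpc == "NodePublishVolume" and not staged.get(volume_id, False):
            violations.append(volume_id)
    return violations
```

Applied to calls recorded after the node rejoins the cluster, the scenario in this issue would show up as a NodePublishVolume for a volume with no NodeStageVolume since the restart.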
CraneShiEMC changed the title from "Need to Handle Kubelet's Problematic CSI Call Inconsistent with Real Volume Status" to "Need to Handle Kubelet's Wrong CSI Call Inconsistent with Real Volume Status" on Aug 27, 2023.