Need to Handle Kubelet's Wrong CSI Call Inconsistent with Real Volume Status #1051

Open
CraneShiEMC opened this issue Aug 26, 2023 · 0 comments · May be fixed by #1050
Labels
bug Something isn't working

Comments

Collaborator

CraneShiEMC commented Aug 26, 2023

Describe the bug
In an unplanned node-down scenario followed by node removal (possibly with destructive force-deletion of pods and their metadata?), all mount points of all PVCs on the node, including the k8s global device path mount points, are cleaned up. However, after the node is powered on again and added back to the k8s cluster, when OBS pods with PVCs initialize, kubelet on this node directly issues the CSI call NodePublishVolume, which is inconsistent with the volumes' real status: it skips the required successful NodeStageVolume call, which mounts the real device path at the volume's k8s global device path. As a result, the CSI volumes move to Failed status and stay stuck there. This behavior violates the requirement of the CSI spec: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume

However, until the k8s community fixes this kubelet issue, I think we should consider changing the code on the CSI side to provide a workaround for the case described here.
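One possible shape for such a workaround is a guard in the publish path that checks whether the staging (global device path) mount still exists and restores it before publishing. The following is only a minimal, language-neutral sketch in Python, not the driver's actual code: `FakeMounter`, `node_stage_volume`, `node_publish_volume`, and all paths are hypothetical names introduced for illustration, and the mount table is simulated in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMounter:
    """In-memory stand-in for the node's mount table (hypothetical helper)."""
    mounts: dict = field(default_factory=dict)  # mount point -> source

    def is_mounted(self, path: str) -> bool:
        return path in self.mounts

    def mount(self, source: str, path: str) -> None:
        self.mounts[path] = source


def node_stage_volume(mounter, device_path, staging_path):
    """Mount the real device at the global (staging) path; idempotent."""
    if not mounter.is_mounted(staging_path):
        mounter.mount(device_path, staging_path)


def node_publish_volume(mounter, device_path, staging_path, target_path):
    """Publish a volume, re-staging first if the staging mount is missing.

    Returns True when the guard had to re-stage the volume.
    """
    restaged = False
    if not mounter.is_mounted(staging_path):
        # Kubelet skipped NodeStageVolume (the case in this issue):
        # restore the staging mount instead of failing the volume.
        node_stage_volume(mounter, device_path, staging_path)
        restaged = True
    # Bind the staging path into the pod's target path.
    mounter.mount(staging_path, target_path)
    return restaged
```

An alternative design would be to return an error from the publish call when the staging mount is missing rather than re-staging silently; which option is preferable depends on how kubelet reacts to that error, which this sketch does not model.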

Environment (please complete the following information):
RKE2

To Reproduce
Power off the node directly (unplanned shutdown), then remove the node from the cluster with forceful deletion of its pods and their metadata.

Expected behavior
When pods with PVCs initialize, kubelet should, for each volume, issue the NodeStageVolume CSI call and, only after NodeStageVolume completes successfully, issue the NodePublishVolume CSI call.
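The expected ordering can be checked against a trace of node-side CSI calls. The sketch below assumes the calls have already been extracted from logs into `(rpc_name, volume_id, succeeded)` tuples, starting from the point where the node rejoined the cluster; that tuple format and the function name `find_unstaged_publishes` are assumptions made for illustration, not something the driver or kubelet provides.

```python
def find_unstaged_publishes(calls):
    """Return volume IDs published without a prior successful stage.

    `calls` is an ordered list of (rpc_name, volume_id, succeeded) tuples
    (a hypothetical, pre-parsed trace format).
    """
    staged = set()
    violations = []
    for rpc, volume_id, succeeded in calls:
        if rpc == "NodeStageVolume" and succeeded:
            staged.add(volume_id)
        elif rpc == "NodeUnstageVolume" and succeeded:
            staged.discard(volume_id)
        elif rpc == "NodePublishVolume" and volume_id not in staged:
            # Publish issued while the volume is not staged: the
            # ordering problem described in this issue.
            violations.append(volume_id)
    return violations
```

For example, a trace in which `vol-a` is staged and then published while `vol-b` is only published would report `vol-b` as a violation.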


CraneShiEMC added the "bug" label on Aug 26, 2023
CraneShiEMC changed the title from "Need to Handle Kubelet's Problematic CSI Call Inconsistent with Real Volume Status" to "Need to Handle Kubelet's Wrong CSI Call Inconsistent with Real Volume Status" on Aug 27, 2023