Corrupted filesystem on a PVC with fsRepair=false will not prevent the pod from starting #416
Comments
Thanks for reporting this issue. I'll forward it to engineering.
@valleedelisle can you please provide logs? May I know if the CSI driver reported a mount failure when this pod was created?
Can we also get the StorageClass?
Here's some of the logs I have. The pod is minio and the operation I was trying to achieve was to restart it (see [1], Pod creation). Now that I'm deep-diving the logs, it looks like the detach wasn't completed when the attach was started, and this is probably what caused these IO errors. Also, we hit the
I believe that, in those cases, we should fail the pod. For reference:
And here's the StorageClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: hpe-minio
parameters:
accessProtocol: iscsi
csi.storage.k8s.io/controller-expand-secret-name: xxx-alletra6k-01
csi.storage.k8s.io/controller-expand-secret-namespace: hpe-storage
csi.storage.k8s.io/controller-publish-secret-name: xxx-alletra6k-01
csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
csi.storage.k8s.io/fstype: xfs
csi.storage.k8s.io/node-publish-secret-name: xxx-alletra6k-01
csi.storage.k8s.io/node-publish-secret-namespace: hpe-storage
csi.storage.k8s.io/node-stage-secret-name: xxx-alletra6k-01
csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
csi.storage.k8s.io/provisioner-secret-name: xxx-alletra6k-01
csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
dedupeEnabled: "false"
description: Volume created by the HPE CSI Driver for Minio
encrypted: "false"
folder: k8s-minio
fsRepair: "true"
performancePolicy: ceph
provisioner: csi.hpe.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

[1]
[2]
[3]
[4]
[5]
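For contrast, here is a minimal sketch of the same class with fsRepair disabled, which is the configuration the issue title refers to. The metadata name is illustrative, and the secret parameters from the class above are omitted:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hpe-minio-no-fsrepair     # illustrative name
provisioner: csi.hpe.com
parameters:
  accessProtocol: iscsi
  csi.storage.k8s.io/fstype: xfs
  fsRepair: "false"               # the setting referenced in the issue title
  # csi.storage.k8s.io/*-secret-name and *-secret-namespace parameters omitted for brevity
reclaimPolicy: Delete
volumeBindingMode: Immediate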
I've spent some time on this today and I'm unable to reproduce this with the methods we used to test the fsRepair feature. Do you mind sharing the manifests for your workload? It looks like you're using minio; is that controlled by a StatefulSet or a Deployment?
Sure, it's a StatefulSet [1]. Before this crash, we had 13 million small objects in there. We were hitting this issue when we restarted the pods, so we had to apply this workaround. We're also passing some SR-IOV virtual functions, mostly for internode traffic, but that shouldn't change anything here since the volume is connected over another NIC, from the host.

[1]

kind: StatefulSet
apiVersion: apps/v1
metadata:
name: minio
namespace: minio
labels:
helm.sh/chart: minio-14.3.2
spec:
serviceName: minio-headless
revisionHistoryLimit: 10
persistentVolumeClaimRetentionPolicy:
whenDeleted: Retain
whenScaled: Retain
volumeClaimTemplates:
- kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-0
creationTimestamp: null
labels:
app.kubernetes.io/instance: minio
app.kubernetes.io/name: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1700Gi
storageClassName: hpe-minio
volumeMode: Filesystem
status:
phase: Pending
- kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-1
creationTimestamp: null
labels:
app.kubernetes.io/instance: minio
app.kubernetes.io/name: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1700Gi
storageClassName: hpe-minio
volumeMode: Filesystem
status:
phase: Pending
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: minio
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: minio
app.kubernetes.io/version: 2024.4.28
helm.sh/chart: minio-14.3.2
annotations:
k8s.v1.cni.cncf.io/networks: |-
[
{
"name": "internode-minio",
"namespace": "minio",
"capabilities": { "ips": true }
},
{
"name": "s3-minio",
"namespace": "minio",
"capabilities": { "ips": true }
}
]
spec:
restartPolicy: Always
serviceAccountName: minio
schedulerName: default-scheduler
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/instance: minio
app.kubernetes.io/name: minio
topologyKey: kubernetes.io/hostname
terminationGracePeriodSeconds: 30
securityContext:
seLinuxOptions:
type: spc_t
fsGroup: 1001
fsGroupChangePolicy: OnRootMismatch
containers:
- resources:
limits:
cpu: '16'
ephemeral-storage: 1Gi
memory: 128Gi
requests:
cpu: '16'
ephemeral-storage: 1Gi
memory: 128Gi
readinessProbe:
tcpSocket:
port: minio-api
initialDelaySeconds: 45
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 5
terminationMessagePath: /dev/termination-log
name: minio
livenessProbe:
httpGet:
path: /minio/health/live
port: minio-api
scheme: HTTP
initialDelaySeconds: 45
timeoutSeconds: 5
periodSeconds: 5
successThreshold: 1
failureThreshold: 5
env:
- name: BITNAMI_DEBUG
value: 'false'
- name: MINIO_DISTRIBUTED_MODE_ENABLED
value: 'yes'
- name: MINIO_DISTRIBUTED_NODES
value: 'minio-{0...5}.minio-headless.minio.svc.cluster.local:9000/bitnami/minio/data-{0...1}'
- name: MINIO_SCHEME
value: http
- name: MINIO_FORCE_NEW_KEYS
value: 'no'
- name: MINIO_ROOT_USER
valueFrom:
secretKeyRef:
name: minio
key: root-user
- name: MINIO_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: minio
key: root-password
- name: MINIO_SKIP_CLIENT
value: 'yes'
- name: MINIO_BROWSER
value: 'on'
- name: MINIO_PROMETHEUS_AUTH_TYPE
value: public
- name: MINIO_DATA_DIR
value: /bitnami/minio/data-0
- name: MINIO_BROWSER_SESSION_DURATION
value: 365d
- name: MINIO_SCANNER_SPEED
value: slowest
- name: NFV_IPPOOL
value: 192.168.0.0-16
- name: NFV_SLEEP
value: '45'
- name: NFV_ARGS
value: 'http://minio-{0...5}-data:9000/bitnami/minio/data-{0...1}'
securityContext:
runAsGroup: 1001
runAsUser: 0
seccompProfile:
type: RuntimeDefault
readOnlyRootFilesystem: false
runAsNonRoot: false
privileged: true
capabilities:
drop:
- ALL
seLinuxOptions:
type: spc_t
allowPrivilegeEscalation: true
ports:
- name: minio-api
containerPort: 9000
protocol: TCP
- name: minio-console
containerPort: 9001
protocol: TCP
imagePullPolicy: IfNotPresent
volumeMounts:
- name: empty-dir
mountPath: /tmp
subPath: tmp-dir
- name: empty-dir
mountPath: /opt/bitnami/minio/tmp
subPath: app-tmp-dir
- name: empty-dir
mountPath: /.mc
subPath: app-mc-dir
- name: data-0
mountPath: /bitnami/minio/data-0
- name: data-1
mountPath: /bitnami/minio/data-1
- name: minio-creds
readOnly: true
mountPath: /minio-creds-config.json
subPath: config.json
- name: minio-run
readOnly: true
mountPath: /opt/bitnami/scripts/minio/run.sh
subPath: run.sh
- name: minio-update-host
readOnly: true
mountPath: /opt/bitnami/scripts/minio-update-host.sh
subPath: minio-update-host.sh
- name: libminio
readOnly: true
mountPath: /opt/bitnami/scripts/libminio.sh
subPath: libminio.sh
terminationMessagePolicy: File
envFrom:
- secretRef:
name: minio-extra
image: 'minio:2024.4.28-debian-12-r0'
automountServiceAccountToken: true
serviceAccount: minio
volumes:
- name: empty-dir
emptyDir: {}
- name: minio-creds
secret:
secretName: minio-creds
items:
- key: config.json
path: config.json
defaultMode: 420
- name: minio-run
configMap:
name: minio-run
items:
- key: minio-run.sh
path: run.sh
defaultMode: 511
- name: libminio
configMap:
name: minio-run
items:
- key: libminio.sh
path: libminio.sh
defaultMode: 511
- name: minio-update-host
configMap:
name: minio-run
items:
- key: minio-update-host.sh
path: minio-update-host.sh
defaultMode: 511
dnsPolicy: ClusterFirst
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
podManagementPolicy: Parallel
replicas: 6
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
app.kubernetes.io/instance: minio
app.kubernetes.io/name: minio
status:
observedGeneration: 43
availableReplicas: 6
updateRevision: minio-5ccdcd5c7
currentRevision: minio-5ccdcd5c7
currentReplicas: 6
updatedReplicas: 6
replicas: 6
collisionCount: 0
readyReplicas: 6
The pod will start and, instead of having a volume, it will write directly into the ephemeral folder (e.g.
/var/lib/kubelet/pods/28e492a8-e01e-41fb-a8e0-06b9dce20db1/volumes/kubernetes.io~csi/pvc-d72a0458-d8aa-4ed4-b503-f38ac4d2914a/mount
). If the volume is not mountable, the pod shouldn't write to the ephemeral storage; it simply shouldn't start.
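Until the driver or kubelet refuses to start a pod whose volume could not be mounted, one possible stopgap is a guard initContainer that checks each data path for a sentinel file written when the volume was first initialized, and exits non-zero otherwise, so the MinIO container never runs against the kubelet's ephemeral directory. This is only a sketch: the initContainer name, the busybox image, and the .volume-initialized sentinel are assumptions; the volume names and mount paths come from the StatefulSet above.

initContainers:
  - name: verify-data-volumes        # hypothetical guard, not part of the chart
    image: busybox:1.36              # illustrative image; anything with a shell works
    command:
      - sh
      - -c
      - |
        # Refuse to start if a data path does not contain the sentinel file
        # that was written when the volume was first initialized. An empty
        # ephemeral directory handed out in place of the PVC fails this check.
        for d in /bitnami/minio/data-0 /bitnami/minio/data-1; do
          [ -f "$d/.volume-initialized" ] || { echo "$d: sentinel missing, refusing to start"; exit 1; }
        done
    volumeMounts:
      - name: data-0
        mountPath: /bitnami/minio/data-0
      - name: data-1
        mountPath: /bitnami/minio/data-1

The sentinel would have to be created once on each healthy volume (for example, touch /bitnami/minio/data-0/.volume-initialized from a running pod) before the guard is enabled.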