EtcdMember reconciler cannot update status when deleting etcd leader #5161

Open
4 tasks done
apedriza opened this issue Oct 28, 2024 · 1 comment
Labels
bug Something isn't working Stale
apedriza commented Oct 28, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 6.10.4-linuxkit #1 SMP Wed Oct 2 16:38:00 UTC 2024 aarch64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Version

v1.31.1+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 7.7 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 10.0 GiB (pass)
Relative disk space available for /var/lib/k0s: 17% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.10.4-linuxkit (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: built-in (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: built-in (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: built-in (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: built-in (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: built-in (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: built-in (pass)
      CONFIG_NETFILTER_NETLINK: built-in (pass)
      CONFIG_NF_NAT: built-in (pass)
      CONFIG_IP_SET: IP set support: built-in (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: built-in (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: built-in (pass)
      CONFIG_IP_VS: IP virtual server support: built-in (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: built-in (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: built-in (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: built-in (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: built-in (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: built-in (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: built-in (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: built-in (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: built-in (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: built-in (pass)
      CONFIG_NF_DEFRAG_IPV4: built-in (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: built-in (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: built-in (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: built-in (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: built-in (pass)
      CONFIG_NF_DEFRAG_IPV6: built-in (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: built-in (pass)
      CONFIG_LLC: built-in (pass)
      CONFIG_STP: built-in (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

When the EtcdMember that corresponds to the etcd leader (the node that can act as the etcdmember reconciler) is marked to leave, the reconcile process cannot update the EtcdMember status, because at that point the node is no longer a member of the etcd cluster.

Steps to reproduce

  1. Run `k0s kubectl patch etcdmembers --type='merge' -p '{"spec":{"leave":true}}'`
  2. Check the EtcdMember status: `k0s kubectl get etcdmembers`
  3. See that the EtcdMember is still marked as Joined:
    [Image]

Expected behavior

I should be able to remove an EtcdMember even if it is the etcd leader.

Actual behavior

No response

Screenshots and logs

Oct 28 11:15:23 docker-test-cluster-docker-test-0 k0s[498]: time="2024-10-28 11:15:23" level=debug msg="reconciling EtcdMember: &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:docker-test-0 GenerateName: Namespace: SelfLink: UID:826fde62-9f1d-4c8e-834c-345c0329f673 ResourceVersion:640 Generation:2 CreationTimestamp:2024-10-28 11:14:09 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ManagedFields:[{Manager:k0s Operation:Update APIVersion:etcd.k0sproject.io/v1beta1 Time:2024-10-28 11:14:09 +0000 UTC FieldsType:FieldsV1 FieldsV1:{\"f:spec\":{}} Subresource:} {Manager:k0s Operation:Update APIVersion:etcd.k0sproject.io/v1beta1 Time:2024-10-28 11:14:09 +0000 UTC FieldsType:FieldsV1 FieldsV1:{\"f:status\":{\".\":{},\"f:conditions\":{\".\":{},\"k:{\\\"type\\\":\\\"Joined\\\"}\":{\".\":{},\"f:lastTransitionTime\":{},\"f:message\":{},\"f:status\":{},\"f:type\":{}}},\"f:memberID\":{},\"f:peerAddress\":{}}} Subresource:status} {Manager:kubectl-patch Operation:Update APIVersion:etcd.k0sproject.io/v1beta1 Time:2024-10-28 11:15:23 +0000 UTC FieldsType:FieldsV1 FieldsV1:{\"f:spec\":{\"f:leave\":{}}} Subresource:}]} Status:{PeerAddress:172.18.0.4 MemberID:5320353a7d98bdee ReconcileStatus: Message: Conditions:[{Type:Joined Status:True LastTransitionTime:2024-10-28 11:14:09 +0000 UTC Message:Member joined}]} Spec:{Leave:true}}" component=etcdMemberReconciler memberID=5320353a7d98bdee name=docker-test-0 peerAddress=172.18.0.4 phase=reconcile
Oct 28 11:15:23 docker-test-cluster-docker-test-0 k0s[498]: time="2024-10-28 11:15:23" level=info msg="reconcile succeeded" component=etcdMemberReconciler memberID=5320353a7d98bdee name=docker-test-0 peerAddress=172.18.0.4 phase=reconcile
Oct 28 11:15:23 docker-test-cluster-docker-test-0 k0s[498]: time="2024-10-28 11:15:23" level=error msg="failed to update EtcdMember status" component=etcdMemberReconciler error="rpc error: code = Unknown desc = raft: stopped" memberID=5320353a7d98bdee name=docker-test-0 peerAddress=172.18.0.4 phase=reconcile

Additional context

Triggering a leader election in the etcdmember reconciler may be one way to solve this bug.
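As a rough illustration of that idea, the toy model below hands the reconciler role to another controller before the target is removed, so that the status write comes from a node that is still an etcd member. Everything here is hypothetical (`controller`, `cluster`, `handOff`, `leave`, and the second node name `docker-test-1`); it does not use the k0s leader election or any etcd API, and it only shows the intended ordering.

```go
package main

import (
	"errors"
	"fmt"
)

// controller is a hypothetical stand-in for one k0s controller node.
type controller struct {
	name   string
	member bool // still part of the etcd cluster
}

// cluster is a toy model: one controller holds the reconciler role at a time.
type cluster struct {
	nodes      []*controller
	reconciler int               // index into nodes of the current reconciler
	joined     map[string]string // toy Joined condition per node name
}

// handOff moves the reconciler role to another node that is still a member.
// It reports false if no such node exists.
func (c *cluster) handOff() bool {
	for i, n := range c.nodes {
		if i != c.reconciler && n.member {
			c.reconciler = i
			return true
		}
	}
	return false
}

// leave removes the target and records its status from whichever node holds
// the reconciler role, handing the role off first if the target holds it.
func (c *cluster) leave(target string) error {
	if c.nodes[c.reconciler].name == target && !c.handOff() {
		return fmt.Errorf("no other member can take over from %s", target)
	}
	found := false
	for _, n := range c.nodes {
		if n.name == target {
			n.member = false
			found = true
		}
	}
	if !found {
		return fmt.Errorf("unknown member %s", target)
	}
	// The status write must come from a node that is still a member.
	if !c.nodes[c.reconciler].member {
		return errors.New("raft: stopped")
	}
	c.joined[target] = "False"
	return nil
}

func main() {
	c := &cluster{
		nodes: []*controller{
			{name: "docker-test-0", member: true},
			{name: "docker-test-1", member: true}, // hypothetical second controller
		},
		joined: map[string]string{"docker-test-0": "True", "docker-test-1": "True"},
	}
	err := c.leave("docker-test-0")
	fmt.Println(err, c.joined["docker-test-0"], c.nodes[c.reconciler].name)
}
```

With a single controller there is nobody to hand off to, so the sketch returns an error instead of removing the last member; whether that is the desired behavior in k0s is a separate question.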

@apedriza apedriza added the bug Something isn't working label Oct 28, 2024
@jnummelin jnummelin added this to the 1.32 milestone Oct 28, 2024

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Nov 27, 2024