
vCluster creating more trouble than helping (due to different causes) #1787

Open
MichaelKora opened this issue May 22, 2024 · 14 comments

@MichaelKora

MichaelKora commented May 22, 2024

What happened?

I honestly don't know if it is supposed to be this hard, but vcluster is creating more trouble than solutions... I've been working on getting a prod-ready cluster for over a week and it's not working. Right now the cluster is up and running and I connect to it using a NodePort service. The issues:

  1. When I deploy the cluster, it takes over 60 min for the cluster pods to reach a healthy Running state, so that I can communicate with the vcluster.
  2. The coredns pod, although in state Running, is full of errors (see the restart sketch after this list):
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized

[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v1.Service: Unauthorized 
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized 

[INFO] plugin/kubernetes: Trace[944124959]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (24-May-2024 11:55:04.692) (total time: 16726ms): Trace[944124959]: ---"Objects listed" error:<nil> 16726ms (11:55:21.419) Trace[944124959]: [16.726980037s] [16.726980037s] END

[INFO] plugin/kubernetes: Trace[1216864093]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (24-May-2024 11:54:50.867) (total time: 30559ms): Trace[1216864093]: ---"Objects listed" error:<nil> 30559ms (11:55:21.426) Trace[1216864093]: [30.55934034s] [30.55934034s] END

[INFO] plugin/kubernetes: Trace[1087029931]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (24-May-2024 11:54:49.127) (total time: 32304ms): Trace[1087029931]: ---"Objects listed" error:<nil> 32304ms (11:55:21.432) Trace[1087029931]: [32.304771091s] [32.304771091s] END
  3. When connected to the vcluster, requests deliver different responses each time:
    e.g. running kubectl get namespaces might show 4 namespaces, then 4, and then 6, etc.

  4. Running helm against the vcluster is nearly impossible; it times out nearly every single time.
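
From what I can tell, Unauthorized from the reflector means the token CoreDNS presents to the virtual API server is being rejected. A rough check (context name taken from my vcluster.yaml below; restarting is not a fix, it just forces CoreDNS to pick up a fresh service-account token):

# list the CoreDNS pods inside the vcluster, restart them, then tail the logs again
kubectl --context sharedpool-context -n kube-system get pods -l k8s-app=kube-dns
kubectl --context sharedpool-context -n kube-system rollout restart deployment coredns
kubectl --context sharedpool-context -n kube-system logs deploy/coredns --tail=50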

This is all frustrating because I was expecting vCluster to be way easier to use.

What did you expect to happen?

i deployed a cluster and expected the coreDNS pods to be deployed

How can we reproduce it (as minimally and precisely as possible)?

# vcluster.yaml
exportKubeConfig:
  context: "sharedpool-context"
controlPlane:
  coredns:
    enabled: true
    embedded: false
    deployment:
      replicas: 2
      nodeSelector:
        workload: wk1
  statefulSet:
    highAvailability:
      replicas: 2
    persistence:
      volumeClaim:
        enabled: true
    scheduling:
      nodeSelector:
        workload: wk1
    resources:
      limits:
        ephemeral-storage: 20Gi
        memory: 10Gi
      requests:
        ephemeral-storage: 200Mi
        cpu: 200m
        memory: 256Mi
    
  proxy:
    bindAddress: "0.0.0.0"
    port: 8443
    extraSANs:
      - XX.XX.XX.XXX
      - YY.YY.YY.YYY
helm upgrade -i my-vcluster vcluster \
  --repo https://charts.loft.sh \
  --namespace vcluster-ns --create-namespace \
  --repository-config='' \
  -f vcluster.yaml \
  --version 0.20.0-beta.5
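
Once the release is up, the kubeconfig that vcluster exports (per the exportKubeConfig block above) can be pulled from the secret the chart creates; the secret name below assumes the usual vc-<release-name> convention:

# fetch the exported kubeconfig and point kubectl at the virtual cluster
kubectl -n vcluster-ns get secret vc-my-vcluster -o jsonpath='{.data.config}' | base64 -d > vc-kubeconfig.yaml
kubectl --kubeconfig vc-kubeconfig.yaml get ns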

Anything else we need to know?

I used a NodePort service to connect to the cluster.

# nodeport.yaml

apiVersion: v1
kind: Service
metadata:
  name: vcluster-nodeport
  namespace: vcluster-ns
spec:
  selector:
    app: vcluster
    release: shared-pool-vcluster
  ports:
    - name: https
      port: 443
      targetPort: 8443
      protocol: TCP
      nodePort: 31222
  type: NodePort
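
For the NodePort route to work, the kubeconfig used from outside the host cluster has to point at a node IP on port 31222, and that IP has to be covered by the extraSANs entries in vcluster.yaml. Roughly (IPs masked the same way as above; names are placeholders):

# cluster entry of the kubeconfig used through the NodePort service
clusters:
- cluster:
    server: https://XX.XX.XX.XXX:31222
    certificate-authority-data: <from the exported kubeconfig>
  name: my-vcluster
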
$ helm upgrade -i solr-operator apache-solr/solr-operator --version 0.8.1 -n solr-cloud

Release "solr-operator" does not exist. Installing it now.
Error: failed post-install: 1 error occurred:
        * timed out waiting for the condition


$ helm upgrade -i hz-operator hazelcast/hazelcast-platform-operator -n hz-vc-ns --create-namespace -f operator.yaml

Release "hz-operator" does not exist. Installing it now.
Error: 9 errors occurred:
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Timeout: request did not complete within requested timeout - context deadline exceeded
        * Internal error occurred: resource quota evaluation timed out
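
A stopgap worth noting: stretching the client-side timeout helps tell a slow-but-working API server apart from a hard failure. The flags below are plain helm flags, nothing vcluster-specific:

helm upgrade -i solr-operator apache-solr/solr-operator --version 0.8.1 \
  -n solr-cloud --create-namespace \
  --timeout 15m --debug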

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11

Host cluster Kubernetes distribution

vcluster version

$ vcluster --version
vcluster version 0.19.5

Vcluster Kubernetes distribution (k3s (default), k8s, k0s)

default (k3s)
# I did not specify a specific distribution

OS and Arch

OS: talos
Arch: metal-amd64

@MichaelKora MichaelKora changed the title CoreDNS pods not deployed CoreDNS pods not working May 24, 2024
@MichaelKora MichaelKora reopened this May 24, 2024
@MichaelKora MichaelKora changed the title CoreDNS pods not working vCluster creating more trouble than helping(due to different causes) May 24, 2024
@heiko-braun
Contributor

hey @MichaelKora, it's unfortunate that you have to experience these troubles.

One thing that I'd recommend is to use the latest vcluster CLI together with 0.20.0-beta.5. From the description it appears that you are using the 0.19.5 one instead.

Regarding the other issues: it's a bit hard to say from the outset what might be causing them. You seem to be leveraging Talos. What Kubernetes distro is running on top of it?

@MichaelKora
Author

hey @heiko-braun, 0.19.5 is the latest according to the vcluster CLI:

$ sudo vcluster upgrade
15:55:48 info Current binary is the latest version: 0.19.5

I have Talos running there (the default image); it's based on k3s.

@heiko-braun
Contributor

heiko-braun commented Jun 5, 2024

Hi @MichaelKora, you can get the latest CLI (the one to be used with 0.20 vcluster.yaml) here:
https://github.com/loft-sh/vcluster/releases/tag/v0.20.0-beta.6
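
For example, on linux/amd64 that is roughly (asset name follows the release's naming scheme; adjust for your platform):

curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/download/v0.20.0-beta.6/vcluster-linux-amd64"
chmod +x vcluster && sudo mv vcluster /usr/local/bin/vcluster
vcluster --version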

@heiko-braun
Contributor

@MichaelKora regarding the hazelcast and solr examples in the description: did you run the commands against the host cluster or the virtual one?

@MichaelKora
Author

@heiko-braun thanks for your response. I ran the commands against the vcluster... when run against the host cluster, I have no issues.

@MichaelKora
Author

MichaelKora commented Jun 5, 2024

@heiko-braun when the cluster is being created, the logs show:

 2024-06-05 14:38:37 INFO    setup/controller_context.go:196    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds    {"component": "vcluster"} 

 2024-06-05 14:38:38    INFO    setup/controller_context.go:196    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds    {"component": "vcluster"}

 2024-06-05 14:38:39    INFO    setup/controller_context.go:196    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds    {"component": "vcluster"}

 2024-06-05 14:38:40    INFO    commandwriter/commandwriter.go:126    error retrieving resource lock kube-system/kube-controller-manager: Get "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 127.0.0.1:6443: connect: connection refused    {"component": "vcluster", "component": "controller-manager", "location": "leaderelection.go:332"}

and it takes more than 60 min to bring the cluster to a healthy state. It seems very odd to me that creating a virtual cluster takes that long.
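
What I'd check while it sits in that state (pod name assumes the usual <release-name>-0 StatefulSet naming):

kubectl -n vcluster-ns get pods,pvc                  # is the data volume claim actually bound?
kubectl -n vcluster-ns describe pod my-vcluster-0    # scheduling and volume events
kubectl -n vcluster-ns logs my-vcluster-0 --all-containers --tail=100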

@everflux

@MichaelKora how many nodes does your host cluster have, and what capacity? do you use network policies?

@MichaelKora
Author

hey @everflux, I dedicated 2 nodes of the host cluster to the vcluster (8 CPU / 32 GB). I am not using any restrictive network policies.

@everflux

This sounds like a setup problem to me, either with the host cluster or vcluster. Did you try to set up one or multiple vclusters? (check kubectl get all -n vcluster-ns, kubectl get ns)
I am afraid a GitHub issue might not be the right place to discuss this; perhaps the Slack channel would be better suited.

@deniseschannon
Contributor

@MichaelKora Are you still having issues or were you able to resolve them?

@MichaelKora
Author

hey @deniseschannon, yes, I am still having the issue!

@MichaelKora
Author

MichaelKora commented Sep 6, 2024

This sounds like a setup problem to me, either with the host cluster or vcluster. Did you try to set up one or multiple vclusters? (check kubectl get all -n vcluster-ns, kubectl get ns) I am afraid a GitHub issue might not be the right place to discuss this; perhaps the Slack channel would be better suited.

@everflux I have just one vcluster set up.

@MichaelKora
Author

@deniseschannon @heiko-braun @everflux Any update on the origin of that issue and how to fix it?

@everflux

I think the slack channel or direct consulting is a better place for support than a github issue in this case.
