[1.9.0] Can't start (or update) a cluster #1064

Closed
Mokto opened this issue Sep 26, 2023 · 8 comments
Labels: bug (Something isn't working), done (Issues in the state 'done')

Comments

@Mokto

Mokto commented Sep 26, 2023

What happened?
I can't start new clusters (or update existing ones) with version 1.9.0.

How to reproduce it (as minimally and precisely as possible):
Start a new operator or update an existing one. Create a cluster or let one of your clusters update. It will fail.

I think the "server-config-init" initContainer is using the container image "k8ssandra/k8ssandra-client:v0.2.0", even though it should use "datastax/cass-config-builder:1.0-ubi7".

Error: open /cassandra-base-config/cassandra-env.sh: no such file or directory
Usage:
  k8ssandra config build [flags]

Examples:

	# Process the config files from cass-operator input
	kubectl k8ssandra config build [<args>]


Flags:
      --as string                      Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
      --as-group stringArray           Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
      --as-uid string                  UID to impersonate for the operation.
      --cache-dir string               Default cache directory (default "/.kube/cache")
      --certificate-authority string   Path to a cert file for the certificate authority
      --client-certificate string      Path to a client certificate file for TLS
      --client-key string              Path to a client key file for TLS
      --cluster string                 The name of the kubeconfig cluster to use
      --context string                 The name of the kubeconfig context to use
      --disable-compression            If true, opt-out of response compression for all requests to the server
  -h, --help                           help for build
      --input string                   read config files from this directory instead of default
      --insecure-skip-tls-verify       If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
      --kubeconfig string              Path to the kubeconfig file to use for CLI requests.
  -n, --namespace string               If present, the namespace scope for this CLI request
      --output string                  write config files to this directory instead of default
      --request-timeout string         The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
  -s, --server string                  The address and port of the Kubernetes API server
      --tls-server-name string         Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
      --token string                   Bearer token for authentication to the API server
      --user string                    The name of the kubeconfig user to use

Environment

  • K8ssandra Operator version:
    1.9.0

  • Kubernetes version information:
    1.27.3

  • Kubernetes cluster kind:
    GKE

  • Manifests:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: production
spec:
  cassandra:
    serverVersion: "4.1.1"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard-rwo
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
    config:
      jvmOptions:
        heapSize: 1Gi
    datacenters:
      - metadata:
          name: dc1
        size: 7
        resources:
          requests:
            cpu: 1
            memory: 35Gi
          limits:
            memory: 35Gi
        config:
          jvmOptions:
            gc: G1GC
            heapSize: 12Gi
        tolerations:
        - key: "databases"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
        racks:
        - name: default
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: oceanPool
                    operator: In
                    values:
                    - databases
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    cassandra.datastax.com/cluster: production
                topologyKey: kubernetes.io/hostname
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard-rwo
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 750Gi

  reaper:
    autoScheduling:
      enabled: true
  medusa:
    storageProperties:
      storageProvider: google_storage
      storageSecretRef:
        name: xxxxxxxxxxxx
      bucketName: xxxxxxxxxxxxxxx
      prefix: xxxxxxxxxxxxxxxxxxxxx
      region: xxxxxxxxxxxxxxxx
      secure: false 
      maxBackupCount: 25

  • K8ssandra Operator Logs:
insert K8ssandra Operator logs relevant to the issue here

Anything else we need to know?:

When using datastax/cass-config-builder:1.0-ubi7, it starts properly.

@Mokto Mokto added the bug Something isn't working label Sep 26, 2023
@burmanm
Contributor

burmanm commented Sep 26, 2023

> I think The "server-config-init" initContainer is using the docker container "k8ssandra/k8ssandra-client:v0.2.0" even though it should use "datastax/cass-config-builder:1.0-ubi7"

No, it should use k8ssandra-client as the target Cassandra version is 4.1.x.

/cassandra-base-config/ should be a mount that is created for the pods. It should be visible in the Pods (and in the StatefulSet's PodTemplateSpec) as the VolumeMount "server-config-base".
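
The check described above can be sketched in a few lines, assuming the Pod has been dumped with `kubectl get pod <name> -o json` and loaded into a Python dict. The function name `find_base_config_mount` and both sample Pods are hypothetical. Only the container name `server-config-init`, the mount name `server-config-base`, and the path `/cassandra-base-config` come from this thread.

```python
from typing import Optional


def find_base_config_mount(
    pod: dict,
    container: str = "server-config-init",
    mount: str = "server-config-base",
) -> Optional[str]:
    """Return the mountPath of `mount` in init container `container`, or None."""
    init_containers = pod.get("spec", {}).get("initContainers") or []
    for c in init_containers:
        if c.get("name") != container:
            continue
        for vm in c.get("volumeMounts") or []:
            if vm.get("name") == mount:
                return vm.get("mountPath")
    return None


# Hypothetical, trimmed-down Pod objects for illustration only.
healthy_pod = {
    "spec": {
        "initContainers": [
            {
                "name": "server-config-init",
                "volumeMounts": [
                    # "other-volume" is a made-up mount, not taken from the operator.
                    {"name": "other-volume", "mountPath": "/other"},
                    {"name": "server-config-base", "mountPath": "/cassandra-base-config"},
                ],
            }
        ]
    }
}

broken_pod = {
    "spec": {
        "initContainers": [
            {
                "name": "server-config-init",
                "volumeMounts": [{"name": "other-volume", "mountPath": "/other"}],
            }
        ]
    }
}

print(find_base_config_mount(healthy_pod))  # /cassandra-base-config
print(find_base_config_mount(broken_pod))   # None
```

A `None` result would be consistent with the "no such file or directory" error reported in this issue, although this sketch does not prove that the missing mount is the only possible cause.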

@adejanovski
Contributor

Hi @Mokto, I'm testing your configuration and will report back with my findings.
So far, I got it to deploy when I removed the Medusa section. I'll update the Medusa e2e test with 4.1.1 to see if there are some interactions due to Medusa's presence.

@adejanovski adejanovski changed the title [2.9.0] Can't start (or update) a cluster [1.9.0] Can't start (or update) a cluster Sep 26, 2023
@adejanovski
Contributor

@Mokto, we were able to reproduce the bug.
We'll fix it ASAP and ship a v1.9.1 shortly.

@adejanovski adejanovski moved this to In Progress in K8ssandra Sep 26, 2023
@adejanovski adejanovski added the in-progress Issues in the state 'in-progress' label Sep 26, 2023
@Mokto
Author

Mokto commented Sep 26, 2023

Thanks!

@Mokto
Author

Mokto commented Oct 9, 2023

I think we can close this with the 1.9.1 release.

@Mokto Mokto closed this as completed Oct 9, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in K8ssandra Oct 9, 2023
@adejanovski adejanovski added done Issues in the state 'done' and removed in-progress Issues in the state 'in-progress' labels Oct 9, 2023
@adejanovski
Contributor

Thanks for updating the ticket, @Mokto!
Let us know if you run into any other issues.

@mprimeaux

mprimeaux commented Oct 11, 2023

@adejanovski I am still experiencing this issue with version 1.9.1 of the k8ssandra-operator. From the discussion above, it's unclear whether there's a workaround or whether I'm hitting another condition that the 1.9.1 fix didn't cover.

Environment

K8ssandra Operator version:
1.9.1

Kubernetes version information:
1.27.4

Kubernetes cluster kind:
EKS

I'm happy to open another issue if this is a misconfiguration on my side.

Here's the exception.

PodInitializing for sage-cassandra/cluster1-dc1-default-sts-0 (server-system-logger)
stream logs failed container "cassandra" in pod "cluster1-dc1-default-sts-0" is waiting to start: PodInitializing for sage-cassandra/cluster1-dc1-default-sts-0 (cassandra)
server-config-init Error: open /cassandra-base-config/cassandra-env.sh: no such file or directory
server-config-init Usage:
server-config-init   k8ssandra config build [flags]
server-config-init 
server-config-init Examples:
server-config-init 
server-config-init     # Process the config files from cass-operator input
server-config-init     kubectl k8ssandra config build [<args>]
server-config-init     
server-config-init 
server-config-init Flags:
server-config-init       --as string                      Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
server-config-init       --as-group stringArray           Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
server-config-init       --as-uid string                  UID to impersonate for the operation.
server-config-init       --cache-dir string               Default cache directory (default "/.kube/cache")
server-config-init       --certificate-authority string   Path to a cert file for the certificate authority
server-config-init       --client-certificate string      Path to a client certificate file for TLS
server-config-init       --client-key string              Path to a client key file for TLS
server-config-init       --cluster string                 The name of the kubeconfig cluster to use
server-config-init       --context string                 The name of the kubeconfig context to use
server-config-init       --disable-compression            If true, opt-out of response compression for all requests to the server
server-config-init   -h, --help                           help for build
server-config-init       --input string                   read config files from this directory instead of default
server-config-init       --insecure-skip-tls-verify       If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
server-config-init       --kubeconfig string              Path to the kubeconfig file to use for CLI requests.
server-config-init   -n, --namespace string               If present, the namespace scope for this CLI request
server-config-init       --output string                  write config files to this directory instead of default
server-config-init       --request-timeout string         The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
server-config-init   -s, --server string                  The address and port of the Kubernetes API server
server-config-init       --tls-server-name string         Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
server-config-init       --token string                   Bearer token for authentication to the API server
server-config-init       --user string                    The name of the kubeconfig user to use
server-config-init 
stream logs failed container "server-system-logger" in pod "cluster1-dc1-default-sts-0" is waiting to start: 

@mprimeaux

mprimeaux commented Oct 11, 2023

As an update, I removed the spec.cassandra.datacenters.initContainers stanza from my manifest and the cluster started up.

# Ref: https://docs-v2.k8ssandra.io/reference/crd/k8ssandra-operator-crds-latest/
# Ref: https://github.com/k8ssandra/k8ssandra-operator
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: cluster1
spec:
  cassandra:
    # Ref: https://hub.docker.com/r/k8ssandra/cass-management-api/tags
    serverVersion: "4.1.2"
    clusterName: cluster1

    # Sets whether multiple Cassandra instances can be scheduled on the same node.
    # This should normally be false to ensure cluster resilience but may be set true
    # for test/dev scenarios to minimise the number of nodes required.
    softPodAntiAffinity: true

    # Use superuserSecretName to setup superuser pre-defined credentials for the
    # database in a Kubernetes secret. Cass Operator will read the secret and pass
    # the values to the Management API when managing the cluster. If this is
    # empty, Cass Operator will generate a secret instead.
    superuserSecretRef:
      name: ""

    # Request 2 CPU cores and 8 GiB of RAM per pod; cap usage at 3 cores and 13 GiB.
    resources:
      requests:
        memory: 8Gi
        cpu: 2000m
      limits:
        memory: 13Gi
        cpu: 3000m

    tolerations:
    - key: "storage"
      operator: "Equal"
      value: "cassandra"
      effect: "NoSchedule"

    datacenters:
      - metadata:
          name: dc1

        # The number of server nodes.
        size: 3

        initContainers:
          - name: server-config-init # defaults cannot be overridden ?
            resources:
              requests:
                cpu: 1000m
                memory: 1Gi
              limits:
                cpu: 1000m
                memory: 1Gi

        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: server-storage
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi

        config:
          jvmOptions:
            heapSize: 4Gi
            gc: G1GC
            gc_g1_rset_updating_pause_time_percent: 5
            gc_g1_max_gc_pause_ms: 300
          cassandraYaml:
            authenticator: org.apache.cassandra.auth.PasswordAuthenticator
            authorizer: org.apache.cassandra.auth.CassandraAuthorizer
            role_manager: org.apache.cassandra.auth.CassandraRoleManager
            # Ref: https://github.com/apache/cassandra/blob/cassandra-4.0.0/NEWS.txt#L374-L380
            sasi_indexes_enabled: true
            materialized_views_enabled: true

I was attempting to set resource requests and limits on the server-config-init init container when I hit the exception.
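
The workaround described in this comment (dropping the initContainers stanza) can be sketched as a pre-flight check on the manifest, assuming it has already been parsed into a Python dict, for example with a YAML library. The helper names `datacenters_overriding_init` and `strip_init_override` are hypothetical and not part of k8ssandra-operator. Whether the operator is meant to support overriding `server-config-init` is not settled in this thread.

```python
import copy

# Init container name taken from this thread.
RESERVED = "server-config-init"


def _datacenters(manifest: dict) -> list:
    """Return the list at spec.cassandra.datacenters, or an empty list."""
    return manifest.get("spec", {}).get("cassandra", {}).get("datacenters") or []


def datacenters_overriding_init(manifest: dict, name: str = RESERVED) -> list:
    """Names of datacenters whose initContainers declare a container called `name`."""
    hits = []
    for dc in _datacenters(manifest):
        for c in dc.get("initContainers") or []:
            if c.get("name") == name:
                hits.append(dc.get("metadata", {}).get("name", "<unnamed>"))
                break
    return hits


def strip_init_override(manifest: dict, name: str = RESERVED) -> dict:
    """Return a deep copy with the `name` entry removed from every datacenter."""
    out = copy.deepcopy(manifest)
    for dc in _datacenters(out):
        if "initContainers" not in dc:
            continue
        kept = [c for c in (dc["initContainers"] or []) if c.get("name") != name]
        if kept:
            dc["initContainers"] = kept
        else:
            # Drop the key entirely when nothing else is declared.
            del dc["initContainers"]
    return out


# Trimmed-down version of the manifest above, for illustration only.
manifest = {
    "spec": {
        "cassandra": {
            "serverVersion": "4.1.2",
            "datacenters": [
                {
                    "metadata": {"name": "dc1"},
                    "size": 3,
                    "initContainers": [
                        {
                            "name": "server-config-init",
                            "resources": {"requests": {"cpu": "1000m", "memory": "1Gi"}},
                        }
                    ],
                }
            ],
        }
    }
}

print(datacenters_overriding_init(manifest))                       # ['dc1']
print(datacenters_overriding_init(strip_init_override(manifest)))  # []
```

The function returns a copy rather than editing in place, so the original manifest stays available for comparison.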
