Latest cass-operator does not support server version of Cassandra 5.0.2 #725

Open
kos-team opened this issue Nov 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@kos-team

kos-team commented Nov 7, 2024

What happened?

The latest cass-operator release (1.22.4) cannot correctly deploy Cassandra 5.0.2.
According to the https://github.com/k8ssandra/management-api-for-apache-cassandra repo, 5.0.2 is supported.
The Cassandra process crashes with the following error message: ERROR [COMMIT-LOG-ALLOCATOR] 2024-11-07 21:35:29,362 JVMStabilityInspector.java:201 - Exiting due to error while processing commit log during initialization.

What did you expect to happen?

cass-operator should be able to deploy Cassandra 5.0.2.

How can we reproduce it (as minimally and precisely as possible)?

This bug can be reproduced by first deploying cass-operator.

Then deploy the following CR, with serverVersion set to 5.0.2:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: test-cluster
spec:
  clusterName: development
  config:
    cassandra-yaml:
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      num_tokens: 16
      role_manager: CassandraRoleManager
      transfer_hints_on_decommission: false
  managementApiAuth:
    insecure: {}
  racks:
  - name: rack1
  - name: rack2
  - name: rack3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
  serverType: cassandra
  serverVersion: 5.0.2
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: standard

cass-operator version

1.22.4

Kubernetes version

1.29.1

Method of installation

Helm

Anything else we need to know?

Error log from the server-system-logger container, which contains the log of the Cassandra process itself:

ERROR [COMMIT-LOG-ALLOCATOR] 2024-11-07 18:16:46,295 JVMStabilityInspector.java:201 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: /opt/cassandra/data/commitlog/CommitLog-8-1731003406282.log: Invalid argument
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:165)
    at org.apache.cassandra.db.commitlog.DirectIOSegment.<init>(DirectIOSegment.java:57)
    at org.apache.cassandra.db.commitlog.DirectIOSegment$DirectIOSegmentBuilder.build(DirectIOSegment.java:179)
    at org.apache.cassandra.db.commitlog.DirectIOSegment$DirectIOSegmentBuilder.build(DirectIOSegment.java:160)
    at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.createSegment(AbstractCommitLogSegmentManager.java:277)
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerStandard.createSegment(CommitLogSegmentManagerStandard.java:65)
    at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager$AllocatorRunnable.run(AbstractCommitLogSegmentManager.java:189)
    at org.apache.cassandra.concurrent.InfiniteLoopExecutor.loop(InfiniteLoopExecutor.java:121)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.nio.file.FileSystemException: /opt/cassandra/data/commitlog/CommitLog-8-1731003406282.log: Invalid argument
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
    at org.apache.cassandra.db.commitlog.DirectIOSegment$DirectIOSegmentBuilder.lambda$build$0(DirectIOSegment.java:180)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:161)
    ... 9 common frames omitted

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: CASS-77

@kos-team kos-team added the bug Something isn't working label Nov 7, 2024
@burmanm
Contributor

burmanm commented Nov 10, 2024

Those errors look like something is wrong in your Kubernetes environment. "Invalid argument" is returned when the filesystem is unable to perform an operation (in this case, allocating a commit log segment on disk). This isn't directly related to cass-operator or management-api, as these functions depend on your StorageClass / CSI driver / Kubernetes / Linux / filesystem / etc.

Perhaps it is something as simple as running out of disk space, or a defective disk?

I tested 5.0.2 on multiple systems and they all worked fine.

@kos-team
Author

After some debugging, we found that the root cause is the filesystem we are running on.
We reproduced it on a Kind Kubernetes cluster with the default local-storage CSI driver.
The host OS is Linux, but we were running everything on a tmpfs filesystem.
When we switched Kind to a regular ext4 filesystem, 5.0.2 worked fine.

We are curious what changed in Cassandra 5.0.2 that made it incompatible with tmpfs.

@burmanm
Contributor

burmanm commented Nov 21, 2024

I do not know, but I can make a guess. Cassandra 5.0 introduced DIRECT_IO as the default commit log access mode, instead of mmap, when direct I/O is available for the target disk.

https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L1485

I don't think the logic works correctly for tmpfs in this case, because it only checks the available block size by creating a stub file. tmpfs probably returns a value in the accepted range (> 0), but tmpfs itself does not support DIRECT_IO, so the real file operations fail once that mode is actually used.

As far as I understand tmpfs, its contents already live in the page cache, while DIRECT_IO means bypassing the page cache. So in that sense, I wonder where such writes would even end up.

You might get tmpfs working if you manually set the commit log disk access mode to mmap or standard (with performance caveats, of course).
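A minimal sketch of that workaround, expressed as a change to the CassandraDatacenter CR from the issue. It assumes the Cassandra 5.0 cassandra.yaml key commitlog_disk_access_mode (which, as far as I recall, accepts values such as legacy, auto, direct, mmap and standard), and it assumes cass-operator passes this key through its config builder for 5.0; neither has been tested here, so verify both against the cassandra.yaml shipped in the server image. mmap is shown because, per the 5.0 cassandra.yaml comments as I remember them, standard only applies when the commit log is compressed or encrypted.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: test-cluster
spec:
  serverType: cassandra
  serverVersion: 5.0.2
  config:
    cassandra-yaml:
      # Assumption: forces memory-mapped commit log segments instead of
      # direct I/O, which tmpfs-backed volumes appear to reject (EINVAL).
      commitlog_disk_access_mode: mmap
      # ...keep the other cassandra-yaml settings from the original CR
  # ...remaining spec fields (racks, resources, size, storageConfig)
  # unchanged from the original CR
```

On a regular filesystem such as ext4, this override is presumably unnecessary, as the reporters found.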

Projects
No open projects
Status: No status
Development

No branches or pull requests

2 participants