
CNDB-12154 #1467 (Draft)

wants to merge 113 commits into base: main-5.0

Conversation

@djatnieks djatnieks commented Dec 18, 2024

STILL DRAFT

This PR is for branch CNDB-12154, porting the latest changes from main to main-5.0.

michaeljmarshall and others added 30 commits November 21, 2024 12:44
This additional metadata is somewhat valuable in the context of troubleshooting. Recently, we had an issue where the checksum itself was not (over)written and so it was stored as 0. In many cases, this won't be helpful, but since it is cheap and could be helpful, I propose adding some additional metadata when checksums don't match.
* Implement FSError#getMessage to ensure file name is logged

For this code block:

```java
var t = new FSWriteError(new IOException("Test failure"), new File("my", "file"));
logger.error("error", t);
```

We used to log:

```
ERROR [main] 2024-09-19 11:09:18,599 VectorTypeTest.java:118 - error
org.apache.cassandra.io.FSWriteError: java.io.IOException: Test failure
	at org.apache.cassandra.index.sai.cql.VectorTypeTest.endToEndTest(VectorTypeTest.java:117)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
Caused by: java.io.IOException: Test failure
	... 42 common frames omitted
```

Now we will log:

```
ERROR [main] 2024-09-19 11:10:02,910 VectorTypeTest.java:118 - error
org.apache.cassandra.io.FSWriteError: my/file
	at org.apache.cassandra.index.sai.cql.VectorTypeTest.endToEndTest(VectorTypeTest.java:117)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
Caused by: java.io.IOException: Test failure
	... 42 common frames omitted
```
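A minimal sketch of the combined behavior (the file path always in the message, plus the cause's message from `super.getMessage`); the class below is a standalone stand-in for illustration, not the actual Cassandra `FSError` hierarchy:

```java
import java.io.IOException;

public class Main {
    // Stand-in for FSWriteError: getMessage always leads with the file path,
    // so the log line identifies the file even when the cause is terse.
    static class FSWriteError extends RuntimeException {
        private final String path;

        FSWriteError(Throwable cause, String path) {
            super(cause);
            this.path = path;
        }

        @Override
        public String getMessage() {
            // Append the cause's message when present (the "super.getMessage" part).
            String causeMessage = getCause() != null ? getCause().getMessage() : null;
            return causeMessage == null ? path : path + " (" + causeMessage + ")";
        }
    }

    public static void main(String[] args) {
        Throwable t = new FSWriteError(new IOException("Test failure"), "my/file");
        System.out.println(t.getMessage());
    }
}
```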

* Add super.getMessage to message
* Use query view's locked indexes for Plan#estimateAnnNodesVisited

This commit doesn't resolve the underlying problem
in the design: we could easily use the wrong
reference at any time. I'll need to think
on this a bit more to know what is best.

* Assert queryView is not null
This commit fixes a serious correctness bug in the way we build
RowFilter for expressions involving OR.

If a query contained multiple complex predicates such as NOT IN
joined with the OR operator, the slices produced by NOT IN
were incorrectly also joined by OR instead of by AND.

In addition, a NOT IN with an empty list, if ORed with another
expression, was incorrectly treated as an expression matching 0 rows,
instead of matching all rows.

Example 1:
SELECT * FROM t WHERE x = 1 OR x NOT IN (2, 3, 4)
was incorrectly matching all rows, including the ones
with x = 2 or x = 3 or x = 4.

Example 2:
SELECT * FROM t WHERE x = 1 OR x NOT IN ()
was incorrectly matching only row with x = 1, instead of all rows.

The bug was technically not limited to NOT IN, but any
single restriction that wanted to add in exactly zero or more than one
filter expression.

Fixes riptano/cndb#10923
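The intended semantics can be illustrated with plain Java predicates (this illustrates only the boolean logic of the fix, not the actual RowFilter code):

```java
import java.util.List;

public class Main {
    // NOT IN (v1, v2, ...) expands to a conjunction: x != v1 AND x != v2 AND ...
    static boolean notInCorrect(int x, List<Integer> values) {
        return values.stream().allMatch(v -> x != v);
    }

    // The bug joined the expanded expressions with OR instead:
    // x != v1 OR x != v2 OR ..., which matches everything once there is
    // more than one value in the list.
    static boolean notInBuggy(int x, List<Integer> values) {
        return values.stream().anyMatch(v -> x != v);
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(2, 3, 4);

        // SELECT ... WHERE x = 1 OR x NOT IN (2, 3, 4)
        System.out.println(2 == 1 || notInCorrect(2, values)); // false: 2 is excluded
        System.out.println(2 == 1 || notInBuggy(2, values));   // true: the bug matched it

        // SELECT ... WHERE x = 1 OR x NOT IN (): a conjunction over zero
        // terms is vacuously true, so every row must match.
        System.out.println(5 == 1 || notInCorrect(5, List.of())); // true
    }
}
```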
Fix typo in the comment

Co-authored-by: Andrés de la Peña <[email protected]>
…ted (#1238)" (#1301)

This commit does not work because the queryView is not set until
after we get the plan. Since the view is only used to estimate
the cost of work and not to skip work, it is safe to use a
different view.

This reverts commit 9333708.
…n overridable method… (#1296)

This commit simply moves the code that deletes index components locally
when an index is dropped inside a new method of the `SSTableWatcher`
interface. As is, this just creates a minor indirection without changing
any behaviour, but this allows custom implementations of
`SSTableWatcher` to modify this behaviour, typically to handle tiered
storage concerns.
* Added unified_compaction.override_ucs_config_for_vector_tables option.  When enabled, the Controller will use the preferred vector settings for vector tables.
- Added additional options for vector specific configuration.
…hards (#1255)

Implement `ShardManagerReplicaAware` to align UCS and replica shards and thus limit the number of sstables that are partially owned by replicas.

The most interesting details are in the `IsolatedTokenAllocator#allocateTokens` and the `ShardManagerReplicaAware#computeBoundaries` methods.

In the `allocateTokens` method, we take the current token metadata for a cluster, replace the snitch with one that does not gossip, and allocate new nodes until we satisfy the desired `additionalSplits` needed. By using the token allocation algorithm, high level split points naturally align with replica shards as new nodes are added.

In `computeBoundaries`, we allocate any tokens needed, then we split the space into even spans and find the nearest replica token boundaries.
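A toy sketch of the boundary-snapping idea in `computeBoundaries` (even splits snapped to the nearest replica token boundary); the names and numbers are illustrative, not the actual `ShardManagerReplicaAware` code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class Main {
    // Split [min, max) into `shards` even spans, then snap each split point
    // to the nearest replica token boundary so shards align with replicas.
    static List<Long> computeBoundaries(long min, long max, int shards, TreeSet<Long> replicaTokens) {
        List<Long> boundaries = new ArrayList<>();
        for (int i = 1; i < shards; i++) {
            long even = min + (max - min) / shards * i; // even split point
            Long lower = replicaTokens.floor(even);
            Long higher = replicaTokens.ceiling(even);
            if (lower == null)
                boundaries.add(higher);
            else if (higher == null)
                boundaries.add(lower);
            else
                boundaries.add(even - lower <= higher - even ? lower : higher);
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // Replica token boundaries, e.g. produced by the token allocator.
        TreeSet<Long> replicaTokens = new TreeSet<>(List.of(90L, 210L, 330L, 470L));
        // Even splits at 100, 200, 300 snap to 90, 210, 330.
        System.out.println(computeBoundaries(0, 400, 4, replicaTokens)); // [90, 210, 330]
    }
}
```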
CNDB-10988: inspect out-of-space exception on compaction
- count evictions and not bytes
- add metrics by removal cause
…nts of a collection and/or UDT

Port of DB-1289/CASSANDRA-8877

CNDB-10945: Change calculation of sstable span for small sstables

In addition to correcting for very small spans, this also corrects
spans for sstables with a small number of partitions, where keys can
easily fall in a small range. For these cases we use a value derived from the
number of partitions in the table, which should work well for all
scenarios, including wide-partition tables with a very limited number of
partitions.
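One plausible shape of such a correction; the k/(k+1) expected-coverage formula below is an assumption for illustration, not necessarily the exact formula used by the patch:

```java
public class Main {
    // With k uniformly random partitions, the expected fraction of the token
    // space they span is roughly k / (k + 1). When the measured key range is
    // smaller than that, the keys likely just happened to fall close together,
    // so prefer the count-based estimate (assumed formula, illustrative only).
    static double correctedSpan(double measuredSpan, long partitionCount) {
        double expected = (double) partitionCount / (partitionCount + 1);
        return measuredSpan < expected ? expected : measuredSpan;
    }

    public static void main(String[] args) {
        System.out.println(correctedSpan(0.001, 3)); // tiny measured span -> 0.75
        System.out.println(correctedSpan(0.9, 3));   // plausible span kept -> 0.9
    }
}
```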
The eagerly populated leafNodeToLeafFP TreeMap has been replaced
with a LeafCursor, which allows traversing the index tree directly,
in a lazy way.

The change significantly reduces the amount of up-front work we did
to initialize the BKDReader.IteratorState.
It also reduces GC pressure and memory usage.

The user-facing effect is that `ORDER BY ... LIMIT ...`
queries using a numeric index (KD-tree) are significantly faster.

Fixes riptano/cndb#11021
Implements Stage 2 of Trie memtables, implementing trie-backed partitions
and extending the memtable map to the level of tries. This stage still
handles deletions with the legacy mechanisms (RangeTombstoneList etc)
but can save quite a bit of space for the B-Tree partition-to-row maps.

Also includes:
- Code alignment with the Cassandra 5.0 branch.
- A port of the OSS50 byte-comparable encoding version.
- Fixed version preencoded byte-comparables for version safety.
- Duplication of byte sources and better toArray conversion.
- Direct skipping mechanism for trie cursors and map-like get method.
- Forced node copying mechanism for trie mutations for atomicity and consistency.
- Pluggable cell reuse.
- Prefix and tail tries, as well as filtered iteration.
- A mechanism for extracting current key during trie mutation.
- Volatile reads to fully enforce happens-before between node preparation and use.
- Various in-memory trie improvements.

The earlier trie memtable implementation is still available as TrieMemtableStage1.
* DefaultMemtableFactory: align entire implementation with main branch
* TrieMemtable: restore FACTORY instance var and factory(Map) method
* TrieMemoryIndex: add previously missed use of BYTE_COMPARABLE_VERSION in rangeMatch method
…gregate and use picked sstables size as maxOverlap for aggregate (#1309)

CNDB-10990: include archive size in Level.avg and use maxOverlap for unified aggregate

Update getOverheadSizeInBytes to include non-data component sizes.

Add config to disable the new behavior (default: enabled); add tests.
…aybe we still want to check newer version for JDK22 specifically.

Though this is the last ecj version to support JDK11.
Upgrade:
- ecj, plus a fix for the Java UDF functions on JDK11+
- snakeyaml - it was already bumped in CNDB for a security vulnerability
- test dependencies:
   jacoco, byteman - higher versions than CNDB, but needed for catching up on JDK22 in tests
   findbugs - aligned with the CNDB version; we probably want a major version upgrade at some point, but it is not a priority for now
   jmh, bytebuddy - bumped to the latest versions, as they are known for not working on newer JDK versions
This commit improves performance of appending SAI components
by avoiding unnecessary computation of CRC from the beginning
of the file each time it is opened for appending.

Fixes riptano/cndb#10783
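The property that makes this optimization possible: a CRC is resumable state, so the writer can retain the running value and feed it only the newly appended bytes instead of re-reading the file. A minimal demonstration with `java.util.zip.CRC32` (not the actual SAI component code):

```java
import java.util.zip.CRC32;

public class Main {
    // Recompute the checksum over the whole content, as if re-reading the
    // file from the beginning on every append.
    static long crcOfWhole(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] part1 = "hello ".getBytes();
        byte[] part2 = "world".getBytes();

        // Incremental: keep the CRC32 state across "reopens" and update it
        // with only the newly appended bytes.
        CRC32 running = new CRC32();
        running.update(part1);
        // ... file closed and later reopened for appending; state retained ...
        running.update(part2);

        // Same value as a full recomputation, without the full re-read.
        System.out.println(running.getValue() == crcOfWhole("hello world".getBytes())); // true
    }
}
```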
eolivelli and others added 6 commits December 17, 2024 17:17
Unbounded queue length at the native transport stage can cause large
backlogs of requests that the system processes, even though clients may
no longer expect a response.

This PR implements a limited backport of CNDB-11070, introducing the
notion of a native request timeout that can shed messages with excessive
queue times at the NTR stage as well as async read/write stages, if
enabled. Cross-node message timeouts are also now respected earlier in
the mutation verb handler.

This is a fairly straightforward cherry-pick of
#1393 targeting main instead
of cc-main-migration-release.
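The shedding rule itself is simple; a toy version with illustrative names, not the actual NTR stage code:

```java
public class Main {
    // A request carries the time it was enqueued at the native transport stage.
    record Request(String id, long enqueuedAtMillis) {}

    // Shed (drop without processing) any request whose queue time already
    // exceeds the native request timeout: the client has stopped waiting,
    // so doing the work only adds to the backlog.
    static boolean shouldShed(Request r, long nowMillis, long timeoutMillis) {
        return nowMillis - r.enqueuedAtMillis() > timeoutMillis;
    }

    public static void main(String[] args) {
        long now = 10_000;
        System.out.println(shouldShed(new Request("fresh", 9_900), now, 500)); // false
        System.out.println(shouldShed(new Request("stale", 9_000), now, 500)); // true
    }
}
```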
This reverts commit c06c94c.

It seems the removal of `Index#postProcessor` by CNDB-11762 broke some
tests in CNDB's `MultiNodeBillingTest`. Unfortunately that patch was
merged before creating the PR bumping the CC version used by
CNDB (#1422 (comment)).
[The CNDB PR](riptano/cndb#12076) was created
after that merging but it was superseded by other CC version bumps.

So I'm adding this reversal so we can investigate how the removal of
`Index#postProcessor` affects those tests.
This patch replaces null values of `deterministic`, `monotonic` and
`monotonic_on` columns in `system_schema.functions` and
`system_schema.aggregates` with negative defaults. These defaults will
be addressed if/once DB-672 gets ported to CC.
There are two mechanisms for detecting that the cluster is in an upgrade
state and for determining the minimum version. They are slightly different,
and neither is pluggable, which means that CNDB doesn't work properly with them.

Those mechanisms are implemented in `Gossiper`. Although we do not use
`Gossiper` in CNDB, there are classes like `ColumnFilter` which go to
`Gossiper` to check the upgrade state.

So far, using these mechanisms in CNDB was a bit unpredictable: some
reported the cluster as upgraded and on the current version, others
did not.

This turned out to be a problem, especially for the `ColumnFilter`
because when we upgrade DSE --> CC, CC assumes that the newest filter
version should be used, which is not correctly deserialized and
interpreted by DSE.

The fix is not small, but it probably simplifies stuff a bit.

First of all, the two mechanisms are merged into one. Moreover, we made it
pluggable so that we can provide the appropriate implementation
in CNDB coordinators and writers, which is based on ETCD.
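A hypothetical shape of the merged, pluggable mechanism (illustrative names only, not the actual patch's API):

```java
public class Main {
    // One interface answering both questions, replacing the two separate
    // Gossiper-based mechanisms described above.
    interface ClusterVersionProvider {
        boolean isUpgradeInProgress();
        String minClusterVersion();
    }

    // A fixed-answer implementation, standing in for an ETCD-backed one
    // that CNDB coordinators and writers could plug in.
    static class StaticProvider implements ClusterVersionProvider {
        public boolean isUpgradeInProgress() { return false; }
        public String minClusterVersion() { return "5.0"; }
    }

    // Callers like ColumnFilter would consult the provider instead of
    // Gossiper, so all of them see one consistent answer.
    static String pickSerializationVersion(ClusterVersionProvider provider) {
        return provider.isUpgradeInProgress() ? "legacy" : provider.minClusterVersion();
    }

    public static void main(String[] args) {
        System.out.println(pickSerializationVersion(new StaticProvider())); // 5.0
    }
}
```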
Part of riptano/cndb#12139

Moves the constant shard count outside the shard loop to reduce confusion.
```java
 * @return the accumulated BTree size of the data contained in this update.
 */
long accumulatedDataSize();
```

@djatnieks djatnieks Dec 18, 2024

For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
/**
 * The size of the data contained in this update.
 *
 * @return the size of the data contained in this update.
 */
int dataSize();

// FIXME review
long unsharedHeapSize();
```

@djatnieks djatnieks Dec 18, 2024

For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
public long accumulatedDataSize()
{
    return dataSize;
}
```
@djatnieks djatnieks Dec 18, 2024

For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
public long unsharedHeapSize()
{
    return dataSize;
}
```
@djatnieks djatnieks Dec 18, 2024

For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
{
    return BTree.<Row>accumulate(holder.tree, (row, value) -> row.dataSize() + value, 0L);
}
```

@djatnieks djatnieks Dec 18, 2024

For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
return BTree.<Row>accumulate(holder.tree, (row, value) -> row.unsharedHeapSize() + value, 0L)
       + holder.staticRow.unsharedHeapSize();
}
```

@djatnieks djatnieks Dec 18, 2024

For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
RegularAndStaticColumns columns = RegularAndStaticColumns.builder().addAll(columnSet).build();
return new BTreePartitionUpdate(metadata, partitionKey, holder.withColumns(columns), deletionInfo.mutableCopy(), false);
}
```


For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.


```java
return count;
}
```


For Review:
affectedRowCount and affectedColumnCount were both present in the 5.0 PartitionUpdate and here made default methods when PartitionUpdate became an interface.

```
@@ -188,7 +199,18 @@ static PartitionUpdate merge(List<? extends PartitionUpdate> updates)
    return updates.get(0).metadata().partitionUpdateFactory().merge(updates);
}

public static SimpleBuilder simpleBuilder(TableMetadata metadata, Object... partitionKeyValues)
PartitionUpdate withOnlyPresentColumns();
```

For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
RegularAndStaticColumns columns = RegularAndStaticColumns.builder().addAll(columnSet).build();
return new TriePartitionUpdate(metadata, partitionKey, columns, stats, rowCountIncludingStatic, dataSize, trie, false);
}
```


For Review:
withOnlyPresentColumns , accumulatedDataSize and unsharedHeapSize are coming from 5.0 before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
1, 1);
cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) VALUES (?, " + json(1, 2) + ", ? )", ConsistencyLevel.ALL,
                               2, 2);
cluster.get(2).flush(KEYSPACE);
```
@djatnieks djatnieks Dec 18, 2024

For Review:

IIRC, the changes made here did not work for main-5.0. Since CASSANDRA-19764 still seems to be in progress, I reverted the change made by CNDB-9850, with the idea that when CASSANDRA-19764 lands in Apache it will also be reflected in main-5.0 after a subsequent rebase.

```java
int rowsPerPartition = 1;

int partitions;
```

@djatnieks djatnieks Dec 18, 2024

For Review:

The changes in this class come from WriteTest, made in CNDB-9850.

```java
    }
}

public Object[] writeArguments(long i)
{
    return new Object[] { i, i, i };
    return new Object[] { i % partitions, i, i };
```

I think this was just missed during the first pass of the cherry-pick from main.

```
"trie");
"trie",
"trie_stage1",
"persistent_memory");
```

For Review:

Trying to align with main - just want to double check.

```
@@ -79,6 +79,7 @@ public static List<Object> parameters()
{
    return ImmutableList.of("skiplist",
                            "skiplist_sharded",
                            "trie_stage1",
                            "trie");
}
```

For Review:

Trying to align with main - just want to double check.


```java
public class TrieToDotTest
{
    @Test
    public void testToDotContent() throws Exception
    {
        InMemoryTrie<String> trie = new InMemoryTrie<>(BufferType.OFF_HEAP);
        InMemoryTrie<Object> trie = new InMemoryTrie<>(ByteComparable.Version.OSS50, BufferType.OFF_HEAP, InMemoryTrie.ExpectedLifetime.LONG, null);
```

For Review:

Double check param values used here.


```java
public class TrieToMermaidTest
{
    @Test
    public void testToMermaidContent() throws Exception
    {
        InMemoryTrie<String> trie = new InMemoryTrie<>(BufferType.OFF_HEAP);
        InMemoryTrie<Object> trie = new InMemoryTrie<>(ByteComparable.Version.OSS50, BufferType.OFF_HEAP, InMemoryTrie.ExpectedLifetime.LONG, null);
```

For Review:

Double check param values used here.

```
trie:
    class_name: TrieMemtable
    parameters:
        shards: 4
trie_stage1:
    class_name: TrieMemtableStage1
skiplist_sharded:
    class_name: ShardedSkipListMemtable
    parameters:
```
@djatnieks djatnieks Dec 18, 2024

For Review:

Aligning defined memtable types with main for use in MemtableQuickTest and MemtableSizeTestBase

```java
return Arrays.asList(new Object[]{ Config.DiskAccessMode.standard, latestVersion },
                     new Object[]{ Config.DiskAccessMode.mmap, latestVersion },
                     new Object[]{ Config.DiskAccessMode.standard, legacyVersion },
                     new Object[]{ Config.DiskAccessMode.mmap, legacyVersion });
}
```

For Review:

In 5.0, RowIndexWriter uses o.a.c.io.sstable.Version, while main uses ByteComparable.Version.

I've kept the 5.0 io.sstable.Version here.

latestVersion and legacyVersion are defined as:

```java
private static final Version latestVersion = new BtiFormat(null).getLatestVersion();
private static final Version legacyVersion = new BtiFormat(null).getVersion("aa");
```

```
@@ -362,7 +362,7 @@ static class BtiVersion extends Version

isLatestVersion = version.compareTo(current_version) == 0;
correspondingMessagingVersion = MessagingService.VERSION_50;
byteComparableVersion = version.compareTo("ca") >= 0 ? ByteComparable.Version.OSS50 : ByteComparable.Version.LEGACY;
byteComparableVersion = version.compareTo("ca") >= 0 ? ByteComparable.Version.OSS41 : ByteComparable.Version.LEGACY;
hasOldBfFormat = aOrLater && !bOrLater;
```

For Review:

Most tests pass using OSS41, while many fail using OSS50, so I decided to keep the alignment with main.

```java
"   org.apache.cassandra.db.compaction.CompactionControllerTest.memtableRaceFinishLatch, " +
"   5, java.util.concurrent.TimeUnit.SECONDS);")
})
public void testMemtableRace() throws Exception
```
@djatnieks djatnieks Dec 18, 2024

For Review:

CompactionControllerTest.testMemtableRace fails.

Changing the MemtableParams default memtable factory from SkipListMemtableFactory to DefaultMemtableFactory (TrieMemtable) makes testMemtableRace pass if run by itself. If the entire CompactionControllerTest is run in IntelliJ, it still fails. Not sure what's happening.

```java
///
/// A problem with this will only appear as intermittent failures, never treat this test as flaky.
@RunWith(Parameterized.class)
public class MemtableThreadedTest extends CQLTester
```

For Review:
Running locally, MemtableThreadedTest fails (hangs) on the command line, but passes in IntelliJ.
In CI (link above), MemtableThreadedTest.testConsistentUpdates[PersistentMemoryMemtable] fails with a timeout.

…with DurationSpec type and 'native_transport_timeout_in_ms' as convertible old name with Long type; add some tests.

```java
@Ignore
public abstract class AbstractTrieMemtableIndexTest extends SAITester
public class AbstractTrieMemtableIndexTest extends TrieMemtableIndexTestBase
```


not abstract anymore
