WIP feat(cass): add shardKey to partKey table #1361

alextheimer · 2022-04-06T00:12:21Z

Adds a Cassandra table that maps shard keys to partition keys.

    s"""CREATE TABLE IF NOT EXISTS $tableString (
       |    shardKey blob,
       |    partKey blob,
       |    PRIMARY KEY (shardKey, partKey)
       |    ...

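Because shardKey is the Cassandra partition key and partKey is a clustering column, all time-series partition keys for a given shard key live in a single Cassandra partition and can be scanned efficiently.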
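To illustrate the access pattern this layout is meant to enable, here is a minimal in-memory sketch. It is not the PR's implementation: `ShardKeyIndex` is a hypothetical stand-in for the Cassandra table, and keys are plain strings for readability (the real columns are blobs).

```scala
import scala.collection.immutable.SortedSet

// Hypothetical in-memory model of the shardKey -> partKey table.
// The outer map key plays the role of Cassandra's partition key (shardKey);
// the sorted set plays the role of the clustering column (partKey).
final case class ShardKeyIndex(rows: Map[String, SortedSet[String]] = Map.empty) {

  // Mirrors an INSERT: (shardKey, partKey) is the whole primary key,
  // so writing the same pair twice leaves a single row.
  def insert(shardKey: String, partKey: String): ShardKeyIndex = {
    val existing = rows.getOrElse(shardKey, SortedSet.empty[String])
    copy(rows = rows.updated(shardKey, existing + partKey))
  }

  // Mirrors a query restricted to one shardKey: a single partition lookup,
  // followed by reading that partition's rows in clustering order.
  def scanPartKeys(shardKey: String): Seq[String] =
    rows.get(shardKey).map(_.toSeq).getOrElse(Seq.empty)
}
```

A scan for one shard key touches only that key's entry, which is the property the new table is after; an unknown shard key yields an empty result.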

alextheimer · 2022-04-09T00:23:12Z

core/src/test/scala/filodb.core/TestData.scala

+  def records(ds: Dataset, readerSeq: Seq[RowReader]): SomeData = {
+    val builder = new RecordBuilder(MemFactory.onHeapFactory)
+    readerSeq.foreach { row => builder.addFromReader(row, ds.schema) }
+    builder.allContainers.zipWithIndex.map { case (container, i) => SomeData(container, i) }.head
+  }
+
+  def partKeyFromRecords(ds: Dataset, records: SomeData, builder: Option[RecordBuilder] = None): Seq[Long] = {
+    val partKeyBuilder = builder.getOrElse(new RecordBuilder(TestData.nativeMem))
+    records.records.map { case (base, offset) =>
+      ds.comparator.buildPartKeyFromIngest(base, offset, partKeyBuilder)
+    }.toVector
+  }


These (almost) mirror the methods of the same name in GdeltTestData.

alextheimer · 2022-04-09T00:25:26Z

cassandra/src/main/scala/filodb.cassandra/columnstore/CassandraColumnStore.scala

      val partKeyTablesInit = Observable.fromIterable(0.until(numShards)).map { s =>
        getOrCreatePartitionKeysTable(dataset, s)
      }.mapEval(t => Task.fromFuture(t.initialize())).toListL
+      val shardKeyToPartKeyTableInit = Observable.fromIterable(0.until(numShards)).map { s =>
+        getOrCreateShardKeyToPartKeyTable(dataset, s)
+      }.mapEval(t => Task.fromFuture(t.initialize())).toListL


Every PartitionKeysTable init/insert/delete has a counterpart ShardKeyToPartKeyTable call.
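The pairing described above can be sketched as a wrapper that forwards each operation to both tables. This is only an illustration under simplifying assumptions: `KeyTable`, `InMemoryPartKeys`, `InMemoryShardKeyToPartKeys`, and `PairedTables` are hypothetical names, the calls are synchronous, and the real tables are Cassandra-backed and return futures.

```scala
import scala.collection.mutable

// Hypothetical minimal interface shared by both tables.
trait KeyTable {
  def initialize(): Unit
  def insert(shardKey: String, partKey: String): Unit
  def delete(shardKey: String, partKey: String): Unit
}

// Stand-in for the existing part-key table: stores part keys only.
final class InMemoryPartKeys extends KeyTable {
  val partKeys: mutable.Set[String] = mutable.Set.empty[String]
  def initialize(): Unit = partKeys.clear()
  def insert(shardKey: String, partKey: String): Unit = { partKeys += partKey; () }
  def delete(shardKey: String, partKey: String): Unit = { partKeys -= partKey; () }
}

// Stand-in for the new table: groups part keys under their shard key.
final class InMemoryShardKeyToPartKeys extends KeyTable {
  val rows: mutable.Map[String, mutable.Set[String]] = mutable.Map.empty
  def initialize(): Unit = rows.clear()
  def insert(shardKey: String, partKey: String): Unit = {
    rows.getOrElseUpdate(shardKey, mutable.Set.empty[String]) += partKey
    ()
  }
  def delete(shardKey: String, partKey: String): Unit =
    rows.get(shardKey).foreach { pks =>
      pks -= partKey
      if (pks.isEmpty) rows -= shardKey // drop empty groups
    }
}

// Forwards every call to all wrapped tables so they cannot drift apart
// through a forgotten call site.
final class PairedTables(tables: Seq[KeyTable]) extends KeyTable {
  def initialize(): Unit = tables.foreach(_.initialize())
  def insert(shardKey: String, partKey: String): Unit =
    tables.foreach(_.insert(shardKey, partKey))
  def delete(shardKey: String, partKey: String): Unit =
    tables.foreach(_.delete(shardKey, partKey))
}
```

The sketch does not address partial failure (one write succeeding and the other failing), which a real dual-write path would have to consider.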

alextheimer · 2022-04-09T00:28:41Z

cassandra/src/main/scala/filodb.cassandra/columnstore/CassandraColumnStore.scala

+  val shardKeyToPartKeyTableCache = concurrentCache[DatasetRef,
+                                      ConcurrentLinkedHashMap[Int, ShardKeyToPartKeyTable]](tableCacheSize)


Should there be a table per shard (like PartitionKeysTable)?

alextheimer · 2022-04-11T17:09:24Z

cassandra/src/main/scala/filodb.cassandra/columnstore/CassandraColumnStore.scala

+  def scanPartKeysByShardKey(ref: DatasetRef, shard: Int, shardKey: Array[Byte]): Observable[Array[Byte]] = {
+    val table = getOrCreateShardKeyToPartKeyTable(ref, shard)
+    table.scanPartKeys(shardKey)
+    // TODO(a_theimer): figure out if token ranges apply here


Token ranges will only apply if we add extra values to Cassandra's partition key to prevent hot-spotting, right?

alextheimer · 2022-04-11T17:10:54Z

cassandra/src/main/scala/filodb.cassandra/columnstore/ShardKeyToPartKeyTable.scala

+
+  val suffix = s"shardKeyToPartKey_$shard"
+
+  // TODO(a_theimer): compression settings okay?


Left the same as in PartitionKeysTable.

alextheimer · 2022-04-11T17:13:27Z

cassandra/src/main/scala/filodb.cassandra/columnstore/ShardKeyToPartKeyTable.scala

+  // TODO(a_theimer): probably need to prevent hot-spotting on a single node
+  //   (i.e. add more fields to [Cassandra's] partition key)


I won't do this until benchmark tests indicate we need it (or one of you already knows we'll need it).
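For reference, one common way to spread a single shard key across nodes is to add a bucket component to Cassandra's partition key. The sketch below is hypothetical and not part of this PR: `BucketedShardKey`, `NumBuckets`, and the hashing choice are assumptions made purely for illustration.

```scala
import java.util.Arrays

// Hypothetical bucketing scheme: the Cassandra partition key would become
// (shardKey, bucket) instead of shardKey alone.
object BucketedShardKey {
  // Assumed bucket count; a real value would come from benchmarks.
  val NumBuckets: Int = 8

  // Derive the bucket from the partKey bytes so that the same
  // (shardKey, partKey) pair always lands in the same bucket.
  // floorMod keeps the result in [0, NumBuckets) for negative hashes.
  def bucketOf(partKey: Array[Byte]): Int =
    Math.floorMod(Arrays.hashCode(partKey), NumBuckets)

  // The cost of bucketing: scanning all part keys for one shard key
  // now requires reading every bucket's partition.
  def partitionsToScan(shardKey: String): Seq[(String, Int)] =
    (0 until NumBuckets).map(bucket => (shardKey, bucket))
}
```

The trade-off is that writes for a hot shard key are spread over `NumBuckets` partitions, while each scan fans out into that many single-partition reads.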

alextheimer force-pushed the ns-mig branch 2 times, most recently from 5aa7797 to 69ab230 · April 8, 2022 06:59

alextheimer added 9 commits April 8, 2022 18:02

add shardKey -> partKey table

f947fd0

add MetricsTestData methods

7406f34

add test; update existing test

f6ba144

scalastyle fixes

4d8bc58

add missing schema args

30d286c

fix getOrCreateShardKeyToPartKeyTable bug

0c18792

cleanup

bdd9dfa

make shardKeyFromPartKey Schemas method

7f4393f

cleanup

6604f5d

alextheimer force-pushed the ns-mig branch from 18ab48f to 6604f5d · April 9, 2022 01:06
