Merge integration to Main #1676

Merged (47 commits, Sep 27, 2023)

Changes from 1 commit
Commits: 47
7679af3
Revert "maint(core): upgrade lucene to 9.7.0 (#1617)" (#1622)
alextheimer Jul 11, 2023
80245ce
feat(core): Simplify configuration to scale Filodb horizontally (#1610)
vishramachandran Jul 12, 2023
90303aa
filodb(core) add debugging info for empty histogram. (#1613)
yu-shipit Jul 13, 2023
5b05779
fix(core) make the error message more frendly to users. (#1593)
yu-shipit Jul 13, 2023
a93666d
Revert "filodb(core) add debugging info for empty histogram. (#1613)"…
yu-shipit Jul 13, 2023
a37bf5f
Adding logging statement when warning is produced. (#1625)
kvpetrov Jul 14, 2023
8ecf630
perf(query): Remove boxed Double allocations from NaN checks during d…
vishramachandran Jul 17, 2023
f14a13c
fix(core) make the error message more frendly to users. (#1593) (#1630)
yu-shipit Jul 17, 2023
6ac0255
fix nullpointer happened in cardinality busting job. (#1631)
yu-shipit Jul 18, 2023
b9ea680
filodb(core) add debugging info for empty histogram. (#1624)
yu-shipit Jul 18, 2023
eebd5f4
fix(query): prevent list.head on empty list (#1632)
alextheimer Jul 20, 2023
6c1693a
maint(kafka): update consumer client id (#1633)
alextheimer Jul 24, 2023
5dadfb9
fix(query): prevent list.head on empty list (#1632)
alextheimer Jul 20, 2023
8929fb2
Merge pull request #1634 from alextheimer/cherry-pick
amolnayak311 Jul 24, 2023
59cae2a
fix(core): Consolidate num-nodes duplicate config (#1635)
vishramachandran Jul 24, 2023
ea1644b
Fix memory alloc config (#1638)
vishramachandran Jul 24, 2023
955814e
fix(query) Regex equals .* must ignore the label and match series eve…
amolnayak311 Jul 28, 2023
f5018ae
fix(core) fix the binary join aggregation across different partitions…
yu-shipit Jul 31, 2023
84a185f
feat(query): Cardinality V2 API Query Plan changes (#1637)
sandeep6189 Aug 2, 2023
c7e26a9
fix(query) Fix regression with regex match (#1640)
amolnayak311 Aug 4, 2023
dd59325
fix(query) support unary operators(+/-) (#1642)
yu-shipit Aug 4, 2023
d84c6c8
fix(core): Bug in calculating size of SerializedRangeVector (#1643)
vishramachandran Aug 9, 2023
38c682c
perf(core): ~Two times throughput improvement for Lucene queries with…
vishramachandran Aug 9, 2023
f3352e2
cherry-pick histogram debugging message and cardinality job nullptr f…
yu-shipit Aug 15, 2023
bfbeb36
filodb(core) add debugging info for empty histogram. (#1613) (#1649)
yu-shipit Aug 15, 2023
523999c
feat(query): Cardinality V2 API Query Plan changes (#1637)
sandeep6189 Aug 2, 2023
9ed8685
Merge pull request #1650 from amolnayak311/integration
amolnayak311 Aug 16, 2023
594ffce
fix(query): Adding user datasets for Cardinality V2 RemoteMetadataExe…
sandeep6189 Aug 17, 2023
89bd678
Fix MultiPartition Card Queries (#1652)
sandeep6189 Aug 18, 2023
bdcfde7
fix(query): Adding user datasets for Cardinality V2 RemoteMetadataExe…
sandeep6189 Aug 17, 2023
17fb077
Fix MultiPartition Card Queries (#1652)
sandeep6189 Aug 18, 2023
225617c
Merge pull request #1654 from sandeep6189/integration
amolnayak311 Aug 18, 2023
89095e2
feat(core): Add Query CPU Time for Index Lookups (#1655)
vishramachandran Aug 21, 2023
1e56ef9
fix(metering): Overriding the cluster name .passed to SingleClusterPl…
sandeep6189 Aug 23, 2023
c77a95a
fix(metering): Overriding the cluster name .passed to SingleClusterPl…
sandeep6189 Aug 23, 2023
710c3d2
misc(core): add downsample support for aggregated data (#1661)
sherali42 Aug 25, 2023
df3922d
misc(core): add downsample support for aggregated data (#1661)
sherali42 Aug 25, 2023
14115cf
cherry-pic misc(core): add downsample support for aggregated data (#1…
sherali42 Aug 30, 2023
6cb5433
maint(core): upgrade to Lucene 9.7.0 (#1662)
alextheimer Aug 30, 2023
7adc382
bug(query): Streaming query execution allocated too much mem via RB (…
vishramachandran Sep 1, 2023
bf8ead0
perf(card): Adding config support for DS card flushCount and perf log…
sandeep6189 Sep 11, 2023
50d28ea
fix(core) fix the binary join aggregation across different partitions…
yu-shipit Sep 13, 2023
f7d60ac
Merge branch 'develop' into integration
Sep 19, 2023
ba0023f
Merge pull request #1673 from yu-shipit/integration
yu-shipit Sep 19, 2023
78a33f6
Bump filodb version to 0.9.23.
Sep 19, 2023
b5a2e40
Merge pull request #1674 from yu-shipit/integration
yu-shipit Sep 19, 2023
4e97f53
Merge branch 'integration'
Sep 27, 2023
maint(core): upgrade to Lucene 9.7.0 (#1662)
alextheimer authored Aug 30, 2023
commit 6cb5433b202239fd1ac4baaf201468532444e8a2
50 changes: 31 additions & 19 deletions core/src/main/scala/filodb.core/memstore/PartKeyLuceneIndex.scala
@@ -24,6 +24,7 @@ import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document._
import org.apache.lucene.document.Field.Store
import org.apache.lucene.facet.{FacetsCollector, FacetsConfig}
import org.apache.lucene.facet.FacetsConfig.DrillDownTermsIndexing
import org.apache.lucene.facet.sortedset.{SortedSetDocValuesFacetCounts, SortedSetDocValuesFacetField}
import org.apache.lucene.facet.sortedset.DefaultSortedSetDocValuesReaderState
import org.apache.lucene.index._
@@ -46,15 +47,19 @@ import filodb.memory.{BinaryRegionLarge, UTF8StringMedium, UTF8StringShort}
import filodb.memory.format.{UnsafeUtils, ZeroCopyUTF8String => UTF8Str}

object PartKeyLuceneIndex {
final val PART_ID = "__partId__"
// NOTE: these partId fields need to be separate because Lucene 9.7.0 enforces consistent types for document
// field values (i.e. a field cannot have both numeric and string values). Additional details can be found
// here: https://github.com/apache/lucene/pull/11
final val PART_ID_DV = "__partIdDv__"
final val PART_ID_FIELD = "__partIdField__"
final val START_TIME = "__startTime__"
final val END_TIME = "__endTime__"
final val PART_KEY = "__partKey__"
final val LABEL_LIST = s"__labelList__"
final val FACET_FIELD_PREFIX = "$facet_"
final val LABEL_LIST_FACET = FACET_FIELD_PREFIX + LABEL_LIST

final val ignoreIndexNames = HashSet(START_TIME, PART_KEY, END_TIME, PART_ID)
final val ignoreIndexNames = HashSet(START_TIME, PART_KEY, END_TIME, PART_ID_FIELD, PART_ID_DV)

val MAX_STR_INTERN_ENTRIES = 10000
val MAX_TERMS_TO_ITERATE = 10000
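
The NOTE above is the crux of this commit: Lucene 9.7.0 validates that a field name keeps a single value type, so the old shared __partId__ name is split into a term-indexed field and a numeric doc-values field. A minimal sketch of the resulting shape (illustrative only; it reuses the constants' string values above and is not code from this PR):

import org.apache.lucene.document.{Document, Field, NumericDocValuesField, StringField}

val doc = new Document()
// Pre-9.x both of these shared the single name "__partId__"; Lucene 9.7.0 rejects
// mixing a string-indexed field with a numeric doc-values field under one name.
doc.add(new StringField("__partIdField__", "42", Field.Store.NO))   // searchable via term queries
doc.add(new NumericDocValuesField("__partIdDv__", 42L))             // readable by doc-value collectors
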
@@ -279,8 +284,8 @@ class PartKeyLuceneIndex(ref: DatasetRef,
var facetsConfig: FacetsConfig = _

val document = new Document()
private[memstore] val partIdField = new StringField(PART_ID, "0", Store.NO)
private val partIdDv = new NumericDocValuesField(PART_ID, 0)
private[memstore] val partIdField = new StringField(PART_ID_FIELD, "0", Store.NO)
private val partIdDv = new NumericDocValuesField(PART_ID_DV, 0)
private val partKeyDv = new BinaryDocValuesField(PART_KEY, new BytesRef())
private val startTimeField = new LongPoint(START_TIME, 0L)
private val startTimeDv = new NumericDocValuesField(START_TIME, 0L)
@@ -303,7 +308,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
if (name.nonEmpty && value.nonEmpty &&
(always || facetEnabledForLabel(name)) &&
value.length < FACET_FIELD_MAX_LEN) {
facetsConfig.setRequireDimensionDrillDown(name, false)
facetsConfig.setDrillDownTermsIndexing(name, DrillDownTermsIndexing.NONE)
facetsConfig.setIndexFieldName(name, FACET_FIELD_PREFIX + name)
document.add(new SortedSetDocValuesFacetField(name, value))
}
@@ -328,6 +333,13 @@ class PartKeyLuceneIndex(ref: DatasetRef,
partIdDv.setLongValue(partId)
document.add(partIdDv)
}

/*
* As of this writing, this documentId will be set as one of two values:
* - In TimeSeriesShard: the string representation of a partId (e.g. "42")
* - In DownsampledTimeSeriesShard: the base64-encoded sha256 of the document ID. This is used to support
* persistence of the downsample index; ephemeral partIds cannot be used.
*/
partIdField.setStringValue(documentId)

startTimeField.setLongValue(startTime)
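
Per the comment above, partIdField holds either a plain partId string or a content-derived hash in the downsample case. A hedged sketch of how the DownsampledTimeSeriesShard variant could be produced (hashing the part-key bytes is an assumption for illustration; this helper is not part of the PR):

import java.security.MessageDigest
import java.util.Base64

// A content-derived ID is stable across restarts, unlike ephemeral partIds assigned at ingest time.
def downsampleDocId(partKeyBytes: Array[Byte]): String =
  Base64.getEncoder.encodeToString(MessageDigest.getInstance("SHA-256").digest(partKeyBytes))
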
@@ -468,7 +480,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
cforRange { 0 until partIds.length } { i =>
terms.add(new BytesRef(partIds(i).toString.getBytes(StandardCharsets.UTF_8)))
}
indexWriter.deleteDocuments(new TermInSetQuery(PART_ID, terms))
indexWriter.deleteDocuments(new TermInSetQuery(PART_ID_FIELD, terms))
}
}

@@ -498,14 +510,14 @@ class PartKeyLuceneIndex(ref: DatasetRef,
.maximumSize(100)
.recordStats()
.build((key: (IndexReader, String)) => {
new DefaultSortedSetDocValuesReaderState(key._1, FACET_FIELD_PREFIX + key._2)
new DefaultSortedSetDocValuesReaderState(key._1, FACET_FIELD_PREFIX + key._2, new FacetsConfig())
})
private val readerStateCacheNonShardKeys: LoadingCache[(IndexReader, String), DefaultSortedSetDocValuesReaderState] =
Caffeine.newBuilder()
.maximumSize(200)
.recordStats()
.build((key: (IndexReader, String)) => {
new DefaultSortedSetDocValuesReaderState(key._1, FACET_FIELD_PREFIX + key._2)
new DefaultSortedSetDocValuesReaderState(key._1, FACET_FIELD_PREFIX + key._2, new FacetsConfig())
})

def labelNamesEfficient(colFilters: Seq[ColumnFilter], startTime: Long, endTime: Long): Seq[String] = {
@@ -646,7 +658,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
s"with startTime=$startTime endTime=$endTime into dataset=$ref shard=$shardNum")
val doc = luceneDocument.get()
val docToAdd = doc.facetsConfig.build(doc.document)
val term = new Term(PART_ID, documentId)
val term = new Term(PART_ID_FIELD, documentId)
indexWriter.updateDocument(term, docToAdd)
}

@@ -711,7 +723,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
*/
def partKeyFromPartId(partId: Int): Option[BytesRef] = {
val collector = new SinglePartKeyCollector()
withNewSearcher(s => s.search(new TermQuery(new Term(PART_ID, partId.toString)), collector) )
withNewSearcher(s => s.search(new TermQuery(new Term(PART_ID_FIELD, partId.toString)), collector) )
Option(collector.singleResult)
}

@@ -720,7 +732,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
*/
def startTimeFromPartId(partId: Int): Long = {
val collector = new NumericDocValueCollector(PartKeyLuceneIndex.START_TIME)
withNewSearcher(s => s.search(new TermQuery(new Term(PART_ID, partId.toString)), collector))
withNewSearcher(s => s.search(new TermQuery(new Term(PART_ID_FIELD, partId.toString)), collector))
collector.singleResult
}

@@ -738,7 +750,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
}
// dont use BooleanQuery which will hit the 1024 term limit. Instead use TermInSetQuery which is
// more efficient within Lucene
withNewSearcher(s => s.search(new TermInSetQuery(PART_ID, terms), collector))
withNewSearcher(s => s.search(new TermInSetQuery(PART_ID_FIELD, terms), collector))
span.tag(s"num-partitions-to-page", terms.size())
val latency = System.nanoTime - startExecute
span.mark(s"index-startTimes-for-odp-lookup-latency=${latency}ns")
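
The comment above about the 1024-clause BooleanQuery limit applies unchanged after the rename; only the target field differs. An illustrative helper (assumed signature, not code from this PR) showing the same TermInSetQuery construction for bulk partId lookups:

import java.nio.charset.StandardCharsets
import org.apache.lucene.search.TermInSetQuery
import org.apache.lucene.util.BytesRef

def partIdsQuery(field: String, partIds: Seq[Int]): TermInSetQuery = {
  val terms = new java.util.ArrayList[BytesRef](partIds.size)
  // One term per partId; a BooleanQuery built this way would trip the default
  // 1024-clause limit, whereas TermInSetQuery has no such cap.
  partIds.foreach(id => terms.add(new BytesRef(id.toString.getBytes(StandardCharsets.UTF_8))))
  new TermInSetQuery(field, terms)
}
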
@@ -753,7 +765,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
*/
def endTimeFromPartId(partId: Int): Long = {
val collector = new NumericDocValueCollector(PartKeyLuceneIndex.END_TIME)
withNewSearcher(s => s.search(new TermQuery(new Term(PART_ID, partId.toString)), collector))
withNewSearcher(s => s.search(new TermQuery(new Term(PART_ID_FIELD, partId.toString)), collector))
collector.singleResult
}

@@ -804,7 +816,7 @@ class PartKeyLuceneIndex(ref: DatasetRef,
logger.debug(s"Updating document ${partKeyString(documentId, partKeyOnHeapBytes, partKeyBytesRefOffset)} " +
s"with startTime=$startTime endTime=$endTime into dataset=$ref shard=$shardNum")
val docToAdd = doc.facetsConfig.build(doc.document)
indexWriter.updateDocument(new Term(PART_ID, partId.toString), docToAdd)
indexWriter.updateDocument(new Term(PART_ID_FIELD, partId.toString), docToAdd)
}

/**
@@ -1088,7 +1100,7 @@ class SinglePartIdCollector extends SimpleCollector {

// gets called for each segment
override def doSetNextReader(context: LeafReaderContext): Unit = {
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID)
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID_DV)
}

// gets called for each matching document in current segment
@@ -1126,7 +1138,7 @@ class TopKPartIdsCollector(limit: Int) extends Collector with StrictLogging {
def getLeafCollector(context: LeafReaderContext): LeafCollector = {
logger.trace("New segment inspected:" + context.id)
endTimeDv = DocValues.getNumeric(context.reader, END_TIME)
partIdDv = DocValues.getNumeric(context.reader, PART_ID)
partIdDv = DocValues.getNumeric(context.reader, PART_ID_DV)

new LeafCollector() {
override def setScorer(scorer: Scorable): Unit = {}
@@ -1175,7 +1187,7 @@ class PartIdCollector extends SimpleCollector {

override def doSetNextReader(context: LeafReaderContext): Unit = {
//set the subarray of the numeric values for all documents in the context
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID)
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID_DV)
}

override def collect(doc: Int): Unit = {
@@ -1196,7 +1208,7 @@ class PartIdStartTimeCollector extends SimpleCollector {

override def doSetNextReader(context: LeafReaderContext): Unit = {
//set the subarray of the numeric values for all documents in the context
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID)
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID_DV)
startTimeDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.START_TIME)
}

@@ -1245,7 +1257,7 @@ class ActionCollector(action: (Int, BytesRef) => Unit) extends SimpleCollector {
override def scoreMode(): ScoreMode = ScoreMode.COMPLETE_NO_SCORES

override def doSetNextReader(context: LeafReaderContext): Unit = {
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID)
partIdDv = context.reader().getNumericDocValues(PartKeyLuceneIndex.PART_ID_DV)
partKeyDv = context.reader().getBinaryDocValues(PartKeyLuceneIndex.PART_KEY)
}

4 changes: 2 additions & 2 deletions project/Dependencies.scala
@@ -73,8 +73,8 @@ object Dependencies {
"com.googlecode.javaewah" % "JavaEWAH" % "1.1.6" withJavadoc(),
"com.github.rholder.fauxflake" % "fauxflake-core" % "1.1.0",
"org.scalactic" %% "scalactic" % "3.2.0" withJavadoc(),
"org.apache.lucene" % "lucene-core" % "8.8.2" withJavadoc(),
"org.apache.lucene" % "lucene-facet" % "8.8.2" withJavadoc(),
"org.apache.lucene" % "lucene-core" % "9.7.0" withJavadoc(),
"org.apache.lucene" % "lucene-facet" % "9.7.0" withJavadoc(),
"com.github.alexandrnikitin" %% "bloom-filter" % "0.11.0",
"org.rocksdb" % "rocksdbjni" % "6.29.5",
"com.esotericsoftware" % "kryo" % "4.0.0" excludeAll(excludeMinlog),