The `RunParameters` class contains a set of parameters that configure the query and run details of the Correlation Detective algorithm.
These parameters allow you to customize the behavior of the algorithm to suit your specific needs.
Refer to our paper for more details on the parameters and their effects on the algorithm.
Below is an overview of the parameters in the `RunParameters` class:
Name | Domain | Default Value | Accessibility | Description |
---|---|---|---|---|
inputPath | String (File path) | N/A | Read and Write | Path to the input dataset. |
simMetricName | PEARSON_CORRELATION, SPEARMAN_CORRELATION, MULTIPOLE, EUCLIDEAN_SIMILARITY, MANHATTAN_SIMILARITY, TOTAL_CORRELATION | N/A | Read and Write | Similarity metric to use. |
maxPLeft | Integer (Between 1 and 10) | N/A | Read and Write | Maximum set size for the left side of the correlation pattern. |
maxPRight | Integer (Between 0 and 10) | N/A | Read and Write | Maximum set size for the right side of the correlation pattern. |
logLevel | Level (Enumeration) | INFO | Read and Write | Logging level. |
dateTime | String | Current date | Read-only | Date and time string. |
monitorStats | boolean | true | Read and Write | Flag to enable monitoring of statistics. |
threads | int (Between 1 and 80) | Min(80, CPU cores * 4) | Read-only | Number of threads to use. |
parallel | boolean | true | Read and Write | Flag to enable parallel execution. |
random | boolean | true | Read and Write | Flag to enable randomized execution (non-seeded). |
seed | int | 0 | Read and Write | Random seed value. |
queryType | TOPK, THRESHOLD | TOPK | Read and Write | Type of query to run. |
tau | double | inferred from simMetric | Read and Write | Correlation threshold (only used if queryType == THRESHOLD). |
runningThreshold | RunningThreshold (Enumeration) | tau | Read and Write | Running correlation threshold (only used if queryType == TOPK). |
minJump | double (Between 0 and Double.MAX_VALUE) | 0 | Read and Write | Minimum jump value. |
irreducibility | boolean | false | Read and Write | Flag to enable irreducibility constraint. |
topK | int (Between 0 and 100000) | 100 | Read and Write | The maximum number of top results to retrieve. |
allowVectorOverlap | boolean | false | Read and Write | Flag to allow vector overlap in the correlation pattern. |
nVectors | int (Between 1 and Integer.MAX_VALUE) | all | Read and Write | Number of vectors to read from the dataset. |
nDimensions | int (Between 1 and Integer.MAX_VALUE) | all | Read and Write | Number of dimensions to read per vector. |
partition | int (Between 0 and Integer.MAX_VALUE) | 0 | Read and Write | Dataset partition identifier. |
dimensionalityReduction | boolean | false | Read and Write | Flag to enable dimensionality reduction. |
dimredEpsilon | double (Between 0 and 1) | 0.1 | Read and Write | Epsilon value for dimensionality reduction. |
dimredDelta | double (Between 0 and 1) | 0.8 | Read and Write | Delta value for dimensionality reduction. |
dimredCorrect | boolean | true | Read and Write | Flag to enable dimensionality reduction correction. |
dimredComponents | Integer (Between 1 and Integer.MAX_VALUE) | 0.1 * nDimensions | Read and Write | Number of dimensionality reduction components. |
discounting | boolean | false | Read and Write | Flag to enable bound discounting. |
discountThreshold | double (Between 0 and 2) | 0.7 | Read and Write | Discount threshold value. |
discountTopK | int (Between 1 and Integer.MAX_VALUE) | 10 | Read and Write | Number of extremal distances to store for each cluster combination (CC). |
discountStep | int (Between 1 and Integer.MAX_VALUE) | 1 | Read and Write | Discounting step value. |
empiricalBounding | boolean | true | Read and Write | Flag to enable empirical bounding (only if the similarity metric supports it). |
kMeans | Integer (Between 1 and Integer.MAX_VALUE) | inferred from simMetric | Read and Write | The k parameter of k-means, used by the hierarchical clustering algorithm. |
geoCentroid | boolean | false | Read and Write | Flag to enable usage of geometric centroid in clusters. |
startEpsilon | double (Between 0 and Double.MAX_VALUE) | inferred from simMetric | Read and Write | Starting epsilon value for clustering. |
epsilonMultiplier | double (Between 0 and 1) | 0.8 | Read and Write | Epsilon multiplier for clustering. |
maxLevels | int (Between 1 and Integer.MAX_VALUE) | 20 | Read and Write | Maximum levels in the cluster hierarchy. |
clusteringAlgorithm | KMEANS | KMEANS | Read and Write | Clustering algorithm to use. |
breakFirstKLevelsToMoreClusters | int (Between 0 and Integer.MAX_VALUE) | 0 | Read and Write | Number of top levels of the cluster hierarchy that are broken into more clusters. |
clusteringRetries | int (Between 1 and Integer.MAX_VALUE) | 20 | Read and Write | Number of clustering tries per cluster level. |
hashSize | int (Between 1 and Integer.MAX_VALUE) | inferred from query | Read and Write | Hash size for caches (centroids and cluster combinations). |
BFSRatio | double (Between 0 and 1) | 0.5 | Read and Write | BFS ratio for traversal of the comparison tree. |
BFSFactor | double | inferred from BFSRatio | Read and Write | BFS factor for traversal of the comparison tree (based on ratio). |
shrinkFactor | double | 0 | Read and Write | Shrink factor. |
statBag | StatBag (Object) | constructed after init() | Read-only | Statistics bag. |
randomGenerator | Random (Object) | constructed after init() | Read-only | Random number generator. |
pairwiseDistances | double[][] (2D Array) | constructed after init() | Read-only | Pairwise distances cache. |
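Some defaults in the table are derived from other values. The Java sketch below shows how the `threads` default (min(80, CPU cores * 4)) and the `dimredComponents` default (0.1 * nDimensions) could be computed, and how a value could be checked against the ranges in the Domain column. It is a hypothetical helper: `RunParametersSketch`, `defaultThreads`, `defaultDimredComponents`, and `checkRange` are illustrative names, not part of the `RunParameters` API, and rounding 0.1 * nDimensions up to a whole number (with a minimum of 1) is an assumption.

```java
/**
 * Hypothetical sketch, NOT the real RunParameters class.
 * It only mirrors a few defaults and ranges from the parameter table.
 */
final class RunParametersSketch {

    // Default thread count from the table: min(80, CPU cores * 4).
    static int defaultThreads(int cpuCores) {
        return Math.min(80, cpuCores * 4);
    }

    // Default number of dimensionality-reduction components: 0.1 * nDimensions.
    // Assumption: rounded up via integer ceiling division, and at least 1
    // because the table's domain for dimredComponents starts at 1.
    static int defaultDimredComponents(int nDimensions) {
        return Math.max(1, (nDimensions + 9) / 10);
    }

    // Throws if value lies outside [min, max], mirroring the Domain column.
    static void checkRange(String name, double value, double min, double max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                name + " must be in [" + min + ", " + max + "], got " + value);
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("threads = " + defaultThreads(cores));
        System.out.println("dimredComponents = " + defaultDimredComponents(1000)); // 100
        checkRange("maxPLeft", 2, 1, 10);            // within 1..10, passes
        checkRange("epsilonMultiplier", 0.8, 0, 1);  // within 0..1, passes
    }
}
```

On a machine with 4 cores this yields 16 threads; from 20 cores upward the cap of 80 applies.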
To use these parameters and run a query with the Correlation Detective algorithm, refer to the README.md file for usage instructions and examples.
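For `queryType == TOPK`, the table lists a `runningThreshold` that defaults to `tau`. As a general illustration of what a running threshold does in a top-k search (not Correlation Detective's actual implementation, and not tied to the values of the `RunningThreshold` enumeration), the sketch below keeps the best k scores in a min-heap and prunes any candidate that falls below the current threshold. The threshold starts at an initial value (playing the role of `tau` here, an assumption) and rises to the k-th best score once k results have been found. The class `TopKRunningThreshold` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration of a running threshold for a top-k query. */
final class TopKRunningThreshold {
    private final int k;
    private final double initialThreshold; // assumption: plays the role of tau
    private final PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap of the best k scores

    TopKRunningThreshold(int k, double initialThreshold) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        this.k = k;
        this.initialThreshold = initialThreshold;
    }

    /** The k-th best score once k results exist, never below the initial value. */
    double threshold() {
        if (heap.size() < k) return initialThreshold;
        return Math.max(initialThreshold, heap.peek());
    }

    /** Offers a candidate score; returns true if it entered the current top-k. */
    boolean offer(double score) {
        if (score < threshold()) return false; // pruned: cannot enter the top-k
        heap.add(score);
        if (heap.size() > k) heap.poll();      // evict the current worst result
        return true;
    }

    /** Current results, best first. */
    List<Double> results() {
        List<Double> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        TopKRunningThreshold topK = new TopKRunningThreshold(3, 0.5);
        for (double s : new double[] {0.4, 0.9, 0.6, 0.7, 0.55, 0.95}) {
            topK.offer(s);
        }
        System.out.println(topK.results());   // [0.95, 0.9, 0.7]
        System.out.println(topK.threshold()); // 0.7
    }
}
```

With k = 3 and an initial threshold of 0.5, the scores 0.4 and 0.55 are pruned on arrival, 0.6 enters and is later evicted by 0.95, and the final threshold equals the third-best score, 0.7. A higher running threshold prunes more candidates, which is why raising it as good results accumulate pays off in a top-k search.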