RunParameters - Query/Run Configuration

The RunParameters class holds the parameters that configure the query and the run details of the Correlation Detective algorithm, so you can tailor the algorithm's behavior to your needs. See our paper for more details on the parameters and their effect on the algorithm.

Parameters Overview

Below is an overview of the parameters in the RunParameters class:

| Name | Domain | Default Value | Accessibility | Description |
| --- | --- | --- | --- | --- |
| inputPath | String (File path) | N/A | Read and Write | Path to the input dataset. |
| simMetricName | PEARSON_CORRELATION, SPEARMAN_CORRELATION, MULTIPOLE, EUCLIDEAN_SIMILARITY, MANHATTAN_SIMILARITY, TOTAL_CORRELATION | N/A | Read and Write | Similarity metric to use. |
| maxPLeft | Integer (Between 1 and 10) | N/A | Read and Write | Maximum set size for the left side of the correlation pattern. |
| maxPRight | Integer (Between 0 and 10) | N/A | Read and Write | Maximum set size for the right side of the correlation pattern. |
| logLevel | Level (Enumeration) | INFO | Read and Write | Logging level. |
| dateTime | String | Current date | Read-only | Date and time string. |
| monitorStats | boolean | true | Read and Write | Flag to enable monitoring of statistics. |
| threads | int (Between 1 and 80) | Min(80, CPU cores * 4) | Read-only | Number of threads to use. |
| parallel | boolean | true | Read and Write | Flag to enable parallel execution. |
| random | boolean | true | Read and Write | Flag to enable randomized execution (non-seeded). |
| seed | int | 0 | Read and Write | Random seed value. |
| queryType | TOPK, THRESHOLD | TOPK | Read and Write | Type of query to run. |
| tau | double | inferred from simMetric | Read and Write | Correlation threshold value (only if queryType == THRESHOLD). |
| runningThreshold | RunningThreshold (Enumeration) | tau | Read and Write | Running correlation threshold (only used if queryType == TOPK). |
| minJump | double (Between 0 and Double.MAX_VALUE) | 0 | Read and Write | Minimum jump value. |
| irreducibility | boolean | false | Read and Write | Flag to enable irreducibility constraint. |
| topK | int (Between 0 and 100000) | 100 | Read and Write | The maximum number of top results to retrieve. |
| allowVectorOverlap | boolean | false | Read and Write | Flag to allow vector overlap in the correlation pattern. |
| nVectors | int (Between 1 and Integer.MAX_VALUE) | all | Read and Write | Number of vectors to read from the dataset. |
| nDimensions | int (Between 1 and Integer.MAX_VALUE) | all | Read and Write | Number of dimensions to read per vector. |
| partition | int (Between 0 and Integer.MAX_VALUE) | 0 | Read and Write | Dataset partition identifier. |
| dimensionalityReduction | boolean | false | Read and Write | Flag to enable dimensionality reduction. |
| dimredEpsilon | double (Between 0 and 1) | 0.1 | Read and Write | Epsilon value for dimensionality reduction. |
| dimredDelta | double (Between 0 and 1) | 0.8 | Read and Write | Delta value for dimensionality reduction. |
| dimredCorrect | boolean | true | Read and Write | Flag to enable dimensionality reduction correction. |
| dimredComponents | Integer (Between 1 and Integer.MAX_VALUE) | 0.1 * nDimensions | Read and Write | Number of dimensionality reduction components. |
| discounting | boolean | false | Read and Write | Flag to enable bound discounting. |
| discountThreshold | double (Between 0 and 2) | 0.7 | Read and Write | Discount threshold value. |
| discountTopK | int (Between 1 and Integer.MAX_VALUE) | 10 | Read and Write | Number of extrema distances to store for each CC. |
| discountStep | int (Between 1 and Integer.MAX_VALUE) | 1 | Read and Write | Discounting step value. |
| empiricalBounding | boolean | true | Read and Write | Flag to enable empirical bounding (only if simMetric supports it). |
| kMeans | Integer (Between 1 and Integer.MAX_VALUE) | inferred from simMetric | Read and Write | K-means parameter for the hierarchical clustering algorithm. |
| geoCentroid | boolean | false | Read and Write | Flag to enable usage of geometric centroid in clusters. |
| startEpsilon | double (Between 0 and Double.MAX_VALUE) | inferred from simMetric | Read and Write | Starting epsilon value for clustering. |
| epsilonMultiplier | double (Between 0 and 1) | 0.8 | Read and Write | Epsilon multiplier for clustering. |
| maxLevels | int (Between 1 and Integer.MAX_VALUE) | 20 | Read and Write | Maximum levels in the cluster hierarchy. |
| clusteringAlgorithm | KMEANS | KMEANS | Read and Write | Clustering algorithm to use. |
| breakFirstKLevelsToMoreClusters | int (Between 0 and Integer.MAX_VALUE) | 0 | Read and Write | Number of levels to break into more clusters. |
| clusteringRetries | int (Between 1 and Integer.MAX_VALUE) | 20 | Read and Write | Number of clustering tries per cluster level. |
| hashSize | int (Between 1 and Integer.MAX_VALUE) | inferred from query | Read and Write | Hash size for caches (centroids and cluster combinations). |
| BFSRatio | double (Between 0 and 1) | 0.5 | Read and Write | BFS ratio for traversal of the comparison tree. |
| BFSFactor | double | inferred from BFSRatio | Read and Write | BFS factor for traversal of the comparison tree (based on ratio). |
| shrinkFactor | double | 0 | Read and Write | Shrink factor $\gamma$ for top-k queries. |
| statBag | StatBag (Object) | constructed after init() | Read-only | Statistics bag. |
| randomGenerator | Random (Object) | constructed after init() | Read-only | Random number generator. |
| pairwiseDistances | double[][] (2D Array) | constructed after init() | Read-only | Pairwise distances cache. |
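
To make the derived defaults and domains in the table above more concrete, here is a minimal Java sketch. It is not the project's RunParameters class: `ParameterSketch` and its methods are hypothetical names introduced only for illustration. The rounding of `0.1 * nDimensions`, the lower bound of 1 on the result, and the assumption that the "Between x and y" domains are inclusive are guesses, so check the actual source before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, NOT the project's RunParameters class. */
final class ParameterSketch {

    /** Default for threads as listed in the table: Min(80, CPU cores * 4). */
    static int defaultThreads(int cpuCores) {
        return Math.min(80, cpuCores * 4);
    }

    /**
     * Default for dimredComponents as listed in the table: 0.1 * nDimensions.
     * Assumption: the value is rounded to the nearest integer and kept at or above 1,
     * because the documented domain starts at 1.
     */
    static int defaultDimredComponents(int nDimensions) {
        return Math.max(1, (int) Math.round(0.1 * nDimensions));
    }

    /**
     * Checks a handful of the documented domains and returns one message per violation.
     * Assumption: all bounds are inclusive.
     */
    static List<String> validate(int maxPLeft, int maxPRight, int topK,
                                 double epsilonMultiplier, double bfsRatio) {
        List<String> errors = new ArrayList<>();
        if (maxPLeft < 1 || maxPLeft > 10) {
            errors.add("maxPLeft must be between 1 and 10");
        }
        if (maxPRight < 0 || maxPRight > 10) {
            errors.add("maxPRight must be between 0 and 10");
        }
        if (topK < 0 || topK > 100000) {
            errors.add("topK must be between 0 and 100000");
        }
        if (epsilonMultiplier < 0.0 || epsilonMultiplier > 1.0) {
            errors.add("epsilonMultiplier must be between 0 and 1");
        }
        if (bfsRatio < 0.0 || bfsRatio > 1.0) {
            errors.add("BFSRatio must be between 0 and 1");
        }
        return errors;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("default threads: " + defaultThreads(cores));
        System.out.println("default dimredComponents for 200 dimensions: "
                + defaultDimredComponents(200));
        // maxPLeft = 0 and maxPRight = 11 both fall outside their documented domains.
        System.out.println("violations: " + validate(0, 11, 100, 0.8, 0.5));
    }
}
```

Returning a list of violations rather than throwing on the first one is a design choice of this sketch; it lets a caller report every out-of-range value in a single pass.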

How to Use

To use these parameters and run a query with the Correlation Detective algorithm, refer to the README.md file for usage instructions and examples.