Support distmat as lower triangular for PAM #77

Merged (14 commits) on Jul 7, 2024
6 changes: 3 additions & 3 deletions .github/workflows/check-sanitizers.yaml
@@ -52,8 +52,8 @@ jobs:
run: |
install.packages("remotes")
library("remotes")
install_cran(c("Matrix", "bigmemory"))
install_cran(c("codetools", "cluster", "lattice", "class"), force = TRUE)
install_cran(c("bigmemory"))
install_cran(c("Matrix", "codetools", "cluster", "lattice", "class"), force = TRUE)
update(dev_package_deps(dependencies = TRUE))

- name: Check with sanitizers
@@ -62,7 +62,7 @@ jobs:
ASAN_OPTIONS: "detect_leaks=0:detect_odr_violation=0"
run: |
Rdevel CMD build --no-build-vignettes --no-manual .
Rdevel --vanilla CMD check *.tar.gz --as-cran --no-manual
Rdevel --vanilla CMD check *.tar.gz --as-cran --no-manual --ignore-vignettes
continue-on-error: true

- name: Show testthat output
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,8 +1,9 @@
# Changelog

## Version 5.6.0
## Version 6.0.0
* Update Makevars for ARM version of Windows.
* Sanitize internal usage of `do.call` to avoid huge backtraces.
* Support lower triangular `distmat` objects for symmetric distances (#77) - breaking change.
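The lower triangular `distmat` support rests on how base R's `dist` objects are laid out: only the strict lower triangle is stored, column by column, in a vector of length n(n-1)/2, and `?dist` documents that entry (i, j) with i < j sits at position `n*(i-1) - i*(i-1)/2 + j-i`. A sketch of reading single entries without building the full n x n matrix, using a hypothetical `dist_entry()` helper (not dtwclust code):

```r
# Hypothetical helper (not part of dtwclust): read entry (i, j) of a base R
# "dist" object without expanding it to a full n x n matrix.
dist_entry <- function(d, i, j) {
    n <- attr(d, "Size")
    if (i == j) return(0)  # diagonal is implicitly zero, as in as.matrix(d)
    if (i > j) {           # symmetric distance: swap so that i < j
        tmp <- i; i <- j; j <- tmp
    }
    # Position of (i, j), i < j, in the column-wise lower triangle (see ?dist)
    d[[n * (i - 1) - i * (i - 1) / 2 + j - i]]
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5)  # 5 observations, 4 variables
d <- dist(x)                      # stores 5 * 4 / 2 = 10 values
full <- as.matrix(d)              # stores 5 * 5 = 25 values

length(d)                         # 10
dist_entry(d, 4, 2) == full[4, 2] # TRUE
```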

## Version 5.5.12
* Remove explicit C++ requirements.
6 changes: 4 additions & 2 deletions DESCRIPTION
@@ -12,7 +12,7 @@ Description: Time series clustering along with optimized techniques related
All included distance functions have custom loops optimized for the
calculation of cross-distance matrices, including parallelization support.
Several cluster validity indices are included.
Version: 5.5.12
Version: 5.5.12.9000
Depends:
R (>= 3.3.0),
methods,
@@ -47,7 +47,7 @@ Suggests:
knitr,
rmarkdown,
testthat
Date: 2024-06-22
Date: 2024-07-01
Author: Alexis Sarda-Espinosa
Maintainer: Alexis Sarda <[email protected]>
BugReports: https://github.com/asardaes/dtwclust/issues
@@ -83,7 +83,9 @@ Collate:
'DISTANCES-sbd.R'
'DISTANCES-sdtw.R'
'GENERICS-cvi.R'
'RD-helpers.R'
'S4-Distmat.R'
'S4-DistmatLowerTriangular.R'
'S4-PairTracker.R'
'S4-SparseDistmat.R'
'S4-tsclustFamily.R'
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ S3method(as.data.frame,pairdist)
S3method(as.matrix,crossdist)
S3method(as.matrix,pairdist)
S3method(base::dim,Distmat)
S3method(base::dim,DistmatLowerTriangular)
S3method(base::dim,SparseDistmat)
S3method(cl_class_ids,TSClusters)
S3method(cl_membership,TSClusters)
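Registering `S3method(base::dim, DistmatLowerTriangular)` lets `dim()`, and helpers built on it such as `nrow()`, report an n x n shape even though only the lower triangle is kept. A minimal sketch for a hypothetical S3 class `lower_tri_demo` that wraps a `dist` object; the actual `DistmatLowerTriangular` class lives in `R/S4-DistmatLowerTriangular.R`, which this diff does not show:

```r
# Hypothetical S3 class wrapping a base R "dist" object (illustration only).
lower_tri_demo <- function(d) {
    stopifnot(inherits(d, "dist"))
    structure(list(values = d), class = "lower_tri_demo")
}

# dim() is an internal generic, so this S3 method is dispatched for the class.
# The logical shape is n x n, where n is the number of observations.
dim.lower_tri_demo <- function(x) {
    n <- attr(x$values, "Size")
    c(n, n)
}

ltd <- lower_tri_demo(dist(matrix(1:12, nrow = 4)))
dim(ltd)    # 4 4
nrow(ltd)   # 4
```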
@@ -105,6 +106,7 @@ importFrom(ggplot2,theme_bw)
importFrom(ggrepel,geom_label_repel)
importFrom(graphics,plot)
importFrom(methods,S3Part)
importFrom(methods,as)
importFrom(methods,callNextMethod)
importFrom(methods,initialize)
importFrom(methods,is)
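The new `importFrom(methods, as)` supports the `methods::as(distmat, "Distmat")` call in `pam_cent()`, which turns a user-supplied `distmat` into a single internal type before it is subset. The pattern can be sketched with `setAs()` and a hypothetical S4 class `DemoDistmat`; dtwclust's own coercion methods for `Distmat` are not part of this diff:

```r
library(methods)

# Register the S3 class "dist" so it can appear in S4 method signatures.
setOldClass("dist")

# Hypothetical toy class that keeps a full symmetric matrix.
setClass("DemoDistmat", slots = c(values = "matrix"))

# Coercions from the two input types a user might pass as 'distmat'.
setAs("matrix", "DemoDistmat", function(from) new("DemoDistmat", values = from))
setAs("dist", "DemoDistmat", function(from) new("DemoDistmat", values = as.matrix(from)))

d <- dist(matrix(c(0, 1, 3, 6), ncol = 1))
a <- as(d, "DemoDistmat")              # from a lower triangular "dist"
b <- as(as.matrix(d), "DemoDistmat")   # from a full matrix
identical(a@values, b@values)          # TRUE
```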
8 changes: 3 additions & 5 deletions R/CENTROIDS-dba.R
@@ -21,7 +21,7 @@
#' @param max.iter Maximum number of iterations allowed.
#' @param delta At iteration `i`, if `all(abs(centroid_{i}` `-` `centroid_{i-1})` `< delta)`,
#' convergence is assumed.
#' @template error-check
#' @param error.check `r roxygen_error_check_param()`
#' @param trace If `TRUE`, the current iteration is printed to output.
#' @param mv.ver Multivariate version to use. See below.
#'
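Here `@template error-check` is replaced with roxygen2 inline R code, `` `r roxygen_error_check_param()` ``: in roxygen2's Markdown mode, such code is evaluated when the Rd files are generated, and its result is inserted into the documentation. The helpers are presumably defined in the newly collated `R/RD-helpers.R`; the body below is a hypothetical sketch, and its wording is invented for illustration:

```r
# Hypothetical helper in the style of R/RD-helpers.R (the real file is not
# shown in this diff). It returns the text that roxygen2 substitutes for the
# inline code `r roxygen_error_check_param()` when building the Rd files.
roxygen_error_check_param <- function() {
    paste(
        "Logical indicating whether the function should try to detect",
        "inconsistencies and give more informative error messages."
    )
}

# Every function can then document the parameter with the same one-liner:
#   #' @param error.check `r roxygen_error_check_param()`
# so the wording is maintained in a single place.
cat(roxygen_error_check_param(), "\n")
```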
@@ -34,15 +34,13 @@
#' the same result provided the elements of `X` keep the same values, although their order may
#' change.
#'
#' @template rcpp-parallel
#' `r roxygen_window_details()`
#'
#' @section Parallel Computing:
#' @section `r roxygen_rcpp_parallel_section()`
#'
#' This function appears to be very sensitive to numerical inaccuracies if multi-threading is used
#' in a **32 bit** installation. In such systems, consider limiting calculations to 1 thread.
#'
#' @template window
#'
#' @return The average time series.
#'
#' @section Multivariate series:
26 changes: 12 additions & 14 deletions R/CENTROIDS-pam.R
@@ -3,6 +3,7 @@
#' Extract the medoid time series based on a distance measure.
#'
#' @export
#' @importFrom methods as
#' @importFrom rlang exprs
#' @importFrom Matrix rowSums
#'
@@ -12,7 +13,7 @@
#' @param ids Integer vector indicating which of the `series` should be considered.
#' @param distmat Optionally, a pre-computed cross-distance matrix of *all* `series`.
#' @param ... Any extra parameters for the `distance` function that may be used.
#' @template error-check
#' @param error.check `r roxygen_error_check_param()`
#'
#' @details
#'
@@ -43,24 +44,21 @@ pam_cent <- function(series, distance, ids = seq_along(series), distmat = NULL,
if (missing(distance))
distance <- attr(distmat, "method")

args <- rlang::exprs(
distmat = distmat,
series = series,
dist_args = dots,
distance = distance,
control = partitional_control(),
error.check = error.check
)

if (is.null(distmat)) {
if (is.null(distance))
stop("If 'distmat' is missing, 'distance' must be provided.")

args$distmat <- NULL
distmat <- Distmat$new(
series = series,
dist_args = dots,
distance = distance,
control = partitional_control(),
error.check = error.check
)
}
else {
distmat <- methods::as(distmat, "Distmat")
}

# S4-Distmat.R
distmat <- do.call(Distmat$new, args)
}

d <- distmat[ids, ids, drop = FALSE]
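After coercion, `pam_cent()` takes `d <- distmat[ids, ids, drop = FALSE]`; for PAM, the medoid is the series whose summed distance to the other selected series is smallest. A sketch of that step that reads only the needed entries of a lower triangular `dist` object (index formula from `?dist`), with a hypothetical `medoid_from_dist()` that is not dtwclust's implementation:

```r
# Hypothetical helper (not dtwclust API): return the element of 'ids' whose
# summed distance to the other 'ids' is smallest, reading only the needed
# entries of a lower triangular "dist" object.
medoid_from_dist <- function(d, ids) {
    n <- attr(d, "Size")
    entry <- function(i, j) {
        if (i == j) return(0)
        lo <- min(i, j)
        hi <- max(i, j)
        d[[n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo]]  # index formula from ?dist
    }
    # Sum of distances from each candidate to all selected series
    sums <- vapply(ids, function(i) {
        sum(vapply(ids, function(j) entry(i, j), numeric(1)))
    }, numeric(1))
    ids[which.min(sums)]
}

x <- matrix(c(0, 1, 2, 10, 11), ncol = 1)  # five single-point "series"
d <- dist(x)
medoid_from_dist(d, ids = 1:3)          # 2
medoid_from_dist(d, ids = c(2, 4, 5))   # 4
```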
6 changes: 2 additions & 4 deletions R/CENTROIDS-sdtw-cent.R
@@ -41,7 +41,7 @@ sdtw_cent_stats_gr <- function(ignored, .shared_env_, ...) {
#' for `optim`, and `...` for both).
#' @param opts List of options to pass to `nloptr` or [stats::optim()]'s `control`. The defaults in
#' the function's formals are for `nloptr`, but the value will be adjusted for `optim` if needed.
#' @template error-check
#' @param error.check `r roxygen_error_check_param()`
#'
#' @details
#'
@@ -52,9 +52,7 @@ sdtw_cent_stats_gr <- function(ignored, .shared_env_, ...) {
#' @return The resulting centroid, with the optimization results as attributes (except for the
#' returned centroid).
#'
#' @template rcpp-parallel
#'
#' @section Parallel Computing:
#' @section `r roxygen_rcpp_parallel_section()`
#'
#' For unknown reasons, this function has returned different results (in the order of 1e-6) when
#' using multi-threading in x64 Windows installations in comparison to other environments (using
2 changes: 1 addition & 1 deletion R/CENTROIDS-shape-extraction.R
@@ -14,7 +14,7 @@
#' the matrices in `X`. **It will be z-normalized**.
#' @param znorm Logical flag. Should z-scores be calculated for `X` before processing?
#' @param ... Further arguments for [zscore()].
#' @template error-check
#' @param error.check `r roxygen_error_check_param()`
#'
#' @details
#'
20 changes: 14 additions & 6 deletions R/CLUSTERING-ddist2.R
@@ -179,16 +179,19 @@ ddist2 <- function(distance, control) {

if (!dist_entry$loop) {
# CUSTOM LOOP, LET THEM HANDLE OPTIMIZATIONS
dm <- base::as.matrix(quoted_call(
dm <- quoted_call(
proxy::dist, x = x, y = centroids, method = distance, dots = dots
))
)

if (isTRUE(dots$pairwise)) {
dim(dm) <- NULL
return(ret(dm, class = "pairdist"))
}
else if (inherits(dm, "dist")) {
return(ret(dm))
}
else {
return(ret(dm, class = "crossdist"))
return(ret(base::as.matrix(dm), class = "crossdist"))
}
}

@@ -237,11 +240,16 @@ ddist2 <- function(distance, control) {
}
else if (!multiple_workers) {
# WHOLE SYMMETRIC DISTMAT WITHOUT CUSTOM LOOP OR USING SEQUENTIAL proxy LOOP
dm <- base::as.matrix(quoted_call(
dm <- quoted_call(
proxy::dist, x = x, y = NULL, method = distance, dots = dots
))
)

return(ret(dm, class = "crossdist"))
if (inherits(dm, "dist")) {
return(ret(dm))
}
else {
return(ret(base::as.matrix(dm), class = "crossdist"))
}
}
}

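In both `ddist2()` hunks, the `proxy::dist()` result is no longer wrapped in `base::as.matrix()`: a `dist` object is returned as-is, and anything else is still expanded to a full `crossdist` matrix. A sketch of the same branching with base R's `stats::dist()`, using a hypothetical `keep_lower_tri()` in place of the internal `ret()` helper (whose definition is not shown here), which also shows the storage difference between the two forms:

```r
# Hypothetical stand-in for the branching added to ddist2(): keep a "dist"
# object, which stores only the lower triangle, and expand anything else to
# a full matrix tagged with class "crossdist".
keep_lower_tri <- function(dm) {
    if (inherits(dm, "dist")) {
        return(dm)
    }
    m <- base::as.matrix(dm)
    class(m) <- "crossdist"
    m
}

set.seed(42)
x <- matrix(rnorm(100 * 3), nrow = 100)   # 100 observations
dm <- keep_lower_tri(dist(x))

inherits(dm, "dist")                 # TRUE
length(dm)                           # 4950 = 100 * 99 / 2
length(base::as.matrix(dist(x)))     # 10000 = 100 * 100
```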
6 changes: 3 additions & 3 deletions R/CLUSTERING-tadpole.R
@@ -10,7 +10,7 @@
#' @param window.size Window size constraint for DTW (Sakoe-Chiba). See details.
#' @param k The number of desired clusters. Can be a vector with several values.
#' @param dc The cutoff distance(s). Can be a vector with several values.
#' @template error-check
#' @param error.check `r roxygen_error_check_param()`
#' @param lb Which lower bound to use, "lbk" for [lb_keogh()] or "lbi" for [lb_improved()].
#' @param trace Logical flag. If `TRUE`, more output regarding the progress is printed to screen.
#'
@@ -39,7 +39,7 @@
#' - For multiple `dc` values, multi-processing with [foreach::foreach()].
#' - The internal distance calculations use multi-threading with [RcppParallel::RcppParallel].
#'
#' @template window
#' `r roxygen_window_details()`
#'
#' @return
#'
@@ -52,7 +52,7 @@
#' For multiple `k`/`dc` values, a list of lists is returned, each internal list having the
#' aforementioned elements.
#'
#' @template rcpp-parallel
#' @section `r roxygen_rcpp_parallel_section()`
#'
#' @references
#'