From 2189a6dbd7419ae2aec82d5a262e5532d22d3bd4 Mon Sep 17 00:00:00 2001
From: asardaes
Date: Sun, 7 Jul 2024 20:22:01 +0200
Subject: [PATCH] Update documentation

---
 CHANGELOG.md       | 2 +-
 R/DISTANCES-sdtw.R | 6 ++++++
 R/RD-helpers.R     | 3 +++
 inst/NEWS.Rd       | 2 +-
 man/GAK.Rd         | 3 +++
 man/SBD.Rd         | 3 +++
 man/dtw_basic.Rd   | 3 +++
 man/sdtw.Rd        | 9 +++++++++
 8 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e18bea53..fcc12873 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
 ## Version 6.0.0
 * Update Makevars for ARM version of Windows.
 * Sanitize internal usage of `do.call` to avoid huge backtraces.
-* Support lower triangular `distmat` objects for symmetric distances (#77).
+* Support lower triangular `distmat` objects for symmetric distances (#77) - breaking change.
 
 ## Version 5.5.12
 * Remove explicit C++ requirements.
diff --git a/R/DISTANCES-sdtw.R b/R/DISTANCES-sdtw.R
index 7c3c9a72..f990d0bd 100644
--- a/R/DISTANCES-sdtw.R
+++ b/R/DISTANCES-sdtw.R
@@ -21,6 +21,12 @@
 #'
 #' `r roxygen_proxy_symmetric()`
 #'
+#' Note that, due to the fact that this distance is not always zero when a series is compared
+#' against itself, this optimization is likely problematic for soft-DTW, as the `dist` object will
+#' be handled by many functions as if it had only zeroes in the diagonal. An exception is
+#' [tsclust()] when using partitional clustering with PAM centroids---actual diagonal values will
+#' be calculated and considered internally in that case.
+#'
 #' @references
 #'
 #' Cuturi, M., & Blondel, M. (2017). Soft-DTW: a Differentiable Loss Function for Time-Series. arXiv
diff --git a/R/RD-helpers.R b/R/RD-helpers.R
index 31262a4a..d074652f 100644
--- a/R/RD-helpers.R
+++ b/R/RD-helpers.R
@@ -26,6 +26,9 @@ See the parallelization vignette for more information - `browseVignettes(\"dtwcl
 
 roxygen_proxy_symmetric <- function() {
 "It also includes symmetric optimizations to calculate only half a distance matrix when appropriate---only one list of series should be provided in `x`.
+Starting with version 6.0.0, this optimization means that the function returns an array with the lower triangular values of the distance matrix,
+similar to what [stats::dist()] does;
+see [DistmatLowerTriangular-class] for a helper to access elements as it if were a normal matrix.
 If you want to avoid this optimization, call [proxy::dist] by giving the same list of series in both `x` and `y`."
 }
 
diff --git a/inst/NEWS.Rd b/inst/NEWS.Rd
index 3156ff5e..5293f7a1 100644
--- a/inst/NEWS.Rd
+++ b/inst/NEWS.Rd
@@ -11,6 +11,6 @@
 \itemize{
     \item Update Makevars for ARM version of Windows.
     \item Sanitize internal usage of \code{do.call} to avoid huge backtraces.
-    \item Support lower triangular \code{distmat} objects for symmetric distances. See PR #77 on GitHub.
+    \item Support lower triangular \code{distmat} objects for symmetric distances. This is a slightly breaking change, see PR #77 on GitHub.
 }
 }
diff --git a/man/GAK.Rd b/man/GAK.Rd
index 0dc1cb5f..73e75843 100644
--- a/man/GAK.Rd
+++ b/man/GAK.Rd
@@ -89,6 +89,9 @@ this function will default to 1 thread per worker.
 See the parallelization vignette for more information - \code{browseVignettes("dtwclust")}
 
 It also includes symmetric optimizations to calculate only half a distance matrix when appropriate---only one list of series should be provided in \code{x}.
+Starting with version 6.0.0, this optimization means that the function returns an array with the lower triangular values of the distance matrix,
+similar to what \code{\link[stats:dist]{stats::dist()}} does;
+see \linkS4class{DistmatLowerTriangular} for a helper to access elements as it if were a normal matrix.
 If you want to avoid this optimization, call \link[proxy:dist]{proxy::dist} by giving the same list of series in both \code{x} and \code{y}.
 }
 
diff --git a/man/SBD.Rd b/man/SBD.Rd
index 7d070e23..c9af3bcc 100644
--- a/man/SBD.Rd
+++ b/man/SBD.Rd
@@ -63,6 +63,9 @@ this function will default to 1 thread per worker.
 See the parallelization vignette for more information - \code{browseVignettes("dtwclust")}
 
 It also includes symmetric optimizations to calculate only half a distance matrix when appropriate---only one list of series should be provided in \code{x}.
+Starting with version 6.0.0, this optimization means that the function returns an array with the lower triangular values of the distance matrix,
+similar to what \code{\link[stats:dist]{stats::dist()}} does;
+see \linkS4class{DistmatLowerTriangular} for a helper to access elements as it if were a normal matrix.
 If you want to avoid this optimization, call \link[proxy:dist]{proxy::dist} by giving the same list of series in both \code{x} and \code{y}.
 
 In some situations, e.g. for relatively small distance matrices, the overhead introduced by the
diff --git a/man/dtw_basic.Rd b/man/dtw_basic.Rd
index 0931f181..153a3329 100644
--- a/man/dtw_basic.Rd
+++ b/man/dtw_basic.Rd
@@ -88,6 +88,9 @@ this function will default to 1 thread per worker.
 See the parallelization vignette for more information - \code{browseVignettes("dtwclust")}
 
 It also includes symmetric optimizations to calculate only half a distance matrix when appropriate---only one list of series should be provided in \code{x}.
+Starting with version 6.0.0, this optimization means that the function returns an array with the lower triangular values of the distance matrix,
+similar to what \code{\link[stats:dist]{stats::dist()}} does;
+see \linkS4class{DistmatLowerTriangular} for a helper to access elements as it if were a normal matrix.
 If you want to avoid this optimization, call \link[proxy:dist]{proxy::dist} by giving the same list of series in both \code{x} and \code{y}.
 
 In order for symmetry to apply here, the following must be true: no window constraint is used
diff --git a/man/sdtw.Rd b/man/sdtw.Rd
index c829cf77..217356e1 100644
--- a/man/sdtw.Rd
+++ b/man/sdtw.Rd
@@ -41,7 +41,16 @@ this function will default to 1 thread per worker.
 See the parallelization vignette for more information - \code{browseVignettes("dtwclust")}
 
 It also includes symmetric optimizations to calculate only half a distance matrix when appropriate---only one list of series should be provided in \code{x}.
+Starting with version 6.0.0, this optimization means that the function returns an array with the lower triangular values of the distance matrix,
+similar to what \code{\link[stats:dist]{stats::dist()}} does;
+see \linkS4class{DistmatLowerTriangular} for a helper to access elements as it if were a normal matrix.
 If you want to avoid this optimization, call \link[proxy:dist]{proxy::dist} by giving the same list of series in both \code{x} and \code{y}.
+
+Note that, due to the fact that this distance is not always zero when a series is compared
+against itself, this optimization is likely problematic for soft-DTW, as the \code{dist} object will
+be handled by many functions as if it had only zeroes in the diagonal. An exception is
+\code{\link[=tsclust]{tsclust()}} when using partitional clustering with PAM centroids---actual diagonal values will
+be calculated and considered internally in that case.
 }
 
 \references{