---
output:
  md_document:
    variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, cache = FALSE, include = FALSE}
library(knitr)
library(dtwclust)
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "README-",
    cache = TRUE
)
```
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/dtwclust)](https://cran.r-project.org/package=dtwclust)
[![Travis-CI Build Status](https://travis-ci.org/asardaes/dtwclust.svg?branch=master)](https://travis-ci.org/asardaes/dtwclust)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/asardaes/dtwclust?branch=master&svg=true)](https://ci.appveyor.com/project/asardaes/dtwclust)
[![codecov](https://codecov.io/gh/asardaes/dtwclust/branch/master/graph/badge.svg)](https://codecov.io/gh/asardaes/dtwclust)
[![Downloads](http://cranlogs.r-pkg.org/badges/dtwclust)](http://cranlogs.r-pkg.org/badges/dtwclust)
# Time Series Clustering Along with Optimizations for the Dynamic Time Warping (DTW) Distance
Time series clustering with a wide variety of strategies and a series of optimizations specific to the Dynamic Time Warping (DTW) distance and its corresponding lower bounds (LBs).
There are implementations of both traditional clustering algorithms,
and more recent procedures such as k-Shape and TADPole clustering.
Functionality can be easily extended with custom distance measures and centroid definitions.
Many of the algorithms implemented in this package are specifically tailored to DTW, hence its name.
However, the main clustering function is flexible so that one can test many different clustering approaches,
using either the time series directly,
or by applying suitable transformations and then clustering in the resulting space.
Other implementations included in the package provide some alternatives to DTW.
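
For readers new to DTW, the recursion behind the distance can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions (univariate series, absolute difference as local cost, unit step weights, an optional band around the diagonal). The helper `dtw_sketch` is hypothetical and is not part of the package, whose own functions offer more options and should be preferred in practice.

```r
# Hypothetical helper, not part of dtwclust: textbook DTW via dynamic programming
dtw_sketch <- function(x, y, window = NULL) {
    n <- length(x)
    m <- length(y)
    # No window means every cell is allowed; otherwise the band must be at least
    # as wide as the length difference so that the last cell stays reachable
    w <- if (is.null(window)) max(n, m) else max(window, abs(n - m))
    # Cumulative cost matrix with an extra border row and column set to infinity
    D <- matrix(Inf, nrow = n + 1L, ncol = m + 1L)
    D[1L, 1L] <- 0
    for (i in seq_len(n)) {
        for (j in seq(max(1L, i - w), min(m, i + w))) {
            cost <- abs(x[i] - y[j])
            # Cheapest way to reach cell (i, j): from above, from the left, or diagonally
            D[i + 1L, j + 1L] <- cost + min(D[i, j + 1L], D[i + 1L, j], D[i, j])
        }
    }
    D[n + 1L, m + 1L]
}

x <- c(0, 1, 2, 3, 2, 1, 0)
y <- c(0, 0, 1, 2, 3, 2, 1, 0) # same shape as x, with the first value repeated
dtw_sketch(x, y)               # unconstrained warping
dtw_sketch(x, y, window = 1L)  # warping limited to a narrow band
```

Because `y` only repeats the first value of `x`, warping can align the two series exactly, which a point-by-point comparison could not do for series of different lengths.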
For more information:
* [Vignette with theory](https://cran.r-project.org/web/packages/dtwclust/vignettes/dtwclust.pdf) (with examples in the appendices)
* [Timing experiments](https://cran.r-project.org/web/packages/dtwclust/vignettes/timing-experiments.html)
* [Parallelization considerations](https://cran.r-project.org/web/packages/dtwclust/vignettes/parallelization-considerations.html)
* [Functions' documentation](https://cran.r-project.org/web/packages/dtwclust/dtwclust.pdf)
* [Sample shiny app](https://asardaes.shinyapps.io/dtwclust-tsclust-interactive/)
* [CRAN's time series view](https://cran.r-project.org/web/views/TimeSeries.html)
## Implementations
* Partitional, hierarchical and fuzzy clustering
    + k-Shape clustering
        - Shape-based distance
        - Shape extraction for time series
    + TADPole clustering
* An optimized version of DTW
* Keogh's and Lemire's DTW lower bounds
* Global alignment kernel (GAK) distance
* DTW Barycenter Averaging
* Soft-DTW (distance and centroid)
* Some multivariate support (GAK, DTW and soft-DTW)
* Cluster validity indices (crisp and fuzzy, internal and external)
* Parallelization for most functions
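
As a rough illustration of how the lower bounds relate to DTW, the sketch below compares both bounds with the DTW distance for one pair of series. It assumes that `lb_keogh()`, `lb_improved()`, `dtw_basic()` and `reinterpolate()` are called with the arguments shown, and that the same window size and norm are used throughout. The choice of series and the common length of 150 are arbitrary.

```r
library(dtwclust)
data("uciCT")

# The lower bounds are defined for series of equal length,
# so both series are reinterpolated to a common length first
x <- reinterpolate(CharTraj[[1L]], new.length = 150L)
y <- reinterpolate(CharTraj[[6L]], new.length = 150L)

w <- 20L # same window for the bounds and for DTW

lbk <- lb_keogh(x, y, window.size = w, norm = "L1")$d
lbi <- lb_improved(x, y, window.size = w, norm = "L1")
d <- dtw_basic(x, y, window.size = w, norm = "L1")

# Both bounds should not exceed the DTW distance
c(lb_keogh = lbk, lb_improved = lbi, dtw = d)
```

Since the bounds are cheaper to compute than DTW itself, they can be used to discard candidates early in nearest-neighbor searches.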
## Installation
The latest version from CRAN can be installed with `install.packages("dtwclust")`.
If you want to test the latest version from github,
first install the
[prerequisites for R package development](https://support.rstudio.com/hc/en-us/articles/200486498-Package-Development-Prerequisites)
(LaTeX is only necessary if you want to build the vignette)
as well as the
[remotes package](https://cran.r-project.org/package=remotes),
and then type `remotes::install_github("asardaes/dtwclust")`.
If you're wondering which version to install,
take a look at the [CHANGELOG](CHANGELOG.md) file;
I try to keep it updated.
Check the
[Unix](https://travis-ci.org/asardaes/dtwclust)
and
[Windows](https://ci.appveyor.com/project/asardaes/dtwclust)
continuous integration builds to make sure everything is working,
but do note that they tend to fail for reasons unrelated to the package's functionality.
## License
GNU General Public License v3.0. See [license](LICENSE) and [copyrights](inst/COPYRIGHTS).
This software package was developed independently of any organization or institution that is or has been associated with the author.
## Examples
```{r data}
# Load series
data("uciCT")
```
### Partitional
```{r partitional}
pc <- tsclust(CharTraj, type = "partitional", k = 20L,
              distance = "dtw_basic", centroid = "pam",
              seed = 3247L, trace = TRUE,
              args = tsclust_args(dist = list(window.size = 20L)))
plot(pc)
```
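
Since the character trajectories come with known labels, a partition like the one above could be checked with the external cluster validity indices. The following is a sketch that assumes `cvi()` accepts the clustering result together with the ground truth in `CharTrajLabels` and that `type = "external"` selects the external indices. The clustering is repeated without tracing so that the block is self-contained.

```r
library(dtwclust)
data("uciCT")

# Same partitional setup as above, without progress output
pc_check <- tsclust(CharTraj, type = "partitional", k = 20L,
                    distance = "dtw_basic", centroid = "pam",
                    seed = 3247L,
                    args = tsclust_args(dist = list(window.size = 20L)))

# External indices compare the obtained partition with the known labels
ext <- cvi(pc_check, CharTrajLabels, type = "external")
ext
```

Some indices are to be maximized and others minimized, so the values should be interpreted index by index rather than compared with each other.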
### Hierarchical
```{r hierarchical}
hc <- tsclust(CharTraj, type = "hierarchical", k = 20L,
              distance = "sbd", trace = TRUE,
              control = hierarchical_control(method = "average"))
plot(hc)
```
### Fuzzy
```{r fuzzy}
# Calculate autocorrelation up to 50th lag, considering a list of time series as input
acf_fun <- function(series, ...) {
    lapply(series, function(x) { as.numeric(acf(x, lag.max = 50L, plot = FALSE)$acf) })
}
# Autocorrelation-based fuzzy c-means
fc <- tsclust(CharTraj[1L:25L], type = "fuzzy", k = 5L,
              preproc = acf_fun, distance = "L2",
              seed = 123L)
fc
```
### *Some* multivariate support
```{r multivariate}
# Multivariate series provided as a list of matrices, using GAK distance
mvc <- tsclust(CharTrajMV[1L:20L], k = 4L, distance = "gak", seed = 390L)
# Note how the variables of each series are appended one after the other in the plot
plot(mvc, labels = list(nudge_x = -10, nudge_y = 1))
```