Groups t-T paths into "path families" based on the Hausdorff or Fréchet distance between paths.
Arguments
- x
either an object of class "HeFTy" (output of read_hefty()), an object of class "tTdiss" (output of path_diss()), or a data.frame containing the time, temperature, and segment columns of the modeled paths.
- k
an integer scalar or vector with the desired number of groups. Ignored when method is "dbscan" or "hdbscan" (see Note).
- dist
character. Algorithm used to calculate the dissimilarity (distance) matrix for lines; either Hausdorff (the default) or Frechet.
- method
character. Clustering method to be applied. Currently implemented are:
"hclust": Hierarchical Clustering using stats::hclust() (the default)
"kmeans": K-Means Clustering using stats::kmeans()
"pam": Partitioning Around Medoids using cluster::pam()
"dbscan": Density-based Spatial Clustering of Applications with Noise (DBSCAN) using dbscan::dbscan() (see Note)
"hdbscan": Hierarchical DBSCAN using dbscan::hdbscan() (see Note)
"specc": Spectral Clustering using kernlab::specc()
"agnes": Agglomerative hierarchical clustering using cluster::agnes()
"diana": Divisive hierarchical clustering using cluster::diana()
"clara": Clustering Large Applications using cluster::clara()
"fanny": Fuzzy Analysis Clustering using cluster::fanny()
- naming
character. Naming scheme for clusters. One of "asis" (output of the underlying cluster algorithm), "GOF" (ranks of the mean GOF values within clusters), or "size" (ranks of the cluster sizes). Outliers detected by dbscan or hdbscan will always be named 0.
- warn
logical. Should a warning be issued if at least one cluster contains fewer than (threshold * 100)% of the total paths?
- threshold
numeric. The significance threshold as a fraction of the total number of paths (0.01 by default). If any cluster contains a smaller fraction of the total paths than this value, a warning is issued. Ignored if warn is FALSE.
- ...
additional arguments passed to the clustering method.
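To make the dist argument more concrete, here is a minimal base-R sketch of a discrete (vertex-based) Hausdorff distance between two paths. The function name hausdorff_dist and the toy paths are hypothetical, and the sketch works on unscaled time and temperature values; it is not necessarily how thermoclustr computes its dissimilarities.

```r
# Discrete Hausdorff distance between two t-T paths (illustrative only).
# Each path is a two-column matrix: time, temperature.
hausdorff_dist <- function(a, b) {
  # pairwise Euclidean distances: vertices of a in rows, vertices of b in columns
  d <- sqrt(outer(a[, 1], b[, 1], "-")^2 + outer(a[, 2], b[, 2], "-")^2)
  # for each vertex, distance to the nearest vertex on the other path;
  # the Hausdorff distance is the largest of these in either direction
  max(max(apply(d, 1, min)), max(apply(d, 2, min)))
}

# Two toy paths that differ only at the middle vertex (hypothetical values)
path_a <- cbind(time = c(0, 50, 100), temperature = c(20, 60, 120))
path_b <- cbind(time = c(0, 50, 100), temperature = c(20, 90, 120))

hausdorff_dist(path_a, path_b)
```

The two paths share their end points, so the distance is driven entirely by the 30-degree offset at the middle vertex.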
Details
To use a clustering method that is not built into
the current version of thermoclustr, pass the distance matrix
produced by path_diss() to the clustering algorithm of your choice.
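The following sketch shows the general pattern of handing a dissimilarity matrix to a clustering function. The matrix d_mat is a hand-made toy stand-in, not real path_diss() output, and it assumes the dissimilarities can be coerced to a square symmetric matrix. stats::hclust() is used here only as a placeholder for whichever function accepting a "dist" object you prefer.

```r
# Toy stand-in for a path dissimilarity matrix (4 paths); NOT real path_diss() output
d_mat <- matrix(
  c(0, 1, 8, 9,
    1, 0, 7, 8,
    8, 7, 0, 2,
    9, 8, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("path", 1:4), paste0("path", 1:4))
)

# Convert to a "dist" object and pass it to a clustering function of choice
tree <- stats::hclust(stats::as.dist(d_mat), method = "complete")

# Cut the tree into two groups; returns a named vector of cluster labels
groups <- stats::cutree(tree, k = 2)
groups
```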
Note
The dbscan and hdbscan methods require the eps and minPts arguments.
Optimal eps values can be visually estimated from the "knee" in a k-nearest
neighbor distance plot using dbscan::kNNdistplot().
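As a rough illustration of what a k-nearest neighbor distance plot displays, the base-R sketch below computes the sorted k-NN distances by hand. The points in pts are toy coordinates standing in for paths, and the choice k = 2 is arbitrary; in practice dbscan::kNNdistplot() draws this curve for you.

```r
# Toy coordinates (hypothetical): two tight groups plus one outlier
pts <- rbind(c(0, 0), c(0, 1), c(1, 0),
             c(10, 10), c(10, 11), c(11, 10),
             c(30, 30))
d_mat <- as.matrix(stats::dist(pts))

# Distance from each point to its k-th nearest neighbor.
# sort(row)[1] is the point itself (distance 0), hence the k + 1.
k <- 2
knn_dist <- apply(d_mat, 1, function(row) sort(row)[k + 1])

# Sorted ascending, as on the y-axis of a k-NN distance plot;
# a sharp jump ("knee") suggests a candidate eps just below it
sorted_knn <- sort(unname(knn_dist))
sorted_knn
```

In this toy case the first six values stay below 1.5 and the last one jumps above 27, so any eps between those values would separate the outlier from the two groups.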
See also
path_diss() for calculating dissimilarities and path_nbclust()
for determining the optimal number of clusters.
Examples
data(tT_paths)
tT_paths$paths <- subset(tT_paths$paths, Comp_GOF >= 0.4)
# cluster the paths
cluster_paths(tT_paths, k = 3)
#> segment cluster
#> 1 1 1
#> 2 10 2
#> 3 104 2
#> 4 11 3
#> 5 12 2
#> 6 122 3
#> 7 13 3
#> 8 14 2
#> 9 15 1
#> 10 152 1
#> 11 16 2
#> 12 164 3
#> 13 166 3
#> 14 17 2
#> 15 172 1
#> 16 177 1
#> 17 18 2
#> 18 19 3
#> 19 195 3
#> 20 2 2
#> 21 20 2
#> 22 21 1
#> 23 22 2
#> 24 23 1
#> 25 24 2
#> 26 25 3
#> 27 259 2
#> 28 26 3
#> 29 268 3
#> 30 27 3
#> 31 274 3
#> 32 276 2
#> 33 28 1
#> 34 29 1
#> 35 290 3
#> 36 291 3
#> 37 292 3
#> 38 293 2
#> 39 3 2
#> 40 30 2
#> 41 31 2
#> 42 32 2
#> 43 33 3
#> 44 34 3
#> 45 35 1
#> 46 36 1
#> 47 37 3
#> 48 38 1
#> 49 39 2
#> 50 4 3
#> 51 40 3
#> 52 42 3
#> 53 43 3
#> 54 44 1
#> 55 45 1
#> 56 46 3
#> 57 47 3
#> 58 48 1
#> 59 49 3
#> 60 5 2
#> 61 50 3
#> 62 51 3
#> 63 55 3
#> 64 6 2
#> 65 63 2
#> 66 7 2
#> 67 8 2
#> 68 9 1
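The warn, threshold, and naming = "size" arguments can be pictured with a small base-R sketch on a result of the same shape (segment and cluster columns). The data frame res holds made-up assignments, the threshold of 0.15 is chosen only so that the toy data trigger the check, and ranking the largest cluster as 1 is an assumption about the ordering, not a statement of how the package ranks clusters.

```r
# Toy assignments (hypothetical), shaped like the segment/cluster output
res <- data.frame(
  segment = as.character(1:10),
  cluster = c(1, 1, 2, 2, 2, 2, 2, 3, 2, 2)
)

# Number of paths per cluster and their share of all paths
sizes <- vapply(split(res$segment, res$cluster), length, integer(1))
share <- sizes / nrow(res)

# Clusters holding a smaller share of the paths than the threshold
threshold <- 0.15
small <- names(share)[share < threshold]
if (length(small) > 0) {
  message("Clusters below threshold: ", paste(small, collapse = ", "))
}

# Relabel clusters by size rank (assumption: largest cluster becomes 1)
size_rank <- rank(-sizes, ties.method = "first")
res$cluster_by_size <- unname(size_rank[as.character(res$cluster)])
res
```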