The clustering behavior of sliding windows

somecontext

Of further potential interest: This paper cites an earlier paper by Keogh and Lin with the provocative title "Clustering of time-series subsequences is meaningless", available online at https://www.cs.ucr.edu/~eamonn/meaningless.pdf

somecontext

Here's the abstract of this earlier paper:

> Given the recent explosion of interest in streaming data and online algorithms, clustering of time series subsequences, extracted via a sliding window, has received much attention. In this work we make a surprising claim. Clustering of time series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster subsequences on some time series datasets.

Several commenters here seem to ask "okay, so then what's the right way to cluster windows of timeseries??" Perhaps the final sentence of this abstract suggests a solution in that direction?

yorwba

The suggested solution is look at motifs: windows that are highly similar when trivial matches due to window overlap are excluded. If you take this to its logical conclusion, you end up in the Matrix Profile rabbit hole. https://www.cs.ucr.edu/~eamonn/MatrixProfile.html

achierius

What makes this a rabbit hole? I've looked into it before but not deeply, is it tricky to use in practice for a non obvious reason?

isoprophlex

Absolutely mindblowing. A strong claim, but they present very strong arguments. Thanks for sharing.

jmole

You'd think this would be more well known. In fourier analysis or other frequency/wavelet decomposition techniques, the data you get out depends a ton on the windowing function that you use.

For example:

https://en.wikipedia.org/wiki/Sinc_function

https://en.wikipedia.org/wiki/Window_function#Examples_of_wi...

eamonnkeogh

When we wrote "Clustering of time-series subsequences is meaningless", it took six attempts to get it published.

One reviewer wrote (word for word) "you will get in bad trouble if you publish this"

mharig

[dead]

robotresearcher

The problematic projection into vectors is done using a window size of a constant number of samples? Isn't that very weird?

Unless the samples are nicely periodic, that makes the contents of a vector, and the position of a value in a vector, dependent only on the window length and the order of time values, and not the time values themselves. Since the ordering in the vector, and thus the meaning of that dimension in vector space, is likely to be effectively random, why would we expect clustering in that vector space to mean anything? It's a weird thing to do on the face of it.

yorwba

Imagine you know that there are some repeating patterns in your dataset, e.g. heartbeats in an EKG. If you take two windows that happen to contain a heartbeat at the same position, the corresponding vectors will be close, and if the positions are different, the vectors should become more dissimilar the more the alignment is off.

Then if you create a number of clusters equal to the window size, you might expect that each cluster will correspond to one of the possible positions of the heartbeat within the window. But somehow that's not what happens...

somecontext

My understanding is that people would mostly try this approach when the samples are periodic/regular.

kscarlet

> Unless the samples are nicely periodic

This is assumed from context.

amy214

The paper ignored a big honking chunk of the field, as is used in finance often - you take the delta between adjacent timepoints, something like the derivative, now you have something centered on zero and going up and down "randomly". Something like PCA also typically may use centering although that's not exactly the same. I would have loved to see an investigation of window size and such and deltad time series (usually either x2-x1, or x2/x1). In this sense the paper is just recapitulating, "if you don't delta your time series bad things happen"

newer_vienna

Interesting proofs, though I would love more of the examples that show the dataset and the unexpected results, as well as describing what may lead to these traps and how to avoid them

PaulHoule

... if I get it right the whole idea of clustering sliding windows is wrong, the question of "what you should do instead?" is an interesting one.

I'd imagine two answers are: (1) for time series which are somewhat periodic you might cut out individual days or weeks and try to cluster them for each other, (2) for time series which are intermittent you might create some definition of an "event" (an earthquake, or a particle passing through a detector) and then cluster events and maybe (3) for something episodic such as "heart rate during a workout" you would cluster episodes.

bsteinbach

In defense of sliding windows of time series, if not clustering, the principal components analysis of the matrix on page 1 is one robust way to extract a linear model for the system producing the time series. It's useful for finding frequencies of closely spaced oscillators in a time series when least squares fitting would be arduous to parameterize and convergence a battle.

hermes0

very interesting, but I wonder if it would hold for clustering with other distance metrics such as dynamic time warping (DTW).

whatshisface

I don't see how the authors are calling a sinusoidal centroid meaningless even while observing that it's the dominant component. That's the meaning of the centroid. A 1-means cluster is just the average and the unique features are averaged out.

exabrial

Slightly disappointed this was about not about how to fix my porch door.

pottertheotter

Well, I thought this was going to be about home design and sliding windows.

HN

The clustering behavior of sliding windows

The clustering behavior of sliding windows