Time-Series Anomaly Detection: A Decade Review
80 comments · January 6, 2025 · bluechair
jmpeax
What do you mean you don't have to mess around with window sizes? Matrix profile is highly dependent on the window size.
eamonnkeogh
The MP is so efficient that you can test ALL window lengths at once! This is called MADRID [a].
[a] Matrix Profile XXX: MADRID: A Hyper-Anytime Algorithm to Find Time Series Anomalies of all Lengths. Yue Lu, Thirumalai Vinjamoor Akhil Srinivas, Takaaki Nakamura, Makoto Imamura, and Eamonn Keogh. ICDM 2023.
eamonnkeogh
Thank you for your kind words ;-)
sriram_malhar
Thanks for sharing; I am most intrigued by the sales pitch. But the website is downright ugly.
This is a better presentation by the same folks. https://matrixprofile.org/
Croftengea
I don't think it's being updated. The latest blog posts are from 2020, and the GitHub repos haven't seen commits in the last 5-6 years. MP has come a long way since then.
eskaytwo
I don’t think it’s the same people.
sriram_malhar
Ah, you are right. I got the link from the original URL, so I just assumed. Thanks for the correction.
hoseja
Are you being serious? The first page actually has information on it. You can add margins in the devtools.
Croftengea
MP is one of the best univariate methods, but it's actually mentioned in the article.
bee_rider
What does it do? Anything to do with matrices, like, from math?
eskaytwo
It effectively convolves each subsequence across the full series and then reports the distance to its closest match. This can detect both outliers (a long distance to the closest match) and repeated patterns (a short distance).
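A minimal sketch of that idea using the open-source STUMPY library (my choice, not something named in the thread); the window length and synthetic data are purely illustrative:

```python
import numpy as np
import stumpy

rng = np.random.default_rng(42)
ts = rng.standard_normal(1000)        # stand-in for a real time series
ts[500:520] += 5.0                    # inject a synthetic anomaly
m = 50                                # subsequence (window) length

mp = stumpy.stump(ts, m=m)            # column 0: distance to nearest neighbor
profile = mp[:, 0].astype(float)
anomaly_idx = int(np.argmax(profile)) # farthest from any match -> discord
motif_idx = int(np.argmin(profile))   # closest match -> repeated pattern
print(f"discord starts near index {anomaly_idx}, motif near {motif_idx}")
```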
belter
"Introduction to Matrix Profiles" - https://towardsdatascience.com/introduction-to-matrix-profil...
quijoteuniv
I use the offset function in Prometheus to build an average of past weeks as a recording rule. One of our systems is very "seasonal", with weekly cycles, so I average a metric over the past month ((offset 1w) + (offset 2w) + (offset 3w) + (offset 4w), divided by 4) and compare the current value of that metric against it. That way the alarms can be set day or night, weekday or weekend, and the thresholds are dynamic: the comparison is against an average for that day of the week and time of day. Someone at GitLab posted a more in-depth explanation of this way of working: https://about.gitlab.com/blog/2019/07/23/anomaly-detection-u... Things get a bit more complicated with holidays, but you can actually program them into Prometheus: https://promcon.io/2019-munich/slides/improved-alerting-with...
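For anyone prototyping the same logic outside Prometheus, here is a hedged pandas sketch of the week-offset baseline; the function names and the 30% tolerance are my own, not from the GitLab posts:

```python
import pandas as pd

def weekly_baseline(series: pd.Series) -> pd.Series:
    """Average of the values at the same timestamp 1, 2, 3, and 4 weeks ago."""
    week = pd.Timedelta(weeks=1)
    offsets = [series.shift(freq=k * week) for k in range(1, 5)]
    return sum(offsets) / 4

def is_anomalous(series: pd.Series, tolerance: float = 0.3) -> pd.Series:
    """Flag points deviating more than `tolerance` from the seasonal baseline.

    `series` is a metric with a DatetimeIndex at a fixed sample interval.
    """
    baseline = weekly_baseline(series)
    return (series - baseline).abs() > tolerance * baseline.abs()
```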
gr3ml1n
Whenever I have a chart in Grafana that isn't too dense, I almost always add a line for the 7d offset value. Super useful to tell what's normal and what isn't.
CubsFan1060
Gitlab also has this: https://gitlab.com/gitlab-com/gl-infra/tamland
I'm not really smart in these areas, but it feels like forecasting and anomaly detection are pretty related. I could be wrong though.
diab0lic
You are not wrong! An entire subclass of anomaly detection can basically be reduced to: forecast the next data point and then measure the forecast error when the data point arrives.
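A minimal sketch of that reduction, with a trailing mean standing in for the forecaster (window size and threshold are illustrative, not from the thread):

```python
import numpy as np

def residual_anomalies(ts: np.ndarray, window: int = 48, k: float = 3.0):
    """Flag points whose forecast error exceeds k sigmas of the recent window."""
    flags = np.zeros(len(ts), dtype=bool)
    for t in range(window, len(ts)):
        hist = ts[t - window:t]
        forecast = hist.mean()                    # stand-in for any forecaster
        error = abs(ts[t] - forecast)             # measured once the point arrives
        flags[t] = error > k * (hist.std() + 1e-9)
    return flags
```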
fnordpiglet
Well, it doesn't really require a forecast: variance-based anomaly detection doesn't assert what the next point will be, only that its maximum change stays within some band. Such models usually can't be used to make a forecast beyond the band bounds themselves.
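A hedged sketch of that banding idea: no prediction of the next value, just a check that each point-to-point change stays inside a k-sigma band of recent changes (the parameters are made up):

```python
import numpy as np

def band_violations(ts: np.ndarray, window: int = 100, k: float = 4.0):
    """Indices where the point-to-point change escapes the k-sigma band."""
    diffs = np.diff(ts)
    hits = []
    for t in range(window, len(diffs)):
        band = k * (diffs[t - window:t].std() + 1e-9)
        if abs(diffs[t]) > band:
            hits.append(t + 1)        # index back into the original series
    return hits
```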
fraserphysics
If you need to detect anomalies as soon as they occur, that seems right. But if you want to detect them later you can also combine back-casting with forecasting. Like Kalman smoothing.
mikehollinger
This doesn’t capture work that’s happened in the last year or so.
For example, some former colleagues' time-series foundation model (Granite TS), which was doing pretty well when we were experimenting with it. [1]
An aha moment for me was realizing that one way to think of anomaly models is that they're effectively forecasting the next N steps, then noticing when the actual measured values are "different enough" from the expected ones. This is simple to draw on a whiteboard for one signal; it's pretty neat that it still works when the data is multivariate.
[1] https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1
0cf8612b2e1e
My similar recognition was when I read about isolation forests for outlier detection [0]. When a point looks different from the average, something is off.
[0] https://scikit-learn.org/stable/modules/generated/sklearn.en...
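For reference, a small sketch using the linked scikit-learn class; the data and contamination rate are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))         # mostly ordinary points
X[::100] += 6.0                       # a few planted outliers

clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = clf.predict(X)               # -1 = outlier, 1 = inlier
print(np.flatnonzero(labels == -1))   # indices the forest isolates easily
```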
tessierashpool9
what were you thinking then before your aha moment? :D
mikehollinger
> what were you thinking then before your aha moment? :D
My naive view was that there was some sort of "normalization" or "pattern matching" happening. Like: you can look at a trend line that generally has some shape, and notice when something changes or there's a discontinuity. That's a very simplistic view, but I assumed the tooling was doing regressions and noticing when something fell outside a statistical norm, like k-means analysis. Which works, sort of, but is difficult to generalize.
tessierashpool9
> Like - you can look at a trend line that generally has some shape, and notice when something changes or there’s a discontinuity.
What you describe here is effectively building a forecast model of what is expected to happen and then noticing deviations from it.
apwheele
Care to share the contexts in which someone needs a zero-shot model for time series? I have just never come across one in which you don't have some historical data to fit a model and go from there.
delusional
In this case I don't think zero-shot means no context. I think it's used in contrast to fine-tuning the model parameters on your data.
> TTM-1 currently supports 2 modes:
> Zeroshot forecasting: Directly apply the pre-trained model on your target data to get an initial forecast (with no training).
> Finetuned forecasting: Finetune the pre-trained model with a subset of your target data to further improve the forecast
Dowwie
In the nascent world of water tech, there are IoT devices that monitor water flow. These devices can detect leaks and estimate fixture-level water consumption. Leak detection is all about identifying time-series outliers. The distribution-based anomaly detection mentioned in the paper is relevant here. Interestingly, a single residence may require multiple distributions due to pipe-temperature variation between warm and cold seasons.
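A toy illustration of that "one distribution per season" idea; the season split, z-threshold, and function names are hypothetical:

```python
import numpy as np

def fit_seasonal_models(flows_by_season):
    """flows_by_season: e.g. {'warm': np.ndarray, 'cold': np.ndarray}
    of in-normal-operation flow readings; returns per-season (mean, std)."""
    return {s: (float(np.mean(x)), float(np.std(x)))
            for s, x in flows_by_season.items()}

def is_leak_candidate(value, season, models, z=4.0):
    """Flag a flow reading that sits z sigmas from its season's mean."""
    mu, sigma = models[season]
    return abs(value - mu) > z * sigma
```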
montereynack
Gonna throw in my hat here: time-series anomaly detection for industrial machinery is the problem my startup is working on! Specifically, we're making it work offline-by-default (we integrate the AI with the equipment and don't send data to any third-party servers, even ours), because we feel a ton of customer opportunities get left in the dust because they can't be online. If you or someone you know is looking for a monitoring solution for industrial machinery, or is passionate about security-conscious industrial software (we're also developing a data historian), let's talk! www.sentineldevices.com
zaporozhets
I recently tried to homebrew some anomaly detection work for a performance tracking project and was surprised at the absence of any off-the-shelf OSS or Paid solutions in this space (that weren’t super basic or way too complex). Lots of fertile ground here!
rad_gruchalski
There's a ton of material related to anomaly detection with Prometheus and Grafana stack: https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to.... But maybe this is the "way too complex" case you mention.
CubsFan1060
I'm still playing around with this one: https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to... (there's a github repo for it).
So far, it's not terrible, but has some pretty big flaws.
jcreixell
Hi, co-author of the blog post here. I would love to learn more about the flaws you see, and any ideas on how to improve it! We definitely plan to iterate on it and make it as good as we possibly can.
CubsFan1060
To be clear, "some big flaws" was probably overstating it. I'm going to edit that. Also, thanks for the work on this. I would absolutely love to contribute, but my maths are not good enough for this :)
The biggest thing I've run into in my testing is that an anomaly of reasonably short timeframe seems to throw the upper and lower bands off for quite some time.
That being said, perhaps changing some of the variables would help with that, and I just don't have enough skill to be able to understand the exact way to adjust that.
pnathan
The number of manual tweaks the approach requires suggests that it is essentially an ad hoc experimental fit, rather than a stable theoretical model that can adapt to your time series.
nyrikki
Not really related to the above post, but one thing I'm not seeing on an initial pass is any advancement in the understanding of problems like riddled or Wada basins.
Especially with time delays and 3+ attractors, these can be problematic.
A simple example:
https://doi.org/10.21203/rs.3.rs-1088857/v1
There are tools developed over the past few decades to try to detect these features, and I know I wasted a few years on a project that superficially looked like a floating-point issue but ended up being a mix of the Wada property and/or porous sets.
The complications of describing these situations, which are even less determinate than traditional chaos, may make it inappropriate for you.
But it would be nice if visibility were increased. Funnily enough, most LLMs' corpus on this seems to be fed mostly from an LSAT question.
There has been a lot of movement here when you have n>=3 attractors/exits.
Not solutions unfortunately, but tools to help figure out when you hit it.
hackernewds
Anything in Grafana is inherently not exportable to code, though, which is rather annoying because their UI really sucks.
davkal
Hi! I work on the Grafana OSS team. We added some more export options recently (dashboards have a big Export button at the top right; panels can export their data via the panel menu / Inspect / Data), try it on our demo page: https://play.grafana.org/d/000000003/graphite3a-sample-websi...
Could you describe your use case around "exportable to any code" a bit more?
ramon156
I needed TS anomaly detection for my internship because we needed to track when a machine/server was doing poorly or had unplanned downtime. I expected Microsoft's C# library to be able to do this, but my god, it's a mess. If someone has the time and will to implement a proper library, that would be awesome.
mr_toad
What you’re probably after is called statistical process control. There are Python libraries like pyspc, but the theory is simple enough that you could write your own pretty easily.
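A minimal sketch of a Shewhart-style control chart, one common SPC tool, written from the standard theory rather than the pyspc API:

```python
import numpy as np

def control_limits(baseline, k=3.0):
    """Lower limit, center line, upper limit from in-control baseline data."""
    mu, sigma = float(np.mean(baseline)), float(np.std(baseline))
    return mu - k * sigma, mu, mu + k * sigma

def out_of_control(values, baseline, k=3.0):
    """Return (index, value) pairs that escape the +/- k-sigma limits."""
    lo, _, hi = control_limits(baseline, k)
    return [(i, v) for i, v in enumerate(values) if v < lo or v > hi]
```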
neonsunset
Anomaly detection in time-series data is not a concern of the standard library of all things. Nor is it a concern of "base abstractions" shipped as extensions (think ILogger).
Phurist
If only life was as simple as calling .isAnomaly() on anything
jeffbee
The reason there are not off-the-shelf solutions is this is an unsolved problem. There is no approach that is generally useful.
otterley
Perhaps not, but an efficient, multi-language library of different functions would allow for relatively easy implementation and experimentation.
phirschybar
Agreed. At my company we ended up rolling our own system, but this area is absolutely ripe for a configurable SaaS or OSS tool with advanced reporting and alerting mechanisms. Datadog has a decent offering, but it's pretty $$$$.
montereynack
Gonna throw in my hat and say that if you’re working on industrial applications (like energy or manufacturing) give us a holler at www.sentineldevices.com! Plug-and-play time series monitoring for industrial applications is exactly what we do.
hackernewds
There's always Prophet: forecast the next value and look at the difference.
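A hedged sketch of that recipe with the Prophet library: fit, predict in-sample, and flag points that escape the model's uncertainty interval (the 0.99 interval width and function name are my choices):

```python
import pandas as pd
from prophet import Prophet

def prophet_anomalies(df: pd.DataFrame) -> pd.DataFrame:
    """df must have columns 'ds' (timestamp) and 'y' (value), per Prophet's API."""
    model = Prophet(interval_width=0.99)
    model.fit(df)
    forecast = model.predict(df[["ds"]])
    out = df.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
    # anomalous = observed value outside the forecast's uncertainty band
    out["anomaly"] = (out["y"] < out["yhat_lower"]) | (out["y"] > out["yhat_upper"])
    return out
```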
jorl17
I have a soft spot for this area. Almost 10 years ago, my Master's touched on something somewhat adjacent to this (Online Failure Prediction): https://estudogeral.uc.pt/handle/10316/99218
We built a system to detect exceptions before they happened and act on them, hoping this would be better than letting them happen (e.g., preemptively slowing the rate of requests instead of letting the database become exhausted).
At the time, I felt that there was soooooooo much to do in the area, and I'm kinda sad I never worked on it again.
djoldman
> Unfortunately, inherent complexities in the data generation of these processes, combined with imperfections in the measurement systems as well as interactions with malicious actors, often result in abnormal phenomena. Such abnormal events appear subsequently in the collected data as anomalies.
This is critical, and difficult to deal with in many instances.
> With the term anomalies we refer to data points or groups of data points that do not conform to some notion of normality or an expected behavior based on previously observed data.
This is a key problem or perhaps the problem: rigorously or precisely defining what an anomaly is and is not.
hazrmard
Anomaly detection (AD) can arguably be a value-add to any industry. It may not be a core product, but AD can help optimize operations for almost anyone.
* Manufacturing: Computer vision to pick anomalies off the assembly line.
* Operation: Accelerometers/temperature sensors w/ frequency analysis to detect onset of faults (prognostics / diagnostics) and do predictive maintenance.
* Sales: Timeseries analyses on numbers / support calls to detect up/downticks in cashflows, customer satisfaction etc.
Imanari
Look up Eamonn Keogh, he has lots of interesting work on TSAD.
ivoflipse
His Google Tech Talk made me really appreciate his group's work, even though I have no need for time-series analysis.
itissid
Can someone explain how SVMs are being classified in this paper as "distribution-based"? This is quite confusing as a taxonomy. They generally don't estimate densities, either model-free (kernel density estimates) or model-based (separating one or more possibly overlapping normal distributions).
I get that they could be explicitly modeling a data-generating process's probability (just like a NN), like a Bernoulli (whose ML loss is cross-entropy) or a Normal (whose ML loss is mean squared error), but I don't think that is what the author meant by a distribution.
My understanding is that they make no distributional assumption on the random variable (your Y or X) they are trying to find a max margin for.
mlepath
The process-centric taxonomy in this paper is one of the most structured frameworks I’ve seen for anomaly detection methods. It breaks down approaches into distance-based, density-based, and prediction-based categories. In practice (been doing time series analysis professionally for 8+ years), I’ve found that prediction-based methods (e.g., reconstruction errors in autoencoders) are fantastic for semi-supervised use cases but fall short for streaming data.
Didn’t see it mentioned but good to know about: UCR matrix profile.
The Matrix Profile is honestly one of the most underrated tools in the time series analysis space - it's ridiculously efficient. The killer feature is how it just works for finding motifs and anomalies without having to mess around with window sizes and thresholds like you do with traditional techniques. Solid across domains too, from manufacturing sensor data to ECG analysis to earthquake detection.
https://www.cs.ucr.edu/~eamonn/MatrixProfile.html