Time-Series Anomaly Detection: A Decade Review

bluechair

Didn’t see it mentioned but good to know about: UCR matrix profile.

The Matrix Profile is honestly one of the most underrated tools in the time series analysis space - it's ridiculously efficient. The killer feature is how it just works for finding motifs and anomalies without having to mess around with window sizes and thresholds like you do with traditional techniques. Solid across domains too, from manufacturing sensor data to ECG analysis to earthquake detection.

https://www.cs.ucr.edu/~eamonn/MatrixProfile.html

jmpeax

What do you mean you don't have to mess around with window sizes? Matrix profile is highly dependent on the window size.

eamonnkeogh

The MP is so efficient that you can test ALL window lengths at once! This is called MADRID [a].

[a] Matrix Profile XXX: MADRID: A Hyper-Anytime Algorithm to Find Time Series Anomalies of all Lengths. Yue Lu, Thirumalai Vinjamoor Akhil Srinivas, Takaaki Nakamura, Makoto Imamura, and Eamonn Keogh. ICDM 2023.

eamonnkeogh

Thank you for your kind words ;-)

sriram_malhar

Thanks for sharing; I am most intrigued by the sales pitch. But the website is downright ugly.

This is a better presentation by the same folks. https://matrixprofile.org/

Croftengea

I don't think it's being updated. The latest blog posts are from 2020, and the GitHub repos haven't seen commits in the last 5-6 years. MP has come a long way since then.

eskaytwo

I don’t think it’s the same people.

sriram_malhar

Ah, you are right. I got the link from the original URL, so I just assumed. Thanks for the correction.

hoseja

Are you being serious? The first page actually has information on it. You can add margins in the devtools.

Croftengea

MP is one of the best univariate methods, but it's actually mentioned in the article.

bee_rider

What does it do? Anything to do with matrices, like, from math?

eskaytwo

It compares each subsequence against every other position in the series (a sliding, convolution-like pass) and records the distance to the closest match. This can detect both outliers (a large distance to the closest match) and repeated patterns (a small distance).
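
A minimal sketch of this in Python with the stumpy library (one of several open-source Matrix Profile implementations); the toy data, window length, and injected anomaly are made up for illustration:

    import numpy as np
    import stumpy

    # toy series: a sine wave with one injected anomaly
    t = np.linspace(0, 40 * np.pi, 4000)
    ts = np.sin(t)
    ts[2000:2050] += 3.0  # the anomalous bump

    m = 100  # subsequence (window) length
    mp = stumpy.stump(ts, m)  # column 0 holds the nearest-neighbor distances

    # the "discord" (most anomalous subsequence) starts where the
    # distance to the closest match is largest
    discord_idx = int(np.argmax(mp[:, 0].astype(float)))
    print(f"Most anomalous window starts at index {discord_idx}")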

quijoteuniv

I use the offset function in Prometheus to build an average of past weeks as a recording rule. One of our systems is very "seasonal", with weekly cycles, so I average a metric over past weeks ((offset 1 week + offset 2 weeks + offset 3 weeks + offset 4 weeks) / 4) and compare it to the current value of that metric. That way the alarms can be set day or night, weekday or weekend, and the thresholds are dynamic: the comparison is against an average for that day of the week and time of day. Someone at GitLab posted a more in-depth explanation of this way of working: https://about.gitlab.com/blog/2019/07/23/anomaly-detection-u... Things get a bit more complicated with holidays, but you can actually program them into Prometheus: https://promcon.io/2019-munich/slides/improved-alerting-with...
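
For anyone outside Prometheus, the same recipe is a few lines of pandas; a rough sketch, assuming a regularly sampled series s with a DatetimeIndex:

    import pandas as pd

    def weekly_baseline(s: pd.Series, weeks: int = 4) -> pd.Series:
        # average of the value at the same time 1..weeks weeks ago,
        # mirroring the (offset 1w + ... + offset 4w) / 4 recording rule
        return sum(s.shift(freq=f"{7 * k}D") for k in range(1, weeks + 1)) / weeks

    # deviation = s - weekly_baseline(s)
    # alert where deviation.abs() exceeds your chosen threshold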

gr3ml1n

Whenever I have a chart in Grafana that isn't too dense, I almost always add a line for the 7d offset value. Super useful to tell what's normal and what isn't.

CubsFan1060

Gitlab also has this: https://gitlab.com/gitlab-com/gl-infra/tamland

I'm not really smart in these areas, but it feels like forecasting and anomaly detection are pretty related. I could be wrong though.

diab0lic

You are not wrong! An entire subclass of anomaly detection can basically be reduced to: forecast the next data point and then measure the forecast error when the data point arrives.
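
A minimal sketch of that reduction, with an EWMA standing in as a deliberately simple forecaster (the alpha and k values here are arbitrary):

    import numpy as np

    def ewma_anomalies(x, alpha=0.3, k=3.0):
        # flag points whose one-step-ahead forecast error exceeds k sigma
        forecast = x[0]
        residuals, flags = [], []
        for value in x:
            err = value - forecast
            residuals.append(err)
            sigma = np.std(residuals) if len(residuals) > 1 else 0.0
            flags.append(sigma > 0 and abs(err) > k * sigma)
            forecast = alpha * value + (1 - alpha) * forecast  # update the forecast
        return np.array(flags)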

fnordpiglet

Well, it doesn't really require a forecast. Variance-based anomaly detection doesn't assert what the next point will be, only that its maximum change stays within some band. Such models usually can't be used to make a forecast beyond the band bounds.

fraserphysics

If you need to detect anomalies as soon as they occur, that seems right. But if you want to detect them later you can also combine back-casting with forecasting. Like Kalman smoothing.

mikehollinger

This doesn’t capture work that’s happened in the last year or so.

For example, some former colleagues' time-series foundation model (Granite TS), which was doing pretty well when we were experimenting with it. [1]

An aha moment for me was realizing that you can think of anomaly models as effectively forecasting the next N steps and then noticing when the actual measured values are "different enough" from what was expected. This is simple to draw on a whiteboard for one signal, but it's pretty neat that it works when the data is multivariate.

[1] https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1

0cf8612b2e1e

My similar recognition was when I read about isolation forests for outlier detection[0]. When predictions are different from the average, something is off.

[0] https://scikit-learn.org/stable/modules/generated/sklearn.en...
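
For reference, a minimal scikit-learn sketch of the idea on synthetic data:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))  # mostly "normal" points
    X[:5] += 6.0                   # a few obvious outliers

    clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
    labels = clf.predict(X)        # -1 = anomaly, 1 = normal
    print(np.where(labels == -1)[0])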

tessierashpool9

what were you thinking then before your aha moment? :D

mikehollinger

> what were you thinking then before your aha moment? :D

My naive view was that there was some sort of “normalization” or “pattern matching” that was happening. Like - you can look at a trend line that generally has some shape, and notice when something changes or there’s a discontinuity. That’s a very simplistic view, but I assumed that stuff was trying to do regressions and notice when something was out of a statistical norm, like k-means analysis. Which works, sort of, but is difficult to generalize.

tessierashpool9

> Like - you can look at a trend line that generally has some shape, and notice when something changes or there’s a discontinuity.

what you describe here is effectively forecasting what you expect to happen, and then noticing a deviation from it.

apwheele

Care to share the contexts in which someone needs a zero-shot model for time series? I have just never come across one in which you don't have some historical data to fit a model and go from there.

delusional

In this case I don't think zero-shot means no context. I think it's used more in contrast to fine-tuning the model parameters on your data.

> TTM-1 currently supports 2 modes:

> Zeroshot forecasting: Directly apply the pre-trained model on your target data to get an initial forecast (with no training).

> Finetuned forecasting: Finetune the pre-trained model with a subset of your target data to further improve the forecast

Dowwie

In the nascent world of water tech, there are IoT devices that monitor water flow. These devices can detect leaks and estimate fixture-level water consumption. Leak detection is all about identifying time-series outliers. The distribution-based anomaly detection mentioned in the paper is relevant for leak detection. Interestingly, a residence may require multiple distributions due to pipe temperature variations between warm and cold seasons.
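
One way to realize the "multiple distributions" idea is to fit a separate normal distribution per season and flag points far from their season's mean; a sketch, assuming monthly seasons and a series with a DatetimeIndex:

    import pandas as pd

    def seasonal_zscore_anomalies(s: pd.Series, k: float = 3.0) -> pd.Series:
        # estimate a separate mean/std per season (month, here) and flag
        # points more than k standard deviations from their season's mean
        season = s.index.month
        mu = s.groupby(season).transform("mean")
        sigma = s.groupby(season).transform("std")
        return (s - mu).abs() > k * sigma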

montereynack

Gonna throw in my hat here: time-series anomaly detection for industrial machinery is the problem my startup is working on! Specifically, we're making it work offline-by-default (we integrate the AI with the equipment and don't send data to any third-party servers, even ours) because we feel there's a ton of customer opportunities that get left in the dust because they can't be online. If you or someone you know is looking for a monitoring solution for industrial machinery, or is passionate about security-conscious industrial software (we are also developing a data historian), let's talk! www.sentineldevices.com

zaporozhets

I recently tried to homebrew some anomaly detection for a performance-tracking project and was surprised at the absence of any off-the-shelf OSS or paid solutions in this space (that weren't super basic or way too complex). Lots of fertile ground here!

rad_gruchalski

There's a ton of material related to anomaly detection with the Prometheus and Grafana stack: https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to.... But maybe this is the "way too complex" case you mention.

CubsFan1060

I'm still playing around with this one: https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to... (there's a github repo for it).

So far, it's not terrible, but has some pretty big flaws.

jcreixell

Hi, co-author of the blog post here. I would love to learn more about the flaws you see and any ideas on how to improve it! We definitely plan to iterate on it and make it as good as we possibly can.

CubsFan1060

To be clear "some big flaws" was probably overstating it. I'm going to edit that. Also, thanks for the work on this. I would absolutely love to contribute, but my maths are not good enough for this :)

The biggest thing I've run into in my testing is that an anomaly of reasonably short timeframe seems to throw the upper and lower bands off for quite some time.

That being said, perhaps changing some of the variables would help with that, and I just don't have enough skill to be able to understand the exact way to adjust that.

pnathan

The number of manual tweaks required by the approach suggests that it is essentially ad hoc experimental fitting, rather than a stable theoretical model that can adapt to your time series.

nyrikki

Not really related to the above post, but one thing I am not seeing on an initial pass is the advancement of understanding of problems like riddled or Wada basins.

Especially with time delays and 3+ attractors, this can be problematic.

A simple example:

https://doi.org/10.21203/rs.3.rs-1088857/v1

There are tools to try to detect these features that were found over the past few decades, and I know I wasted a few years on a project that superficially looked like an FP issue, but ended up being a mix of the Wada property and/or porous sets.

The complications with describing these indeterminate situations, which are worse than traditional chaos, may make it inappropriate for you.

But it would be nice if visibility were increased. Funnily enough, most LLMs' corpus on this is mostly fed from an LSAT question.

There has been a lot of movement here when you have n>=3 attractors/exits.

Not solutions unfortunately, but tools to help figure out when you hit it.

hackernewds

Anything in Grafana is inherently not exportable to any code though, which is rather annoying, cuz their UI really sucks.

davkal

Hi! I work on the Grafana OSS team. We added some more export options recently (dashboards have a big Export button at the top right; panels can export their data via the panel menu / Inspect / Data), try it on our demo page: https://play.grafana.org/d/000000003/graphite3a-sample-websi...

Could you describe your use case around "exportable to any code" a bit more?

ramon156

I needed TS anomaly detection for my internship because we needed to track when a machine/server was doing poorly or had unplanned downtime. I expected Microsoft's C# library to be able to do this, but my god, it's a mess. If someone has the time and will to implement a proper library, that would be awesome.

mr_toad

What you’re probably after is called statistical process control. There are Python libraries like pyspc, but the theory is simple enough that you could write your own pretty easily.
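
For example, the simplest SPC chart (a Shewhart individuals chart) really is only a few lines; this sketch estimates sigma from the sample standard deviation, where textbook charts use the moving range:

    import numpy as np

    def shewhart_limits(baseline, k=3.0):
        # control limits at mean +/- k*sigma, estimated from an
        # in-control baseline period
        mu, sigma = baseline.mean(), baseline.std(ddof=1)
        return mu - k * sigma, mu + k * sigma

    baseline = np.random.default_rng(1).normal(10.0, 0.5, size=200)
    lo, hi = shewhart_limits(baseline)
    new_point = 12.4
    print("out of control" if not (lo <= new_point <= hi) else "in control")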

neonsunset

Anomaly detection in time-series data is not a concern of the standard library of all things. Nor is it a concern of "base abstractions" shipped as extensions (think ILogger).

Phurist

If only life were as simple as calling .isAnomaly() on anything

jeffbee

The reason there are not off-the-shelf solutions is this is an unsolved problem. There is no approach that is generally useful.

otterley

Perhaps not, but an efficient, multi-language library of different functions would allow for relatively easy implementation and experimentation.

phirschybar

agreed. at my company we ended up rolling our own system. but this area is absolutely ripe for some configurable SaaS or OSS tool with advanced reporting and alerting mechanisms. Datadog has a decent offering, but it's pretty $$$$.

montereynack

Gonna throw in my hat and say that if you're working on industrial applications (like energy or manufacturing), give us a holler at www.sentineldevices.com! Plug-and-play time-series monitoring for industrial applications is exactly what we do.

hackernewds

there's always Prophet: forecast the next value and look at the difference.
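
Roughly, that looks like this (a sketch; "metric.csv" and the 99% interval width are placeholders):

    import pandas as pd
    from prophet import Prophet

    df = pd.read_csv("metric.csv")  # Prophet expects columns "ds" and "y"
    df["ds"] = pd.to_datetime(df["ds"])

    m = Prophet(interval_width=0.99)
    m.fit(df)
    forecast = m.predict(df[["ds"]])

    # flag observations falling outside the 99% uncertainty interval
    merged = df.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
    anomalies = merged[(merged.y < merged.yhat_lower) | (merged.y > merged.yhat_upper)]
    print(anomalies)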

jorl17

I have a soft spot for this area. Almost 10 years ago, my Masters touched on something somewhat adjacent to this (Online Failure Prediction): https://estudogeral.uc.pt/handle/10316/99218

We built a system to detect exceptions before they happened and act on them, hoping that this would be better than letting them happen (e.g. preemptively slowing down the rate of requests instead of running into database exhaustion).

At the time, I felt that there was soooooooo much to do in the area, and I'm kinda sad I never worked on it again.

djoldman

> Unfortunately, inherent complexities in the data generation of these processes, combined with imperfections in the measurement systems as well as interactions with malicious actors, often result in abnormal phenomena. Such abnormal events appear subsequently in the collected data as anomalies.

This is critical, and difficult to deal with in many instances.

> With the term anomalies we refer to data points or groups of data points that do not conform to some notion of normality or an expected behavior based on previously observed data.

This is a key problem or perhaps the problem: rigorously or precisely defining what an anomaly is and is not.

hazrmard

Anomaly detection (AD) can arguably be a value-add to any industry. It may not be a core product, but AD can help optimize operations for almost anyone.

* Manufacturing: Computer vision to pick anomalies off the assembly line.

* Operation: Accelerometers/temperature sensors w/ frequency analysis to detect onset of faults (prognostics / diagnostics) and do predictive maintenance.

* Sales: Timeseries analyses on numbers / support calls to detect up/downticks in cashflows, customer satisfaction etc.

Imanari

Look up Eamonn Keogh, he has lots of interesting work on TSAD.

ivoflipse

His Google Tech Talk made me really appreciate his group's work, even though I have no need for time series analysis.

https://youtu.be/vzPgHF7gcUQ?si=rKQvOjK_qjiSSvKE

itissid

Can someone explain to me how SVMs are being classified in this paper as "Distribution-Based"? This is quite confusing as a taxonomy. They generally don't estimate densities, whether model-free (kernel density estimates) or model-based (separating one or more possibly overlapping normal distributions).

I get that they could be explicitly modeling a data-generating process's probability itself (just like a NN), like a Bernoulli (whose ML loss function is cross-entropy) or a Normal (whose ML loss function is mean squared error), but I don't think that is what the author meant by a Distribution.

My understanding is that they don't make a distributional assumption on the random variable (your Y or X) they are trying to find a max margin for.

mlepath

The process-centric taxonomy in this paper is one of the most structured frameworks I’ve seen for anomaly detection methods. It breaks down approaches into distance-based, density-based, and prediction-based categories. In practice (been doing time series analysis professionally for 8+ years), I’ve found that prediction-based methods (e.g., reconstruction errors in autoencoders) are fantastic for semi-supervised use cases but fall short for streaming data.
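
To make the reconstruction-error idea concrete, here is a sketch that uses PCA as a linear stand-in for the autoencoder: fit on (mostly) normal data, then score each point by how badly the low-dimensional model reconstructs it:

    import numpy as np
    from sklearn.decomposition import PCA

    def reconstruction_error_scores(X, n_components=2):
        # project to a low-dimensional subspace and back; points the
        # model reconstructs poorly are anomaly candidates
        pca = PCA(n_components=n_components).fit(X)
        X_hat = pca.inverse_transform(pca.transform(X))
        return np.linalg.norm(X - X_hat, axis=1)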