Kalman Filter Tutorial

30 comments

· January 18, 2025

rsp1984

I always say this whenever the topic of Kalman Filters comes up:

If you're learning the Kalman Filter in isolation, you're kind of learning it backwards and missing out on huge "aha" moments that the surrounding theory can unlock.

To truly understand the Kalman Filter, you need to study Least Squares (aka linear regression), then recursive Least Squares, then the Information Filter (which is a different formulation of the KF). Then you'll realize the KF is just recursive Least Squares, reformulated to prioritize efficiency in the update step.

This PDF gives a concise overview:

[1] http://ais.informatik.uni-freiburg.de/teaching/ws13/mapping/...
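To give a flavor of the connection, here's a recursive Least Squares update sketched in Python (notation and function names are mine, not from the PDF). The Kalman filter puts a prediction step in front of exactly this kind of update:

```python
import numpy as np

# Recursive least squares for y = h @ x + noise, one observation at a time.
# Structurally this is already the Kalman measurement update, minus the
# motion/prediction model.
def rls_update(x, P, h, y, r=1.0):
    # x: current estimate; P: its covariance;
    # h: regressor row; y: scalar observation; r: observation noise variance
    h = h.reshape(1, -1)
    S = (h @ P @ h.T).item() + r       # innovation variance
    K = (P @ h.T) / S                  # gain: trust in this observation
    x = x + (K * (y - (h @ x).item())).ravel()
    P = P - K @ h @ P                  # uncertainty shrinks with each row
    return x, P
```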

bradly

I appreciate you taking the time to help people understand higher level concepts.

From a different perspective... I have no traditional background in mathematics or physics. I do not understand the first line of the PDF you posted, nor do I understand the process for obtaining the context to understand it.

But I have intellectual curiosity. So the best path forward for my understanding is one that can maintain that curiosity while making progress. I can reread Six Not-So-Easy Pieces without understanding any of it and still find value in it. I can play with Arnold's cat and, slowly, through no scientific rigor other than the curiosity of the naked ape, experience these concepts that have traditionally been behind gates of context I do not possess keys to.

http://gerdbreitenbach.de/arnold_cat/cat.html

mtizim

With no mathematical rigor there is no mathematical understanding. You are robbing yourself, as the concepts are meaningless without the context.

Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true. All of this knowledge is free, and much more interesting than a formal restatement of "it was not supposed to rain, but I see clouds outside, I guess I'll expect light rain instead of a big thunderstorm".

bradly

> With no mathematical rigor there is no mathematical understanding. You are robbing yourself, as the concepts are meaningless without the context.

I will think more about this, but I'm not sure I agree. I have enjoyed reading Feynman talk about twins, one of whom goes on a near-light-speed vacation, without understanding the math. Verisimilitude allows modeling understanding with only a scalar representation of scientific knowledge, so why not?

Of course I would like to understand the math in its purest form, just the same as I wanted to read 1Q84 in Japanese to fully experience it in its purest form, but my life isn't structured in a way where that is realistic, even if knowledge of the Japanese language is free.

> Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true.

I can't even FOIL, so the journey toward understanding can feel unattainable with the time resources I have. This absolutely may be a limiting belief, but the notion that knowledge is free ignores the time cost for those exploring it outside of an academic or professional setting.

keithalewis

[flagged]

bradly

> Just stop whining about it in public.

I'm curious if this is how my reply came across?

jbullock35

I found this article invaluable for understanding the Kalman filter from a Bayesian perspective:

Meinhold, Richard J., and Nozer D. Singpurwalla. 1983. "Understanding the Kalman Filter." The American Statistician 37 (May): 123–27.

jampekka

I think the easiest way depends on your background knowledge. If you understand linearity of the Gaussian distribution and the Bayesian posterior of Gaussians, the Kalman filter is almost trivial.

For the 1D case we get the prior from the linear prediction X'1 = a*X0 + b, for which mean(X'1) = a*mean(X0) + b and var(X'1) = a^2*var(X0), where a and b give the assumed dynamics.

The posterior for Gaussians is the precision-weighted mean of the prior and the observation: X1 = (1 - K)*X'1 + K*Y, where the weighting K = (1/var(Y))/(1/var(X'1) + 1/var(Y)), with Y being the Gaussian observation. The posterior variance is var(X1) = 1/(1/var(X'1) + 1/var(Y)), which is what lets the recursion continue.

Iterating this gives the Kalman filter. Generalizing this to multiple dimensions is straightforward given the linearity of multidimensional Gaussians.

This is what makes it really simple to me (after having understood it), but things like the linearity of (multidimensional) Gaussians and the Gaussian posterior probably aren't simple in themselves.
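Transcribed directly into Python, one step of that recursion is tiny (a sketch; I've added an optional process-noise term q that the noiseless prediction above omits):

```python
# One step of the 1D Kalman filter, exactly the recursion above:
# predict with x' = a*x + b, then precision-weight x' against the observation y.
def kalman_1d_step(mean, var, y, a=1.0, b=0.0, r=1.0, q=0.0):
    # Predict: push the current belief through the linear dynamics.
    mean_p = a*mean + b
    var_p = a*a*var + q              # q = 0 recovers the noiseless case above
    # Update: the gain weights the observation by its precision 1/r.
    K = (1/r) / (1/var_p + 1/r)
    mean = (1 - K)*mean_p + K*y
    var = 1 / (1/var_p + 1/r)        # posterior precision = sum of precisions
    return mean, var
```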

jtrueb

You can keep telling this, but this “esoteric” math is often too much for the people actually implementing the filters.

jampekka

FWIW, I think I understand Kalman filters quite well, but the linked PDF is hard for me to follow, and I'd really struggle to understand it if I didn't already know what it's saying.

I think the lesson there is that the Kalman filter is simpler in the "information form" where the Gaussian distribution is parameterized using the inverse of the covariance matrix.

If you don't already know what that means, you likely won't get much out of it. I think the more intuitive way is to first understand the 1D case, where the filter result is a weighted average of the prediction and the observation, and the weights are the multiplicative inverses of the respective variances (the less uncertainty/"imprecision", the more weight you give).

In the multidimensional case the inverse is the matrix inverse, but the logic is the same.
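In code the multidimensional fusion might look like this (a sketch of the information-form update, assuming the observation lives in the same space as the state):

```python
import numpy as np

# Precision-weighted fusion of a prediction (x_p, P_p) with an
# observation (y, R): the matrix analogue of the 1D weighted average.
def fuse(x_p, P_p, y, R):
    P_p_inv = np.linalg.inv(P_p)         # precision of the prediction
    R_inv = np.linalg.inv(R)             # precision of the observation
    P = np.linalg.inv(P_p_inv + R_inv)   # posterior covariance
    x = P @ (P_p_inv @ x_p + R_inv @ y)  # precision-weighted mean
    return x, P
```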

More generally, the idea is to statistically predict the next step from the previous one and then balance the prediction against the noisy observation, based on the confidence you have in each. This intuition covers all Bayesian filters. The Kalman filter is the special case of the Bayesian filter where the prediction is linear and all uncertainties are Gaussian, although it was understood this way only well after Kalman invented the eponymous filter.

Not sure how intuitive that is either, but don't be too worried if these things aren't obvious; they aren't until you know all the previous steps. To implement or use a Kalman filter you don't really need this statistical understanding.

If you prefer to understand things more "procedurally", check out the particle filter. It's conceptually the Bayesian filter, but it doesn't require the mathematical analysis. That's how I really understood the underlying logic.
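To make "procedural" concrete, a bootstrap particle filter step really is this short (a toy sketch for a 1D random walk; the dynamics and noise values are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_step(particles, y, q=0.1, r=1.0):
    # Predict: push every particle through the (here: random-walk) dynamics.
    particles = particles + rng.normal(0.0, np.sqrt(q), size=particles.shape)
    # Weight: how well does each particle explain the observation y?
    w = np.exp(-(y - particles)**2 / (2*r))
    w /= w.sum()
    # Resample: keep particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```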

defrost

It's bread-and-butter math for physics, engineering (trad. engineering), geophysics, signal processing, etc.

Why would anyone have people implementing Kalman filters who find the math behind them "esoteric"?

Back in the day, in my wet-behind-the-ears phase, the first time I implemented a Kalman Filter from scratch the application was magnetic heading normalisation on mag data from an airborne geophysical survey: 3-axis nanotesla sensor inputs on each wing and the tail boom, requiring a per-survey calibration pattern to normalise the readings over a fixed location regardless of heading.

This was buried in a suite requiring calculation of the geomagnetic reference field (a big parameterised spherical harmonic equation), upward and downward continuation and reduction-to-pole of the magnetic field, raw GPS post-processing corrections, etc.

("etc" here goes on for a shelf full of books dense with applied mathematics.)

IgorPartola

I understood it as re-estimation with a dynamic weight factor based on the perceived error. I know it's more complex than that, but this simplified version was what I needed at one point, and it worked.

dr_kiszonka

You are probably right, but many folks following your advice will give up halfway through and never get to the KF.

raincom

That's how one should learn any subject, be it physics, chemistry, math, etc. However, textbooks don't follow that approach.

ryan-duve

I strongly recommend Elements of Physics by Millikan and Gale for anyone who wants to learn pre-quantum physics this way.

jvanderbot

Are you me? I feel like I say this every time too! Perfectly captured.

jtrueb

Every time that one comes up, this one comes up: https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt... (and vice versa)

david_draco

As far as I am aware, there is no symbolic computing tool yet for probability distributions. For example, multiplying two multivariate Gaussian PDFs together and getting the covariance matrix out, or defining all the ingredients of a Kalman filter (prediction model and observation process) and getting the necessary formulas out (as with sympy's lambdify).
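You can at least grind through the scalar case with sympy by working on log-densities directly. A sketch of my own (not an existing distribution-algebra package) that recovers the product-of-Gaussians precision and mean:

```python
import sympy as sp

x, m1, m2 = sp.symbols('x m1 m2', real=True)
v1, v2 = sp.symbols('v1 v2', positive=True)

# Log of the product of two 1D Gaussian PDFs (normalisation constants dropped).
logp = sp.expand(-(x - m1)**2/(2*v1) - (x - m2)**2/(2*v2))

prec = -2*logp.coeff(x, 2)     # combined precision: 1/v1 + 1/v2
mean = logp.coeff(x, 1)/prec   # precision-weighted mean

print(sp.simplify(prec - (1/v1 + 1/v2)))                  # 0
print(sp.simplify(mean - (m1/v1 + m2/v2)/(1/v1 + 1/v2)))  # 0
```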

pmarreck

Something occurred to me a while back: can we somehow treat events that have only eyewitness testimony with a Kalman filter, in order to strengthen the evidential value of the observations after encoding them into vectors of some sort?

This would treat both lying and inaccuracy as "error".

I'm thinking of things like: reports of Phoenix lights or UFOs in general, ghosts, NDEs, and more prosaically, claims of rape

plasticchris

Only if you can make a linear model of those things…

bradly

Why does the model need to be linear?

pinkmuffinere

“Kalman filter” usually refers to the “linear quadratic estimator”, which assumes a linear model in its derivation. This affects the “predict” step at the very least, and I believe also the way the uncertainty propagates. There are nonlinear estimators as well, though they usually have less-nice guarantees (e.g., the particle filter and the extended Kalman filter).

Edit: in fact, I see part three of the book in tfa is devoted to nonlinear Kalman filters. I suspect some of the crowd (myself included) just assumed we were talking about linear Kalman filters
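Concretely, the linearity shows up in the predict step, where the state and its uncertainty are pushed through the same matrix (a generic sketch, with F and Q being whatever your model supplies):

```python
import numpy as np

def predict(x, P, F, Q):
    # Linear dynamics: the state moves by F, and the covariance propagates
    # exactly as F P F^T plus process noise Q. A nonlinear model has no
    # single F, which is what breaks the plain (linear) Kalman filter.
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P
```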

dpflan

Anyone else watch the Michel van Biezen (with the bow tie) lectures on Kalman Filters while learning this topic?

- https://www.youtube.com/watch?v=CaCcOwJPytQ&list=PLX2gX-ftPV...

dang

Related. Others?

Kalman filter from the ground up - https://news.ycombinator.com/item?id=37879715 - Oct 2023 (150 comments)

(also what's the best year to put in the title above?)

blharr

The first example, tracking: is this the same thing as dead reckoning? I've always been confused by the term "tracking", since it is used a lot in common speech but seems to mean some specific type of 'tracking' here.

hansvm

Kind of.

"Tracking", here, means providing some kind of `f(time) -> space` API.

Dead reckoning is a mechanism for incorporating velocity and whatnot into a previously estimated position to estimate a new position (and is also one possible way to implement tracking, usually with compounding errors).

The Kalman filter example is better than just dead reckoning. For a simple example, imagine you're standing still but don't know exactly where. You have an API (like GPS) that can estimate your current position within some tolerance. If you're able to query that API repeatedly and the errors aren't correlated, you can pinpoint your location much more precisely.
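A toy version of that stationary case (the numbers are invented, and I'm assuming uncorrelated Gaussian errors on each fix):

```python
import random

true_pos, meas_var = 12.0, 4.0   # made-up ground truth and GPS noise variance
est, est_var = 0.0, 1e6          # start out nearly ignorant

for _ in range(50):
    z = random.gauss(true_pos, meas_var**0.5)  # one noisy GPS fix
    k = est_var / (est_var + meas_var)         # gain: trust in the new fix
    est += k * (z - est)
    est_var *= 1 - k                           # uncertainty shrinks every fix

print(est, est_var)  # est -> ~12.0, est_var -> ~meas_var/50
```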

Back to tracking with non-zero velocity: every new position estimate (e.g., from GPS) can be combined with all the information you've seen so far, adjusting your estimates of velocity, acceleration, and position, giving you not only a much more accurate current estimate but also better data for dead-reckoning estimates while you wait for the next external signal.

The technique (Kalman Filter) is pretty general. It's just merging all your noisy sources of information according to some ruleset (real-world physics being a common ruleset). You can tack on all sorts of other interesting information, like nearby wifi signals or whatever, and even very noisy signals can aggregate to give precise results.

Another application I threw it at once was estimating my true weight, glycogen reserves, ..., from a variety of noisy measurements. The sky's the limit. You just need multiple measurements and a rule for how they interact.

defrost

Dead reckoning is a form of prediction: based on past evidence that indicates where you were then, you reckon (best-guess) a current position and determine a direction to move forward to reach some target.

"Past evidence that indicates" is deliberate phrasing, in the majority of these examples we are looking at acquired data with noise; errors, instrument noise, missing returns, etc.

"Tracking" is multi-stage, there's a desired target to be found (or to be declared absent) in noisy data .. that's pattern search and locking, the trajectory (the track) of that target must be best guessed, and the best guess forward prediction can be used to assist the search for the target in a new position.

This is not all that can be done with a Kalman filter, but it's typical of a class of common applications.

einpoklum

The one sentence you really need to know:

"The filter is named after Rudolf E. Kálmán (May 19, 1930 – July 2, 2016). In 1960, Kálmán published his famous paper describing a recursive solution to the discrete-data linear filtering problem."

magic_hamster

Not sure why, but I get this vague notion that the author might have written a book.