Understanding Kalman Filters

Posted on 11/12/2025

In this post I will walk through the probabilistic derivation of the Kalman filter, which serves as a nice introduction to a paper I intend to write about in the near future: *A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning* by M. Fraccaro et al. 1

In short, Kalman filtering deals with the task of finding the filtered posterior of a Linear Gaussian State Space Model, that is, $p(\mathbf{z}_t\mid \mathbf{y}_{1:t}, \mathbf{u}_{1:t})$. There is also the Kalman smoother (or Rauch–Tung–Striebel smoother), which gives the smoothed posterior $p(\mathbf{z}_t\mid \mathbf{y}_{1:T}, \mathbf{u}_{1:T})$. The difference is that the smoother uses the entire sequence, while the filter only uses the data available up to the current time.

Many of the derivations in this post have been extracted from Probabilistic Machine Learning: Advanced Topics by Kevin Patrick Murphy 2, which is a reference I cannot recommend enough to anyone interested in the topic.

Linear Gaussian State Space Model

State-space models (SSMs) use state variables to describe a system through a set of first-order differential (or difference) equations.

$$\begin{aligned} \mathbf{z}_t &= \mathbf{f}(\mathbf{z}_{t-1}, \mathbf{u}_t, \mathbf{q}_t) \\ \mathbf{y}_t &= \mathbf{h}(\mathbf{z}_t, \mathbf{u}_t, \mathbf{y}_{1:t-1}, \mathbf{r}_t) \end{aligned}$$

The SSM consists of:

- the hidden state $\mathbf{z}_t$,
- an (optional) input or control $\mathbf{u}_t$,
- the observation $\mathbf{y}_t$,
- a transition function $\mathbf{f}$ with process noise $\mathbf{q}_t$,
- an observation function $\mathbf{h}$ with observation noise $\mathbf{r}_t$.

For instance, for a rocket the hidden state could be the position and velocity, the input would be the amount of thrust delivered by the engines, and the observation could consist of altitude measurements. The process noise could come from unexpected winds or deviations from the expected atmospheric density; the observation noise could come from random sensor biases, drifts, etc.

The SSM is normally written as a probabilistic model given by the dynamics model or transition model $p(\mathbf{z}_t \mid \mathbf{z}_{t-1}, \mathbf{u}_t)$ and the observation model or measurement model $p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t, \mathbf{y}_{1:t-1})$.

$$\begin{aligned} p(\mathbf{z}_t \mid \mathbf{z}_{t-1}, \mathbf{u}_t) &= p\!\left(\mathbf{z}_t \mid \mathbf{f}(\mathbf{z}_{t-1}, \mathbf{u}_t)\right) \\ p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t, \mathbf{y}_{1:t-1}) &= p\!\left(\mathbf{y}_t \mid \mathbf{h}(\mathbf{z}_t, \mathbf{u}_t, \mathbf{y}_{1:t-1})\right) \end{aligned}$$

The figure below shows the graphical model of the state-space model.

Graphical model of the state-space model. Source: Murphy, K. P.

If $\mathbf{f}$ and $\mathbf{h}$ are linear functions, we have a linear-Gaussian SSM (LGSSM), which is normally written as

$$\begin{aligned} \mathbf{z}_t &= \mathbf{A}_t \mathbf{z}_{t-1} + \mathbf{B}_t \mathbf{u}_t + \mathbf{w}_t \\ \mathbf{y}_t &= \mathbf{C}_t \mathbf{z}_t + \mathbf{D}_t \mathbf{u}_t + \boldsymbol{\delta}_t \end{aligned}$$

where $\mathbf{A}_t$ is the state transition matrix, $\mathbf{B}_t$ the control-input matrix, $\mathbf{C}_t$ the observation matrix, and $\mathbf{D}_t$ the feed-through matrix, which maps the input $\mathbf{u}_t$ directly to the observation and is often $\mathbf{0}$, since inputs normally affect the observations only through the state.

Also, we normally assume that the process and observation noises are multivariate Gaussian random variables with mean $\mathbf{0}$ and some covariance.

$$\begin{aligned} \mathbf{w}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_t) \\ \boldsymbol{\delta}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{aligned}$$

There are two strong reasons for modelling these noises as Gaussian: first, by the central limit theorem, noise arising as the sum of many small independent perturbations is approximately Gaussian; second, Gaussians are closed under linear transformations and conditioning, which is what makes the filtering equations tractable in closed form.

Since any linear transformation of a multivariate normal random variable is also multivariate normally distributed, we can rewrite the transition and observation models as

$$\begin{aligned} p(\mathbf{z}_t \mid \mathbf{z}_{t-1}, \mathbf{u}_t) &= \mathcal{N}\!\left(\mathbf{z}_t \mid \mathbf{A}_t \mathbf{z}_{t-1} + \mathbf{B}_t \mathbf{u}_t,\; \mathbf{Q}_t\right) \\ p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t) &= \mathcal{N}\!\left(\mathbf{y}_t \mid \mathbf{C}_t \mathbf{z}_t + \mathbf{D}_t \mathbf{u}_t,\; \mathbf{R}_t\right) \end{aligned}$$

Note that we assume the standard Markov observation model, in which $\mathbf{y}_t \perp \mathbf{y}_{1:t-1} \mid (\mathbf{z}_t, \mathbf{u}_t)$; that is, the current observation is independent of the previous observations given the current latent state and control.
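As a concrete illustration, the generative model above can be simulated directly. The matrices below (a constant-velocity model driven by a thrust-like input, observing only position) are my own illustrative choices, not values from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2D state [position, velocity], scalar control (acceleration).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # state transition matrix A_t
B = np.array([[0.5 * dt**2], [dt]])     # control-input matrix B_t
C = np.array([[1.0, 0.0]])              # observation matrix C_t (D_t assumed 0)
Q = 1e-4 * np.eye(2)                    # process noise covariance Q_t
R = np.array([[0.01]])                  # observation noise covariance R_t

z = np.zeros(2)
trajectory, observations = [], []
for t in range(100):
    u = np.array([1.0])                                  # constant thrust
    w = rng.multivariate_normal(np.zeros(2), Q)          # process noise w_t
    z = A @ z + B @ u + w                                # z_t = A z_{t-1} + B u_t + w_t
    y = C @ z + rng.multivariate_normal(np.zeros(1), R)  # y_t = C z_t + delta_t
    trajectory.append(z)
    observations.append(y)
```

After 10 simulated seconds of unit acceleration, the state ends up near position 50 and velocity 10, with small noise-driven deviations.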

Likelihood, Posterior and Prior Beliefs

The goal of the Kalman filter is then to compute the posterior distribution of the hidden state given all measurements and controls so far:

$$p(\mathbf{z}_t \mid \mathbf{y}_{1:t}, \mathbf{u}_{1:t})$$

by direct application of Bayes' rule (posterior = likelihood × prior / evidence)

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(A, B)}{P(B)}$$

to the posterior distribution we obtain

$$p(\mathbf{z}_t \mid \mathbf{y}_{1:t}, \mathbf{u}_{1:t}) = \frac{p(\mathbf{z}_t, \mathbf{y}_{1:t}\mid \mathbf{u}_{1:t})}{p(\mathbf{y}_{1:t} \mid \mathbf{u}_{1:t})}$$

Note that $\mathbf{u}_{1:t}$ are known, so they are always conditioned on and never treated as random variables. We can then use the chain rule in the numerator, together with the property that in the SSM the current observation depends only on the current state and current input:

$$\begin{aligned} p(\mathbf{z}_t, \mathbf{y}_{1:t}\mid \mathbf{u}_{1:t}) &= p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) \; p(\mathbf{z}_t, \mathbf{y}_{1:t-1} \mid \mathbf{u}_{1:t}) \\ &= p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t)\, p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\, p(\mathbf{y}_{1:t-1} \mid \mathbf{u}_{1:t}) \end{aligned}$$

The denominator can be expressed as the marginalization of the joint distribution; we substitute the expression above and take outside the integral the terms that do not depend on $\mathbf{z}_t$:

$$\begin{aligned} p(\mathbf{y}_{1:t} \mid \mathbf{u}_{1:t}) &= \int p(\mathbf{z}_t, \mathbf{y}_{1:t} \mid \mathbf{u}_{1:t}) \, d\mathbf{z}_t = \int p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t)\, p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\, p(\mathbf{y}_{1:t-1} \mid \mathbf{u}_{1:t}) \, d\mathbf{z}_t \\ &= p(\mathbf{y}_{1:t-1} \mid \mathbf{u}_{1:t}) \int p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t)\, p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) \, d\mathbf{z}_t \end{aligned}$$

Dividing numerator by denominator, the factor $p(\mathbf{y}_{1:t-1} \mid \mathbf{u}_{1:t})$ cancels and we can write the posterior as

$$p(\mathbf{z}_t \mid \mathbf{y}_{1:t}, \mathbf{u}_{1:t}) = \frac{ p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t)\; p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) }{ \displaystyle \int p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t)\; p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) \, d\mathbf{z}_t } = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

In the Bayesian paradigm, the concepts of posterior, likelihood and prior are extremely important.

First, the likelihood relates the data to a set of parameters. In the Kalman filter it relates the observation $\mathbf{y}_t$ to the current state $\mathbf{z}_t$ and control $\mathbf{u}_t$, and it is directly given by the observation model defined above.

$$p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t) = \mathcal{N}\!\left(\mathbf{y}_t \mid \mathbf{C}_t \mathbf{z}_t + \mathbf{D}_t \mathbf{u}_t,\; \mathbf{R}_t\right) \quad \text{(likelihood)}$$

It reads as the probability of observing a measurement $\mathbf{y}_t$ for a given state $\mathbf{z}_t$ and control $\mathbf{u}_t$.

The prior is our belief about the state before seeing the current measurement.

$$p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})$$

Since both the likelihood and the prior are multivariate Gaussians, the posterior will be Gaussian as well. The evidence is just a constant that scales the resulting distribution, so in practice it is dropped: we only need the mean and covariance of the posterior to run the Kalman filter.

The Kalman Filter

The Kalman filter is the algorithm that performs exact Bayesian filtering for Linear Gaussian State Space Models. It finds $p(\mathbf{z}_t \mid \mathbf{y}_{1:t}, \mathbf{u}_{1:t}) = \mathcal{N}(\mathbf{z}_t \mid \boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t})$, where $\boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t}$ are the posterior mean and covariance given the observations $\mathbf{y}_{1:t}$ and controls $\mathbf{u}_{1:t}$. Note that a closed-form solution exists because all the distributions involved are Gaussian.

The algorithm consists of two steps: a predictor step, which obtains the one-step-ahead prediction using the transition model, and a corrector (update) step, which incorporates the new measurement.

Predictor step

The predictor step involves the computation of the prior. First of all, note that if $\mathbf{z}_{t-1}$ were known, the distribution of $\mathbf{z}_t$ would simply be

$$p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) = p(\mathbf{z}_t \mid \mathbf{z}_{t-1}, \mathbf{u}_{t})$$

However, $\mathbf{z}_{t-1}$ is a hidden state, so we only know its distribution

$$\mathbf{z}_{t-1} \sim \mathcal{N}(\boldsymbol{\mu}_{t-1|t-1}, \boldsymbol{\Sigma}_{t-1|t-1})$$

Using the law of total probability and the fact that $\mathbf{z}_t$ is independent of $\mathbf{y}_{1:t-1}$ given $\mathbf{z}_{t-1}$, we integrate over the uncertainty in $\mathbf{z}_{t-1}$

$$\begin{aligned} p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) &= \int p(\mathbf{z}_t \mid \mathbf{z}_{t-1}, \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\, p(\mathbf{z}_{t-1} \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\, d\mathbf{z}_{t-1} \\ &= \int p(\mathbf{z}_t \mid \mathbf{z}_{t-1}, \mathbf{u}_{t})\, p(\mathbf{z}_{t-1} \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\, d\mathbf{z}_{t-1} \end{aligned}$$

The second term in the integral is Gaussian, and the first term is the transition model, which is also Gaussian

$$p(\mathbf{z}_t \mid \mathbf{z}_{t-1}, \mathbf{u}_{t}) = \mathcal{N}\!\left(\mathbf{z}_t \mid \mathbf{A}_t \mathbf{z}_{t-1} + \mathbf{B}_t\mathbf{u}_t,\; \mathbf{Q}_t\right)$$

We can then express the product as a joint Gaussian over the stacked vector

$$\begin{pmatrix} \mathbf{z}_{t-1} \\ \mathbf{z}_t \end{pmatrix} \sim \mathcal{N}(\boldsymbol{\mu}', \boldsymbol{\Sigma}')$$

Then the mean of the joint is

$$\boldsymbol{\mu}' = \begin{pmatrix} \mathbb{E}[\mathbf{z}_{t-1}] \\ \mathbb{E}[\mathbf{z}_{t}] \end{pmatrix} = \begin{pmatrix} \boldsymbol{\mu}_{t-1|t-1} \\ \mathbf{A}_t\,\boldsymbol{\mu}_{t-1|t-1} + \mathbf{B}_t\,\mathbf{u}_t \end{pmatrix}$$

and then compute the covariances

$$\begin{aligned} \mathrm{Cov}(\mathbf{z}_{t-1},\mathbf{z}_{t-1}) &= \boldsymbol{\Sigma}_{t-1|t-1} \\ \mathrm{Cov}(\mathbf{z}_{t-1},\mathbf{z}_{t}) &= \mathrm{Cov}\!\left(\mathbf{z}_{t-1},\,\mathbf{A}_t\mathbf{z}_{t-1} + \mathbf{w}_t\right) = \boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}_t^T \\ \mathrm{Cov}(\mathbf{z}_{t},\mathbf{z}_{t-1}) &= \mathbf{A}_t\boldsymbol{\Sigma}_{t-1|t-1} \\ \mathrm{Cov}(\mathbf{z}_{t},\mathbf{z}_{t}) &= \mathrm{Cov}\!\left(\mathbf{A}_t\mathbf{z}_{t-1} + \mathbf{w}_t\right) = \mathbf{A}_t\boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}_t^T + \mathbf{Q}_t \end{aligned}$$

to assemble the matrix

$$\boldsymbol{\Sigma}' = \begin{pmatrix} \boldsymbol{\Sigma}_{t-1|t-1} & \boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}_t^T \\ \mathbf{A}_t\boldsymbol{\Sigma}_{t-1|t-1} & \mathbf{A}_t\boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}_t^T + \mathbf{Q}_t \end{pmatrix}$$
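These covariance blocks can be sanity-checked numerically by sampling the generative model and comparing the empirical joint covariance against the analytic blocks; the particular values of the matrices below are arbitrary illustrative numbers, not from the post:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model parameters (my own choices).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
Sigma_prev = np.array([[0.3, 0.1], [0.1, 0.2]])
Q = 0.05 * np.eye(2)

# Sample z_{t-1} from its filtered distribution, then push through the dynamics.
n = 200_000
z_prev = rng.multivariate_normal(np.zeros(2), Sigma_prev, size=n)
w = rng.multivariate_normal(np.zeros(2), Q, size=n)
z_next = z_prev @ A.T + w

# Empirical 4x4 joint covariance of the stacked vector (z_{t-1}, z_t).
emp = np.cov(np.hstack([z_prev, z_next]).T)
cross = Sigma_prev @ A.T                  # analytic Cov(z_{t-1}, z_t)
print(np.max(np.abs(emp[:2, 2:] - cross)))  # maximum deviation (Monte Carlo error)
```

The off-diagonal block of the empirical covariance matches $\boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}_t^T$ up to sampling noise, and the bottom-right block matches $\mathbf{A}_t\boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}_t^T + \mathbf{Q}_t$.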

Finally, note that to marginalize a joint Gaussian one simply drops the irrelevant variables (the ones being marginalized out) from the mean vector and covariance matrix. In our case, marginalizing out $\mathbf{z}_{t-1}$, we are left with

$$\begin{aligned} p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) &= \mathcal{N}(\mathbf{z}_t \mid \boldsymbol{\mu}_{t|t-1}, \boldsymbol{\Sigma}_{t|t-1}) \\ \boldsymbol{\mu}_{t|t-1} &= \mathbf{A}_t\,\boldsymbol{\mu}_{t-1|t-1} + \mathbf{B}_t\,\mathbf{u}_t \\ \boldsymbol{\Sigma}_{t|t-1} &= \mathbf{A}_t\boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}_t^T + \mathbf{Q}_t \end{aligned}$$

Note that if no new measurements are used in the update step then the covariance grows with time.
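In code, the predictor step is a direct transcription of these two equations (a minimal sketch; the function and variable names are my own):

```python
import numpy as np

def kalman_predict(mu, Sigma, A, B, u, Q):
    """One-step prediction: returns (mu_{t|t-1}, Sigma_{t|t-1})."""
    mu_pred = A @ mu + B @ u              # mu_{t|t-1} = A mu_{t-1|t-1} + B u_t
    Sigma_pred = A @ Sigma @ A.T + Q      # Sigma_{t|t-1} = A Sigma A^T + Q
    return mu_pred, Sigma_pred
```

Calling this repeatedly without ever updating illustrates the point above: the predicted covariance keeps growing.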

Corrector (Update) step

The corrector step computes the posterior by combining the likelihood (measurement model) with the prior from the predictor step.

$$\begin{aligned} p(\mathbf{z}_t \mid \mathbf{y}_{1:t}, \mathbf{u}_{1:t}) &\propto p(\mathbf{y}_t \mid \mathbf{z}_t, \mathbf{u}_t)\; p(\mathbf{z}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) \\ &= \mathcal{N}\!\left(\mathbf{y}_t \mid \mathbf{C}_t \mathbf{z}_t + \mathbf{D}_t \mathbf{u}_t,\; \mathbf{R}_t\right)\mathcal{N}(\mathbf{z}_t \mid \boldsymbol{\mu}_{t|t-1}, \boldsymbol{\Sigma}_{t|t-1}) \end{aligned}$$

To solve this we will build the joint distribution $p(\mathbf{z}_t, \mathbf{y}_t \mid \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})$ and then take the conditional distribution inside the joint Gaussian, $p(\mathbf{z}_t \mid \mathbf{y}_t, \mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})$. First, we define the joint Gaussian

$$\begin{pmatrix} \mathbf{z}_t \\ \mathbf{y}_t \end{pmatrix} \sim \mathcal{N}(\boldsymbol{\mu}'', \boldsymbol{\Sigma}'')$$

and compute the joint mean

$$\boldsymbol{\mu}'' = \begin{pmatrix} \mathbb{E}[\mathbf{z}_{t}] \\ \mathbb{E}[\mathbf{y}_{t}] \end{pmatrix} = \begin{pmatrix} \boldsymbol{\mu}_{t|t-1} \\ \mathbf{C}_t\,\boldsymbol{\mu}_{t|t-1} + \mathbf{D}_t\,\mathbf{u}_t \end{pmatrix}$$

and the covariance blocks

$$\begin{aligned} \mathrm{Cov}(\mathbf{z}_{t},\mathbf{z}_{t}) &= \boldsymbol{\Sigma}_{t|t-1} \\ \mathrm{Cov}(\mathbf{y}_{t},\mathbf{y}_{t}) &= \mathrm{Cov}\!\left(\mathbf{C}_t \mathbf{z}_t+\boldsymbol{\delta}_t\right) = \mathbf{C}_t \boldsymbol{\Sigma}_{t|t-1}\mathbf{C}_t^T + \mathbf{R}_t \\ \mathrm{Cov}(\mathbf{z}_{t},\mathbf{y}_{t}) &= \mathrm{Cov}\!\left(\mathbf{z}_{t},\,\mathbf{C}_t \mathbf{z}_t+\boldsymbol{\delta}_t\right) = \boldsymbol{\Sigma}_{t|t-1} \mathbf{C}_t^T \\ \mathrm{Cov}(\mathbf{y}_{t}, \mathbf{z}_{t}) &= \mathrm{Cov}\!\left(\mathbf{C}_t \mathbf{z}_t+\boldsymbol{\delta}_t,\, \mathbf{z}_{t}\right) = \mathbf{C}_t \boldsymbol{\Sigma}_{t|t-1} \end{aligned}$$

so that the covariance matrix is

$$\boldsymbol{\Sigma}'' = \begin{pmatrix} \boldsymbol{\Sigma}_{t|t-1} & \boldsymbol{\Sigma}_{t|t-1}\,\mathbf{C}_t^{T} \\ \mathbf{C}_t\,\boldsymbol{\Sigma}_{t|t-1} & \mathbf{S}_t \end{pmatrix}$$

where $\mathbf{S}_t = \mathbf{C}_t\,\boldsymbol{\Sigma}_{t|t-1}\,\mathbf{C}_t^{T}+\mathbf{R}_t$ is called the innovation covariance. The conditional distribution of a general joint Gaussian is 3

$$p(\mathbf{x}_1 \mid \mathbf{x}_2) = \mathcal{N}\!\left(\mathbf{x}_1 \mid \boldsymbol{\mu}_{1|2},\, \boldsymbol{\Sigma}_{1|2}\right) = \mathcal{N}\!\left(\mathbf{x}_1 \,\middle|\, \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\; \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\right)$$
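This conditioning formula is easy to express as a small helper, whose block arguments follow the partition above (a sketch with hypothetical names):

```python
import numpy as np

def gaussian_condition(mu1, mu2, S11, S12, S21, S22, x2):
    """Condition a partitioned joint Gaussian on an observed x2."""
    # mu_{1|2} = mu1 + S12 S22^{-1} (x2 - mu2)
    mu_c = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
    # Sigma_{1|2} = S11 - S12 S22^{-1} S21
    Sigma_c = S11 - S12 @ np.linalg.solve(S22, S21)
    return mu_c, Sigma_c
```

For a standard bivariate Gaussian with correlation 0.5, conditioning on $x_2 = 1$ gives mean $0.5$ and variance $0.75$, as the formula predicts.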

We start by substituting terms in the mean to get the mean of the posterior

$$\boldsymbol{\mu}_{t|t} = \boldsymbol{\mu}_{t|t-1} + \boldsymbol{\Sigma}_{t|t-1} \mathbf{C}_t^{T} \mathbf{S}_t^{-1} \big( \mathbf{y}_t - (\mathbf{C}_t \boldsymbol{\mu}_{t|t-1} + \mathbf{D}_t\,\mathbf{u}_t) \big) = \boldsymbol{\mu}_{t|t-1} + \mathbf{K}_t \big( \mathbf{y}_t - (\mathbf{C}_t \boldsymbol{\mu}_{t|t-1} + \mathbf{D}_t\,\mathbf{u}_t) \big)$$

where Kt\mathbf{K}_t is called the Kalman gain matrix

$$\mathbf{K}_t = \boldsymbol{\Sigma}_{t|t-1} \mathbf{C}_t^{T} \mathbf{S}_t^{-1} = \boldsymbol{\Sigma}_{t|t-1} \mathbf{C}_t^{T} \left( \mathbf{C}_t \boldsymbol{\Sigma}_{t|t-1} \mathbf{C}_t^{T} + \mathbf{R}_t \right)^{-1}$$

then, we proceed in the same way but with the covariance

$$\boldsymbol{\Sigma}_{t|t} = \boldsymbol{\Sigma}_{t|t-1} - \boldsymbol{\Sigma}_{t|t-1}\mathbf{C}_t^T \big( \mathbf{C}_t \boldsymbol{\Sigma}_{t|t-1} \mathbf{C}_t^{T} + \mathbf{R}_t \big)^{-1} \mathbf{C}_t\boldsymbol{\Sigma}_{t|t-1} = \boldsymbol{\Sigma}_{t|t-1} - \mathbf{K}_t \mathbf{C}_t \boldsymbol{\Sigma}_{t|t-1}$$

so that we have the mean and covariance of the posterior

$$p(\mathbf{z}_t \mid \mathbf{y}_{1:t}, \mathbf{u}_{1:t}) = \mathcal{N}(\mathbf{z}_t \mid \boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t})$$
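Putting the corrector step into code (again a minimal sketch with my own names, transcribing the equations above):

```python
import numpy as np

def kalman_update(mu_pred, Sigma_pred, y, C, D, u, R):
    """Corrector step: returns (mu_{t|t}, Sigma_{t|t})."""
    S = C @ Sigma_pred @ C.T + R                 # innovation covariance S_t
    K = Sigma_pred @ C.T @ np.linalg.inv(S)      # Kalman gain K_t
    residual = y - (C @ mu_pred + D @ u)         # innovation y_t - (C mu + D u)
    mu = mu_pred + K @ residual                  # mu_{t|t}
    Sigma = Sigma_pred - K @ C @ Sigma_pred      # Sigma_{t|t}
    return mu, Sigma
```

A full filter then simply alternates the predictor step with this update over the measurement sequence. In the scalar case with unit prior variance and unit measurement noise, the gain is 0.5 and the posterior lands halfway between the prediction and the measurement.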

Footnotes

  1. Fraccaro, M., Kamronn, S., Paquet, U., & Winther, O. (2017). A disentangled recognition and nonlinear dynamics model for unsupervised learning. Advances in neural information processing systems, 30.

  2. Murphy, K. P. (2023). Probabilistic machine learning: Advanced topics. MIT press.

  3. Masnadi-Shirazi, H., Masnadi-Shirazi, A., & Dastgheib, M. A. (2019). A step by step mathematical derivation and tutorial on kalman filters. arXiv preprint arXiv:1910.03558.
