In a previous post, we introduced Autoregressive Integrated Moving Average (ARIMA) and Exponential Smoothing, and we mentioned that some ARIMA models can be written in the form of Exponential Smoothing. In another post, we learned about State Space Models and the Kalman Filter.

In the current post, we will learn how to use a State Space model and the Kalman Filter for parameter estimation of AR models based on Maximum Likelihood Estimation (MLE). That is, we want to select the parameters that make the observed data the most likely. For example, when we observe independent and identically distributed (iid) data $y_1, \dots, y_n \sim p(y; \theta)$, the likelihood function of the parameter vector $\theta$ is:

$$L(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$$

Now we choose the value of $\theta$ that maximizes the likelihood function, $\hat{\theta} = \arg\max_{\theta} L(\theta)$.

A cool property of argmax is that since log is a monotone function, the argmax of a function is the same as the argmax of the log of the function! That’s nice because logs make the math simpler, so instead of maximizing the likelihood, we can equivalently maximize the log-likelihood.

$$LL(\theta) = \log \prod_{i=1}^{n} f(y_i \mid \theta) = \sum_{i=1}^{n} \log f(y_i \mid \theta)$$
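To make this concrete, here is a minimal sketch (with assumed example data drawn from a $N(\theta, 1)$ model) that evaluates both the likelihood and the log-likelihood on a grid of candidate $\theta$ values and checks that they are maximized at the same point:

```python
import numpy as np

# Assumed example: iid draws from N(theta, 1) with true theta = 2.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)
thetas = np.linspace(0.0, 4.0, 401)

# Likelihood and log-likelihood of each candidate theta on the grid.
lik = np.array([np.prod(np.exp(-0.5 * (y - t) ** 2) / np.sqrt(2 * np.pi))
                for t in thetas])
loglik = np.array([np.sum(-0.5 * (y - t) ** 2 - 0.5 * np.log(2 * np.pi))
                   for t in thetas])

# Because log is monotone, both are maximized at the same grid point,
# which for this model sits next to the sample mean.
assert thetas[np.argmax(lik)] == thetas[np.argmax(loglik)]
```

With larger datasets the raw likelihood product would underflow to zero in floating point, which is another practical reason to work on the log scale.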

However, if the data are not independent, as in an AR(1) model, the log-likelihood is harder to write down.

An AR(1) Model MLE

We can describe the AR(1) model as: $$y_t = \phi y_{t-1} + w_t$$

where $w_t \sim N(0, \tau^2)$ and we assume that $E[y_t] = 0$. What is the joint distribution of the time series?

If we assume that the process is 2nd-order stationary, then for the marginal variances we have $$\text{Var}(y_t) = \phi^2\, \text{Var}(y_{t-1}) + \tau^2$$
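Iterating this recursion numerically shows it converging to a fixed point, the stationary variance derived next. A tiny sketch with assumed example values $\phi = 0.6$ and $\tau^2 = 1$:

```python
# Assumed example values with |phi| < 1.
phi, tau2 = 0.6, 1.0

# Iterate Var(y_t) = phi^2 * Var(y_{t-1}) + tau^2 starting from zero;
# under stationarity it converges to tau^2 / (1 - phi^2).
v = 0.0
for _ in range(200):
    v = phi ** 2 * v + tau2

assert abs(v - tau2 / (1 - phi ** 2)) < 1e-12   # here 1 / 0.64 = 1.5625
```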

The stationarity assumption implies that $\text{Var}(y_t) = \frac{\tau^2}{1 - \phi^2}$, assuming that $|\phi| < 1$. Furthermore, we can show that

$$\text{Cov}(y_t, y_{t-1}) = \text{Cov}(\phi y_{t-1} + w_t,\; y_{t-1}) = \phi\, \text{Var}(y_{t-1}) = \frac{\phi \tau^2}{1 - \phi^2}$$ where the middle equality uses the fact that $w_t$ is independent of $y_{t-1}$.

Because of the sequential dependence of the $y_t$s on each other, we have $$\text{Cov}(y_t, y_{t-j}) = \phi^{|j|} \frac{\tau^2}{1 - \phi^2}$$

The $n \times n$ covariance matrix of $(y_1, \dots, y_n)$ then looks like: $$\frac{\tau^2}{1-\phi^2} \begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & 1 \end{bmatrix}$$
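As a sketch of this direct approach (with assumed example values for $\phi$ and $\tau^2$), we can build the covariance matrix and evaluate the joint Gaussian log-likelihood in one shot:

```python
import numpy as np
from scipy.stats import multivariate_normal

def ar1_cov(n, phi, tau2):
    """Covariance matrix with entries phi^|i-j| * tau^2 / (1 - phi^2)."""
    j = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return phi ** j * tau2 / (1 - phi ** 2)

# Simulate a stationary AR(1) series (assumed example values).
rng = np.random.default_rng(0)
phi, tau2, n = 0.7, 1.0, 100
y = np.zeros(n)
y[0] = rng.normal(scale=np.sqrt(tau2 / (1 - phi ** 2)))
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal(scale=np.sqrt(tau2))

# Exact log-likelihood via the joint density; evaluating it factorizes
# an n x n matrix, which is exactly what becomes expensive as n grows.
ll = multivariate_normal(mean=np.zeros(n), cov=ar1_cov(n, phi, tau2)).logpdf(y)
```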

As $n$ increases, some challenges arise:

  1. the covariance matrix quickly grows in size, making the computations more cumbersome, especially because some form of matrix decomposition must occur;

  2. we are taking larger and larger powers of $\phi$, which can quickly lead to numerical instability and unfortunately cannot be solved by taking logs;

  3. the formulation above ignores the sequential structure of the AR(1) model, which could be used to simplify the computations.

To address these problems, the timeseriesbook then introduced how the Kalman filter can provide a computationally efficient way to evaluate this complex likelihood.

Maximum Likelihood with the Kalman Filter

The basic idea is to re-formulate the AR(1) time series model as a state space model, and then use the Kalman filter to compute the log-likelihood of the observed data for a given set of parameters.

The general formulation of the state space model is: $$x_t = \Theta x_{t-1} + W_t$$ $$y_t = A_t x_t + V_t$$ where $V_t \sim N(0, S)$ and $W_t \sim N(0, R)$.
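For the AR(1) model this form is simple: the state is the series itself, so $\Theta = \phi$, $A_t = 1$, $R = \tau^2$, and there is no observation noise ($S = 0$). A small sketch of the mapping, with assumed example values:

```python
# Mapping the AR(1) model y_t = phi * y_{t-1} + w_t into the state
# space form (assumed example values phi = 0.7, tau^2 = 1):
phi, tau2 = 0.7, 1.0
Theta = phi   # state equation:       x_t = Theta * x_{t-1} + W_t
A = 1.0       # observation equation: y_t = A * x_t + V_t
R = tau2      # Var(W_t): the AR innovation variance
S = 0.0       # Var(V_t): the state is observed without noise
```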

The joint density of the observations factors sequentially: $$\begin{aligned} p(y_1, y_2, \dots, y_n) &= p(y_1)\, p(y_2, \dots, y_n \mid y_1) \\ &= p(y_1)\, p(y_2 \mid y_1)\, p(y_3, \dots, y_n \mid y_1, y_2) \\ &= p(y_1)\, p(y_2 \mid y_1)\, p(y_3 \mid y_1, y_2) \cdots p(y_n \mid y_1, \dots, y_{n-1}) \end{aligned}$$

We initially need to compute $p(y_1)$: $$p(y_1) = \int p(y_1, x_1)\, dx_1 = \int \underbrace{p(y_1 \mid x_1)}_{\substack{\text{density for the} \\ \text{observation equation}}}\, p(x_1)\, dx_1$$

The density $p(x_1)$ is $N(x_1^0, P_1^0)$, where $$x_1^0 = \Theta x_0^0 \qquad P_1^0 = \Theta P_0^0 \Theta' + R$$

Together, we get $$p(y_1) = N(A x_1^0,\, A P_1^0 A' + S)$$

In general, we will have $$p(y_t \mid y_1, \dots, y_{t-1}) = N(A x_t^{t-1},\, A P_t^{t-1} A' + S)$$
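Putting the recursion together, here is a minimal scalar Kalman filter sketch for the AR(1) case ($\Theta = \phi$, $A = 1$, $R = \tau^2$, $S = 0$) that accumulates the log-likelihood from these one-step-ahead predictive densities:

```python
import numpy as np

def ar1_loglik_kalman(y, phi, tau2):
    """Log-likelihood of an AR(1) series via the scalar Kalman filter.

    State space form: x_t = phi * x_{t-1} + W_t, y_t = x_t, with
    W_t ~ N(0, tau2) and no observation noise (S = 0). The filter is
    initialized at the stationary distribution N(0, tau2 / (1 - phi^2)).
    """
    x_pred = 0.0                       # x_1^0
    P_pred = tau2 / (1 - phi ** 2)     # P_1^0 (stationary variance)
    ll = 0.0
    for yt in y:
        # p(y_t | y_1..y_{t-1}) = N(A x_t^{t-1}, A P_t^{t-1} A' + S)
        f = P_pred                     # innovation variance (A = 1, S = 0)
        v = yt - x_pred                # innovation (one-step forecast error)
        ll += -0.5 * (np.log(2 * np.pi * f) + v ** 2 / f)
        # Update step, then predict one step ahead.
        K = P_pred / f                 # Kalman gain (= 1 here since S = 0)
        x_filt = x_pred + K * v
        P_filt = (1 - K) * P_pred      # collapses to 0 when S = 0
        x_pred = phi * x_filt
        P_pred = phi ** 2 * P_filt + tau2
    return ll
```

Note that nothing here grows with $n$: each observation costs a constant amount of work, and no $n \times n$ matrix is ever formed.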

Then we can use standard non-linear maximization routines, like Newton’s method or quasi-Newton approaches, to carry out the MLE. The timeseriesbook then introduced an example with an AR(2) model (see here).
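As a sketch of that last step (with an assumed simulated series, and L-BFGS-B as the quasi-Newton routine), we can hand the prediction-error log-likelihood, which is exactly what the Kalman filter computes for this model, to `scipy.optimize.minimize`:

```python
import numpy as np
from scipy.optimize import minimize

# Simulate an AR(1) series with assumed true values phi = 0.6, tau^2 = 1.
rng = np.random.default_rng(7)
phi_true, tau2_true, n = 0.6, 1.0, 1000
y = np.zeros(n)
y[0] = rng.normal(scale=np.sqrt(tau2_true / (1 - phi_true ** 2)))
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal(scale=np.sqrt(tau2_true))

def neg_loglik(params):
    """Negative log-likelihood from the prediction-error decomposition
    (what the Kalman filter computes for this model when S = 0)."""
    phi, log_tau2 = params
    tau2 = np.exp(log_tau2)            # optimize log(tau^2) to keep it > 0
    # Stationary term for y_1, then the conditional Gaussian terms.
    ll = -0.5 * (np.log(2 * np.pi * tau2 / (1 - phi ** 2))
                 + y[0] ** 2 * (1 - phi ** 2) / tau2)
    resid = y[1:] - phi * y[:-1]
    ll += np.sum(-0.5 * (np.log(2 * np.pi * tau2) + resid ** 2 / tau2))
    return -ll

res = minimize(neg_loglik, x0=[0.0, 0.0], method="L-BFGS-B",
               bounds=[(-0.99, 0.99), (None, None)])
phi_hat, tau2_hat = res.x[0], np.exp(res.x[1])
```

With $n = 1000$ observations, the estimates should land close to the assumed true values, up to sampling error.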