Time Series Analysis: (6) Maximum Likelihood Estimation
In a previous post, we introduced Autoregressive Integrated Moving Average (ARIMA) models and Exponential Smoothing, and we mentioned that some ARIMA models can be written in the form of Exponential Smoothing. In another post, we learned about State Space Models and the Kalman Filter.
In the current post, we will learn how to use the State Space model and the Kalman Filter for parameter estimation of AR models based on Maximum Likelihood Estimation (MLE). That is, we want to select the parameters that make the observed data the most likely. For example, when we observe independent and identically distributed (iid) data $y_1, \dots, y_n \sim p(y; \theta)$, the likelihood function of the parameter vector $\theta$ is:
$$L(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$$
We then choose the value of $\theta$ that maximizes the likelihood function, $\hat{\theta} = \arg\max_{\theta} L(\theta)$.
A cool property of argmax is that, since log is a monotone function, the argmax of a function is the same as the argmax of the log of the function! That's nice because logs make the math simpler: instead of maximizing the likelihood, we maximize the log-likelihood.
$$LL(\theta) = \log \prod_{i=1}^{n} f(y_i \mid \theta) = \sum_{i=1}^{n} \log f(y_i \mid \theta)$$
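To make this concrete, here is a minimal sketch in Python of MLE for iid normal data, minimizing the negative log-likelihood with a quasi-Newton routine; the simulated data and the `neg_log_lik` helper are illustrative choices, not part of the original post:

```python
import numpy as np
from scipy.optimize import minimize

# Simulate iid data from N(mu = 2.0, sigma = 1.5) (arbitrary true values)
rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.5, size=200)

def neg_log_lik(theta, y):
    # theta = (mu, log_sigma); optimizing log(sigma) keeps sigma positive
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    # negative of sum_i log f(y_i | theta) for the normal density
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (y - mu)**2 / (2 * sigma**2))

res = minimize(neg_log_lik, x0=[0.0, 0.0], args=(y,), method="BFGS")
print(res.x[0], np.exp(res.x[1]))  # estimates should be near 2.0 and 1.5
```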
However, if the data are not independent, as in an AR(1) model, the log-likelihood is much harder to write down and maximize.
MLE for an AR(1) Model
An AR(1) model can be described as: $$y_t = \phi y_{t-1} + w_t$$
where $w_t \sim \mathcal{N}(0, \tau^2)$ and we assume that $E[y_t] = 0$. What is the joint distribution of the time series?
If we assume that the process is 2nd-order stationary, then for the marginal variances we have $$\text{Var}(y_t) = \phi^2 \text{Var}(y_{t-1}) + \tau^2$$
The stationarity assumption implies that $\text{Var}(y_t) = \frac{\tau^2}{1 - \phi^2}$, assuming that $|\phi| < 1$. Furthermore, we can show that
$$\text{Cov}(y_t, y_{t-1}) = \text{Cov}(\phi y_{t-1}, y_{t-1}) = \phi \text{Var}(y_{t-1}) = \frac{\phi \tau^2}{1 - \phi^2}$$
Because of the sequential dependence of the $y_t$s on each other, we have $$\text{Cov}(y_t, y_{t-j}) = \frac{\phi^{|j|} \tau^2}{1 - \phi^2}$$
The covariance matrix of $(y_1, \dots, y_n)$ is then:
$$\begin{bmatrix}
\frac{\tau^2}{1-\phi^2} & \frac{\phi\tau^2}{1-\phi^2} & \cdots & \frac{\phi^{n-2}\tau^2}{1-\phi^2} & \frac{\phi^{n-1}\tau^2}{1-\phi^2} \\
\frac{\phi\tau^2}{1-\phi^2} & \frac{\tau^2}{1-\phi^2} & \frac{\phi\tau^2}{1-\phi^2} & \cdots & \frac{\phi^{n-2}\tau^2}{1-\phi^2} \\
\vdots & \frac{\phi\tau^2}{1-\phi^2} & \ddots & \frac{\phi\tau^2}{1-\phi^2} & \vdots \\
\frac{\phi^{n-2}\tau^2}{1-\phi^2} & \cdots & \frac{\phi\tau^2}{1-\phi^2} & \frac{\tau^2}{1-\phi^2} & \frac{\phi\tau^2}{1-\phi^2} \\
\frac{\phi^{n-1}\tau^2}{1-\phi^2} & \frac{\phi^{n-2}\tau^2}{1-\phi^2} & \cdots & \frac{\phi\tau^2}{1-\phi^2} & \frac{\tau^2}{1-\phi^2}
\end{bmatrix}$$
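Evaluating the likelihood directly means building this full matrix and computing a multivariate normal density. Here is a minimal sketch, using a hypothetical helper `ar1_direct_loglik` (not from the timeseriesbook):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ar1_direct_loglik(y, phi, tau2):
    """Log-likelihood of an AR(1) series via the full n x n covariance matrix."""
    n = len(y)
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])    # matrix of |i - j|
    cov = (tau2 / (1.0 - phi**2)) * phi**lags     # Cov(y_i, y_j) from above
    return multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y)

# Example: simulate an AR(1) series and evaluate its log-likelihood
rng = np.random.default_rng(0)
phi, tau2, n = 0.6, 1.0, 200
y = np.zeros(n)
y[0] = rng.normal(scale=np.sqrt(tau2 / (1 - phi**2)))  # stationary start
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal(scale=np.sqrt(tau2))
print(ar1_direct_loglik(y, phi, tau2))
```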
As $n$ increases, several challenges arise:
- the covariance matrix quickly grows in size, making the computations more cumbersome, especially because some form of matrix decomposition must occur;
- we are taking larger and larger powers of $\phi$, which can quickly lead to numerical instability and unfortunately cannot be solved by taking logs;
- the formulation above ignores the sequential structure of the AR(1) model, which could be used to simplify the computations.
To address these problems, the timeseriesbook then introduces the Kalman filter, which provides a computationally efficient way to evaluate this complex likelihood.
Maximum Likelihood with the Kalman Filter
The basic idea is to re-formulate the AR(1) time series model as a state space model, and then use the Kalman filter to compute the log-likelihood of the observed data for a given set of parameters.
The general formulation of the state space model is: $$x_t = \Theta x_{t-1} + W_t$$ $$y_t = A_t x_t + V_t$$ where $V_t \sim \mathcal{N}(0, S)$ and $W_t \sim \mathcal{N}(0, R)$.
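For the AR(1) model above this is a particularly simple special case: taking the state to be the series itself, $x_t = y_t$, gives $\Theta = \phi$, $A_t = 1$, $W_t = w_t$ (so $R = \tau^2$), and no observation noise, i.e. $S = 0$.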
The joint density of the observations can be decomposed sequentially:
$$\begin{aligned}
p(y_1, y_2, \dots, y_n) &= p(y_1)\, p(y_2, \dots, y_n \mid y_1) \\
&= p(y_1)\, p(y_2 \mid y_1)\, p(y_3, \dots, y_n \mid y_1, y_2) \\
&\;\;\vdots \\
&= p(y_1)\, p(y_2 \mid y_1)\, p(y_3 \mid y_1, y_2) \cdots p(y_n \mid y_1, \dots, y_{n-1})
\end{aligned}$$
Initially, we need to compute $p(y_1)$: $$p(y_1) = \int p(y_1, x_1)\, dx_1 = \int \underbrace{p(y_1 \mid x_1)}_{\substack{\text{density for the} \\ \text{observation equation}}}\, p(x_1)\, dx_1$$
The density $p(x_1)$ is $\mathcal{N}(x_1^0, P_1^0)$, where $$x_1^0 = \Theta x_0^0, \qquad P_1^0 = \Theta P_0^0 \Theta' + R$$
Together, we get $$p(y_1) = \mathcal{N}\left(A x_1^0,\; A P_1^0 A' + S\right)$$
In general, we will have $$p(y_t \mid y_1, \dots, y_{t-1}) = \mathcal{N}\left(A x_t^{t-1},\; A P_t^{t-1} A' + S\right)$$
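The log-likelihood is then the sum of the logs of these one-step-ahead predictive densities, which the Kalman filter computes in a single pass. Below is a minimal sketch for the AR(1) case, assuming the state space mapping above ($\Theta = \phi$, $A = 1$, $R = \tau^2$, $S = 0$) and a stationary initialization; the function name `ar1_kalman_loglik` is an illustrative choice, not the timeseriesbook's code:

```python
import numpy as np

def ar1_kalman_loglik(y, phi, tau2):
    """log p(y_1) + sum_t log p(y_t | y_1, ..., y_{t-1}) via the Kalman filter."""
    x = 0.0                      # filtered state x_0^0 (E[y_t] = 0)
    P = tau2 / (1.0 - phi**2)    # filtered variance P_0^0 (stationary variance)
    loglik = 0.0
    for yt in y:
        # prediction step: x_t^{t-1} = Theta * x, P_t^{t-1} = Theta P Theta' + R
        x_pred = phi * x
        P_pred = phi**2 * P + tau2
        # predictive density p(y_t | past) = N(A x_t^{t-1}, A P_t^{t-1} A' + S)
        v = yt - x_pred          # innovation
        F = P_pred               # innovation variance (A = 1, S = 0)
        loglik += -0.5 * (np.log(2.0 * np.pi * F) + v**2 / F)
        # update step with Kalman gain K = P_pred / F; since S = 0, K = 1 and
        # the filtered state collapses to the observation itself
        K = P_pred / F
        x = x_pred + K * v
        P = (1.0 - K) * P_pred
    return loglik
```

Note that no $n \times n$ matrix, and no large power of $\phi$, ever appears: the filter runs in $O(n)$ time with only scalar updates.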
Then we can use standard non-linear maximization routines like Newton's method or quasi-Newton approaches to carry out the MLE. The timeseriesbook then introduces an example with an AR(2) model (see here).
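For example, here is a quasi-Newton fit of the AR(1) parameters, reusing the `ar1_kalman_loglik` sketch from above; the reparameterization keeping $|\phi| < 1$ and $\tau^2 > 0$ is an illustrative choice, not from the book:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    phi = np.tanh(params[0])     # maps the real line into (-1, 1)
    tau2 = np.exp(params[1])     # maps the real line into (0, inf)
    return -ar1_kalman_loglik(y, phi, tau2)

# y is an observed (or simulated, as earlier) AR(1) series
res = minimize(neg_loglik, x0=[0.0, 0.0], args=(y,), method="BFGS")
phi_hat, tau2_hat = np.tanh(res.x[0]), np.exp(res.x[1])
print(phi_hat, tau2_hat)         # MLEs of phi and tau^2
```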