In the previous lesson, you learned how to use a Sigma-Point Kalman Filter for parameter estimation. In this lesson, you will learn how to use an Extended Kalman Filter for parameter estimation instead. As always, we are going to follow the pattern of the six steps of Gaussian sequential probabilistic inference. Remember that the first step is to derive the equations for the parameter prediction time update. Because the parameter dynamics you learned about in the previous lesson have a linear form, the equation for performing the parameter prediction for the EKF turns out to be identical to the one for the SPKF. That is, the predicted parameter values at this iteration are equal to the estimated parameter values at the end of the previous iteration. Then we need to compute the error covariance for these parameter predictions. Again, because of the linearity of the parameter dynamics equation, we achieve exactly the same result that we found for the Sigma-Point Kalman Filter: the uncertainty of the prediction is equal to the uncertainty of the previous estimate plus the covariance of the process noise that is driving the change in the parameter values. The next thing we need to do is to predict the value that we will measure. The generic step says that this is done by evaluating the expected value of the measurement equation given all previous inputs. Remember that one of the assumptions of the Extended Kalman Filter is that we can approximate the expected value of a nonlinear function of a random variable as the nonlinear function evaluated at the expected value of that random variable. That's what we do here. We predict the measurement using the output equation by substituting the predicted parameter vector and the mean noise vector into that equation. So the EKF is assuming that propagating the predicted parameter vector and the mean noise vector gives the best approximation to predicting the output.
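As a minimal sketch, these time-update steps might look as follows in code. The function names, the generic output equation `h`, and the NumPy conventions are my own illustration, not part of the lesson:

```python
import numpy as np

def ekf_param_time_update(theta_hat, Sigma_theta, Sigma_r):
    """Parameter prediction and its error covariance.

    Because the parameter dynamics are linear (theta_k = theta_{k-1} + r_{k-1}),
    the prediction equals the previous estimate, and the prediction-error
    covariance is the previous covariance plus the process-noise covariance.
    """
    theta_pred = theta_hat.copy()        # predicted parameters
    Sigma_pred = Sigma_theta + Sigma_r   # prediction-error covariance
    return theta_pred, Sigma_pred

def ekf_predict_output(h, x, u, theta_pred, e_bar):
    """Predict the measurement by evaluating the (nonlinear) output
    equation at the predicted parameters and the mean noise vector."""
    return h(x, u, theta_pred, e_bar)
```

Here `h` stands for whatever output equation your particular model defines; it is passed in as a function so the sketch stays model-agnostic.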
The next step is to compute the estimator gain matrix. Remember that to compute this matrix, we first must compute the covariance of the output prediction error, as well as the cross covariance between the parameter prediction error and the output prediction error. We begin by finding an expression for the output prediction error. We write that this error is equal to the true output equation evaluated at exactly correct inputs, minus the same equation evaluated with the approximated inputs. When we did this kind of work with the Extended Kalman Filter for state estimation, we used a Taylor-series expansion, and we're going to do the same thing here. We apply the expansion to the first term on the right-hand side of the equality and write that the true measurement is approximately equal to the approximated measurement, plus the derivative of the measurement equation with respect to the parameters multiplying the parameter prediction error, plus the derivative of the output equation with respect to the noise multiplying the random part of the noise. The first derivative in this equation we denote as C hat theta, and the second derivative we denote as D hat theta. You might remember that when we looked at state estimation, we had a C hat and a D hat. Here, I include the theta notation to remind us that we're talking about parameter estimates and not state estimates. Without going into all of the details, we can use the results from the previous slide to compute the two covariance matrices that we need; these matrices are listed here, and then we combine these terms to get the estimator gain, which is also listed here. Before continuing, I'd like to remind you about a distinction between partial and total derivatives, which is important when we're computing C hat theta and D hat theta.
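A sketch of how this gain computation might look in code, assuming the standard EKF covariance expressions and that C hat theta and D hat theta have already been computed; the function and variable names are my own illustration:

```python
import numpy as np

def ekf_param_gain(Sigma_pred, C_theta, D_theta, Sigma_e):
    """Estimator-gain computation for EKF parameter estimation.

    Sigma_pred : covariance of the parameter prediction error
    C_theta    : total derivative of the output equation w.r.t. the parameters
    D_theta    : derivative of the output equation w.r.t. the measurement noise
    Sigma_e    : measurement-noise covariance
    """
    # Covariance of the output prediction error (innovation covariance)
    Sigma_y = C_theta @ Sigma_pred @ C_theta.T + D_theta @ Sigma_e @ D_theta.T
    # Cross covariance between parameter prediction error and output error
    Sigma_ty = Sigma_pred @ C_theta.T
    # Estimator gain
    L = Sigma_ty @ np.linalg.inv(Sigma_y)
    return L, Sigma_y
```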
If we use the chain rule of total differentials, we can write that the total derivative of the output equation with respect to the parameters is equal to: the partial derivative with respect to the state multiplying the total derivative of the state with respect to the parameters, plus the partial derivative of the output equation with respect to the input multiplying the total derivative of the input with respect to the parameters, plus the partial derivative of the output equation with respect to the parameters themselves multiplying the total derivative of the parameters with respect to themselves, plus the partial derivative of the output equation with respect to the measurement error multiplying the total derivative of the measurement error with respect to the parameters. We assume that the deterministic input to the system u is not directly a function of the parameter vector, and neither is the measurement noise, so we write those total derivatives as zero. Our final result here says that the total derivative that we seek is equal to the partial derivative of the output equation with respect to the parameters, which we might expect to be there, but there is also a partial derivative with respect to the state multiplying the total derivative of the state with respect to the parameters. This second term is new, and it requires that we think a little bit harder: how do I implement that? I need to find the total derivative of the state with respect to the parameters. To do so, we write the overall total-differential expansion using the state equation of the model; the details are similar to what I presented a moment ago, except using the state equation, so I show only the final result here. This equation says that the total derivative of the state at this point in time is equal to its partial derivative with respect to the parameters, plus the partial derivative of the state equation with respect to the previous state multiplying the total derivative of the state with respect to the parameters at the previous point in time.
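The two derivative results just described might be sketched as follows, with the partial-derivative Jacobians passed in as arrays; all names here are my own illustration:

```python
import numpy as np

def total_output_derivative(dh_dtheta, dh_dx, dx_dtheta):
    """C hat theta: total derivative of the output equation w.r.t. the
    parameters, equal to the partial derivative w.r.t. the parameters plus
    the partial derivative w.r.t. the state times dx/dtheta."""
    return dh_dtheta + dh_dx @ dx_dtheta

def propagate_state_derivative(df_dtheta, df_dx_prev, dx_dtheta_prev):
    """Recursion for the total derivative of the state w.r.t. the parameters:
    dx_k/dtheta = (partial f / partial theta)
                + (partial f / partial x_{k-1}) @ (dx_{k-1}/dtheta).
    Initialize dx_dtheta_prev to zeros absent better side information."""
    return df_dtheta + df_dx_prev @ dx_dtheta_prev
```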
This is a recursive relationship: the new quantity equals a function of the old quantity plus something new. Whenever we have a recursive relationship, we need to decide how to initialize that recursion. Here we don't know the correct initial value, so we simply initialize the recursion to zero, unless we have some side information that might give us a better estimate for its value. To calculate C hat theta for any particular model structure, we need methods to calculate all of the partial derivatives for that model. That was the most difficult part of the EKF derivation, and now we proceed to look at the final two steps. First, we compute the parameter estimate as being equal to the predicted parameter vector plus the filter gain multiplying the output innovation, and that is exactly the same as it was for the Sigma-Point Kalman Filter. The final step computes the covariance matrix of the estimate: this covariance matrix is equal to the covariance matrix of the prediction error, minus the gain, multiplying the covariance of the innovation, multiplying the gain matrix transpose. Again, this is exactly the same relationship that we saw for the Sigma-Point Kalman Filter. In summary, you've now learned how to use the EKF for parameter estimation when the state of the system is known. Unlike when using the EKF for state estimation, the distinction between total and partial derivatives when computing the EKF update matrices now matters very much. You learned a recursive formulation that allows computing the required total derivatives, and you learned that we can initialize this recursion to zero unless some side information happens to be available. We also initialize the parameter estimate with the best information we have regarding the true parameter values, and we initialize the parameter covariance matrix with the best information we might have regarding the uncertainties of those values. Once again, I have summarized this method for you in the appendix to this lesson in the notes.
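The final two steps might be sketched as follows; once again, the names and conventions are my own, not from the lesson:

```python
import numpy as np

def ekf_param_measurement_update(theta_pred, Sigma_pred, L, y, y_hat, Sigma_y):
    """Parameter measurement update: estimate and its error covariance."""
    innovation = y - y_hat                        # output innovation
    theta_hat = theta_pred + L @ innovation       # parameter estimate
    Sigma_theta = Sigma_pred - L @ Sigma_y @ L.T  # covariance of the estimate
    return theta_hat, Sigma_theta
```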
I will not spend time in this lecture video summarizing all of the steps, but I do include them in the appendix for your reference. Instead, we're going to proceed by looking at two different ways to estimate the states and the parameters of a system at the same time.