In the previous lesson, you learned that it's possible to write the state-of-charge equation for a battery model in a way that total capacity appears linearly, and that gives us some hope of being able to use that relationship to estimate the total capacity. You also learned that we expect challenges when applying ordinary least squares to the problem, but we expect a total-least-squares method to work better. Both ordinary least squares and total least squares seek to find an estimate of capacity, which we call Q hat, such that y is approximately equal to Q hat times x, using vectors of measured data points. These data points occur in pairs: we put the x components in a vector labeled boldface x, and we put the y components in a vector labeled boldface y. The ith elements of both vectors constitute a data pair, and that data pair corresponds to data collected from a particular cell over a specific time interval i. So x subscript i is the estimated change in state of charge over that time interval, and y subscript i is the net accumulated ampere hours of charge passing through the cell during the same time interval. Specifically, in terms of equations, we write that x subscript i is equal to the ending state of charge minus the beginning state of charge for that interval of time. We also write that y subscript i is equal to the negative of the sampling period times the summation of the measured currents passing through the cell during that interval, which gives ampere seconds; expressing the sampling period in hours converts this, so the overall result is in ampere hours. Even though I have warned you that the ordinary-least-squares approach will produce inaccurate and biased estimates of total capacity, we're actually going to spend a little time in this lesson deriving the ordinary-least-squares result. We're going to use this result as a benchmark against which to compare the better methods, the total-least-squares methods.
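As a concrete illustration of how one data pair is formed, here is a minimal Python sketch (the lesson itself provides no code, so the function name and the sign convention for current, positive on discharge, are assumptions for this example):

```python
def capacity_data_pair(soc_begin, soc_end, current_A, dt_s):
    """Form one (x_i, y_i) data pair for total-capacity estimation.

    soc_begin, soc_end : state-of-charge estimates at the start and end
                         of interval i (unitless, between 0 and 1)
    current_A          : current samples over the interval, in amperes
                         (assumed positive on discharge)
    dt_s               : sampling period, in seconds
    """
    x_i = soc_end - soc_begin                   # change in state of charge
    y_i = -(dt_s / 3600.0) * sum(current_A)     # accumulated charge, in Ah
    return x_i, y_i

# Example: a 10 Ah cell discharged at 5 A for one hour (dt = 1 s) drops
# from 90% to 40% SOC, so x_i = -0.5, y_i = -5 Ah, and y_i / x_i = 10.
x, y = capacity_data_pair(0.9, 0.4, [5.0] * 3600, 1.0)
```

Note that both x and y are negative for a discharge interval, so the pair still lies on the line y = Q x with a positive capacity Q.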
That way, you can see that the ordinary-least-squares method fails where the total-least-squares methods succeed. I also like to present this method first because its derivation is simpler, yet similar to the derivations that we will do for the total-least-squares methods. Remember that ordinary least squares assumes that there are no errors in the x component of each data pair, but there are errors in the y component of each data pair. The figure to the right illustrates this by showing error bounds, or confidence intervals, in the y direction but no such bounds in the x direction, because that coordinate is assumed to be known perfectly. The error bounds in the y direction are drawn proportional to the standard deviation of the error on each measurement. A large standard deviation means that we have a large degree of uncertainty in that data point, and a small standard deviation means that we have much smaller uncertainty in that data point. So when we're estimating total capacity, we should put more emphasis on data points with small uncertainty and less emphasis on data points with large uncertainty. We assume that the errors on the data points have an average value of zero, so we are assuming that the measurements are not biased in any particular direction. We also tend to talk about variances of measurement error instead of standard deviations, but as you know, the variances are simply the squares of the respective standard deviations, so that should pose no challenge for you. And we will not assume that the measurements have identical variance; in fact, every measurement might have its own individual variance. Ordinary-least-squares methods attempt to find the estimate of capacity that we will label Q hat, the true capacity being, of course, Q. We want this estimate to minimize the summation of squared errors, essentially, but we also want the ability to weight the errors, as you're going to see.
Here, the errors are the differences between the measured data points and the fitted total-capacity line y equals Q hat times x. So instead of treating every data point identically, as I commented on the previous slide, we're actually going to weight the data by generalizing the idea of least squares slightly: we're going to minimize a summation of weighted squared errors instead of a summation of squared errors without weighting. This weighting is going to take into account the uncertainty of each specific measurement. Mathematically, we say that we desire to find the Q hat that minimizes the weighted-least-squares, or WLS, cost function. That cost function is written as the summation of weighted squared differences between the measured data points, lowercase y, and the values, uppercase Y, where uppercase Y is the mapping of the measured data point onto the line described by Y equals Q hat times x. The weighting is equal to 1 divided by the variance of each data point. So if the variance is large, then 1 divided by the variance is small, and the cost function will place relatively little effort on trying to get the line to agree with that particular data point. If the variance is small for some particular data point, then 1 divided by the variance is large, and the cost function is going to place much more effort on trying to get the line to agree with that particular data point. Since we've already decided that the mapping of the lowercase-y data points onto the line is achieved by setting uppercase Y equal to Q hat times x, we can rewrite the summation as shown on the right-hand side of this equation. And if you look at this, the cost function is now written only in terms of known quantities, such as the measured y and measured x values and the variances of the y measurements, plus the single unknown quantity that we intend to find, which is, of course, Q hat. So we're ready to do something with this cost function.
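In symbols, the cost function described above can be written as follows (a reconstruction from the quantities defined in the lesson, writing the variance of the error on the ith y measurement as sigma squared sub y-i):

\[
\chi^2 \;=\; \sum_{i=1}^{N} \frac{\left(y_i - Y_i\right)^2}{\sigma_{y_i}^2}
\;=\; \sum_{i=1}^{N} \frac{\left(y_i - \hat{Q}\,x_i\right)^2}{\sigma_{y_i}^2}.
\]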
Before we do that, though, I'll just comment that it turns out that if these measurement noises happen to have a Gaussian, or normal, distribution, then the cost function has a chi-squared distribution, which is why I give it the chi-squared symbol. And I'm going to use this particular symbol for all of the cost functions that we look at in this course, whether or not we assume that the noises have a Gaussian distribution. So here it's used only as notation, but it actually has meaning if the noises are Gaussian. Starting with the cost function of the previous slide, we could take a number of different approaches to finding the minimum, or the solution. One that works very well for us is the very standard technique of simply differentiating the cost function with respect to the unknown quantity, Q hat, and then solving for Q hat by setting the partial derivative to zero. Here, I show you the result of taking this partial derivative, which is itself a summation that we will set to zero. Starting with the first equation line, we can break the numerator into two distinct parts, which then allows us to write the expression as two distinct summations. We bring one summation to the right side of the equation and the other summation to the left side. And we notice that once we break the expression into the two summations, the variable Q hat is a constant value (it's an unknown value, but still a constant within its summation), so it factors outside of the summation on the left. So we have that Q hat times the summation of the weighted x-squared data points is equal to the weighted summation of the x-times-y data points. At this point, we solve for the unknown optimal Q hat simply by dividing the summation on the right side of the equation by the summation on the left side. And this gives us a fully closed-form solution to the ordinary-least-squares problem.
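The derivation described above can be written out explicitly as follows (a reconstruction of the slide's steps from the text, using the same notation as before):

\[
\frac{\partial \chi^2}{\partial \hat{Q}}
= \sum_{i=1}^{N} \frac{-2\,x_i\left(y_i - \hat{Q}\,x_i\right)}{\sigma_{y_i}^2} = 0
\quad\Longrightarrow\quad
\hat{Q}\sum_{i=1}^{N} \frac{x_i^2}{\sigma_{y_i}^2}
= \sum_{i=1}^{N} \frac{x_i\,y_i}{\sigma_{y_i}^2}
\quad\Longrightarrow\quad
\hat{Q} = \frac{\displaystyle\sum_{i=1}^{N} x_i\,y_i/\sigma_{y_i}^2}
               {\displaystyle\sum_{i=1}^{N} x_i^2/\sigma_{y_i}^2}.
\]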
In summary, we have now defined and solved the weighted ordinary-least-squares problem for estimating battery-cell total capacity. We created a cost function to optimize, which was equal to the sum of weighted squared differences between the measured y data points and their mappings onto the total-capacity relationship. The solution was the quotient of two summations. It's convenient to keep a variable c1 equal to one of the running summations and another variable c2 equal to the other running summation. Then we can write that the estimate of total capacity, Q hat, is equal to the variable c2 divided by the variable c1. That concludes our preliminary discussion of ordinary least squares. In the next lesson, we're going to look at some ways to implement this ordinary-least-squares method very efficiently. They will also lead to ideas on how we might implement some of the more advanced methods efficiently as well.