0:00

The Chain rule is so important, it's worth thinking through a proof of its validity. You might be tempted to think that you can get away with just cancelling. What I mean is, you might be tempted to think that something like this works. Let's say you wanted to differentiate g of f of x. You might set f of x equal to y, and then you might think, well, what you're really trying to calculate is the derivative of g of y. And then you might say, well, the derivative of g of y will be the derivative of g with respect to y times the derivative of y with respect to x, and that's really the Chain rule: this first thing is the derivative of g, and this other thing is the derivative of f, just like you'd expect. And then you're trying to say that you can just cancel, alright? But you're not allowed to just cancel. The upshot here is that dy/dx is not a fraction. You can't justify this equality by canceling, because these objects, the ones you're supposedly canceling, are not fractions. We need a more delicate argument than that.
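Written in Leibniz notation, the tempting argument looks like this (with y = f(x)):

```latex
% The naive "cancellation" argument -- NOT a valid proof:
\frac{d}{dx}\,g(f(x)) \;=\; \frac{dg}{dy}\cdot\frac{dy}{dx}
% One wants to say the dy's "cancel," but dg/dy and dy/dx
% are not fractions, so no cancellation actually takes place.
```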

One way to go is to give a slightly different definition of the derivative. Well, here's a slightly different way of packaging up the derivative. The function f is differentiable at a point a provided there's some number, which I'm suggestively calling f prime of a, the derivative of f at a, so that the limit of this error function is equal to 0. And what's this error function? Well, it's measuring how far the approximation I'd get using the derivative is from the actual function's output if I plug in an input value near a.
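Here's that repackaged definition in symbols, writing E_f for the error function (the E_f notation is mine; the transcript just says "the error function"):

```latex
% f is differentiable at a, with derivative f'(a), provided
f(a+h) \;=\; f(a) + f'(a)\,h + E_f(h)\,h,
\qquad \lim_{h \to 0} E_f(h) = 0.
% E_f(h) measures, per unit h, how far the linear approximation
% f(a) + f'(a)h is from the true output f(a+h).
```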

In some ways, that's actually a nicer definition of the derivative, since it really conveys that the derivative provides a way to approximate output values of functions. In any case, let's now take this new definition of the derivative and try to prove the Chain rule. We'll try to approximate g of f of x plus h, and in doing so, discover the Chain rule. So, I want to be able to express this in terms of the derivatives of g and f, at least approximately, and then control the error. Well, I can do that for f, because I'm assuming that f is differentiable at the point x. So, this is g of, instead of f of x plus h, f of x plus the derivative of f at x times h, plus an error term, which we should be calling the error of f at h, times h. I'm going to play the same game with g. This is g of f of x plus a small quantity, and if I assume that g is differentiable at the point f of x, then this is g of f of x, plus the derivative of g at f of x times how much I wiggle by, which in this case is f prime of x times h plus that error term for f at h times h, plus an error term for g, evaluated at how much I wiggled by, times that same quantity, f prime of x times h plus the error of f at h times h. Alright, so that's exactly equal to g of f of x plus h, with all of the error terms included.
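In the E_f, E_g notation from before, the two substitutions just described give the following, where k is my shorthand for the "how much I wiggled by" quantity:

```latex
g(f(x+h))
  \;=\; g\bigl(f(x) + f'(x)\,h + E_f(h)\,h\bigr)
  \;=\; g(f(x)) + g'(f(x))\,k + E_g(k)\,k,
\qquad k = f'(x)\,h + E_f(h)\,h.
```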

Now, we can expand out a bit. So, this is g of f of x, plus, multiplying these two terms together, g prime of f of x times f prime of x times h. That's looking really good, because that's what the Chain rule is, right? It's supposed to give me this as the derivative of g composed with f. Plus, I've got a ton of error terms now. All those error terms have an h, so I'm going to collect all the h's at the end. The first error term is g prime of f of x, this term here, times the error of f at h. The next one is the error of g at this complicated quantity, which I'm going to abbreviate with a dash, times f prime of x. Then there's the error of g at that complicated quantity times the error of f at h. And all of that is times h.

Alright. Now, this is almost giving me the derivative of the composite function, provided that I can control the size of this error term, right? What I need to show now is that the limit as h approaches 0 of this error term is really 0. And the error term, right, it's the part before the times h: it's g prime of f of x times the error term for f at h, plus the error term for g times f prime of x, plus the error term for g times the error term for f at h. Now, why do I know that that limit is equal to 0?

Well, I can do it in pieces, right? It's the limit of a sum, so it's the sum of the limits. And I know that this first term goes to 0 because it's got an error-of-f-at-h factor in it, and because f is differentiable, that error term goes to 0. I likewise know the same for this last term, since it also has an error-of-f-at-h factor in it. The most mysterious term is this middle one. But if you think a little bit more about it, the error of g at this dash quantity, which is abbreviating this whole complicated thing here, also goes to 0 as h goes to 0, since the quantity itself goes to 0 when h does. And that's enough to know that the limit as h goes to 0 of this whole error quantity is 0, which is then enough to say that this expression for g of f of x plus h actually implies that this is the derivative.

So, here's what we've actually shown. Suppose that f is differentiable at a point a, and g is differentiable at the point f of a. Then the composite function, g composed with f, is differentiable at a, with the derivative of g composed with f at the point a equal to the derivative of g at f of a times the derivative of f at a.
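None of this replaces the proof, but the conclusion is easy to sanity-check numerically. Here's a small sketch with example functions of my own choosing (f = sin, g = cubing), comparing a difference quotient for the composite against the Chain rule's prediction:

```python
# Numerical sanity check of the Chain rule (illustrative only, not a proof).
import math

def f(x):
    return math.sin(x)

def fprime(x):
    return math.cos(x)

def g(y):
    return y ** 3

def gprime(y):
    return 3 * y ** 2

a = 0.7      # point of differentiation
h = 1e-6     # small wiggle

# Difference quotient for the composite g(f(x)) at a.
numeric = (g(f(a + h)) - g(f(a))) / h

# Chain rule prediction: g'(f(a)) * f'(a).
chain_rule = gprime(f(a)) * fprime(a)

print(numeric, chain_rule)  # the two values agree closely
```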