0:00

[MUSIC]

Â So, basically, the idea behind the model we're going to study now, and

Â the model is called generative adversarial networks.

Â The idea is that we want to train a model specifically to tell us whether

Â a generated image is good enough or not.

Â So, it wants to use the kind of pre-trained plus mean

Â squared error metric, it's just a specific model that has only one task.

Â It says if an image is good enough or not.

Â Now, basically this makes two networks now.

Â We have the first network, which is called a generator, it takes some kind of scene.

Â Maybe a random noise or some conditional what kind of face or

Â cheer it has to generate and it has to output some kind the some image.

Â Of course, at the very beginning those images won't be good,

Â won't be a [INAUDIBLE] really once.

Â The second part, this kind of specially trained metric are your distinct nature.

Â And it's called so because it discriminates between our real kind of

Â reference images from a data set, and the images the generator generates.

Â At first this problem is going to be very easy, because generator is generating

Â rubbish, because it's just random initialization usually.

Â And your [INAUDIBLE] discriminator will very easy tell the difference between

Â noise and well, nonsensical images from generator and actual images.

Â But this task is not solved for the sake of itself.

Â You train discriminator in order to train generator.

Â Now you want to tune your generative network, the guy on the left,

Â in a way that tries to fool the discriminator.

Â So if the discriminator tries to say whether you image is real or not,

Â you then try to adjust the image, adjust the generative mode.

Â So that this image becomes more real in the eye of the discriminator.

Â Let's see how it looks in terms of scheme, so that it gets clear perhaps.

Â First network is generator.

Â It's the thing that takes parameters, perhaps a tree orientation or

Â face features, and

Â some random noise to make sure that it generates different stuff each time.

Â And then tries to produce some kind of generated fake image.

Â And this model is basically your usual, kind of,

Â the data that you tried to train with MSCA.

Â Except it won't be trained with mean root square error anymore.

Â Second part is called discriminator, it's going to take image,

Â either a real image or an image from the previous model.

Â And it's going to output just one sigma, well,

Â kind of probability estimation of the final error.

Â This is going to be the probability of this image being fake.

Â Or it could be the probability of it being real.

Â The idea here is that this model is a discriminator, it can train for

Â a simple classification problem by using positive examples from data sets,

Â like non-fake images.

Â And negative fakes from what generator [INAUDIBLE] generates.

Â But then, here's the catch.

Â This discriminator's a new network.

Â So it's usually differentiable.

Â It was viewed differentially so it can train viable propagation.

Â Now it can trick the, it can take this, permuse your fake or permuse of being

Â real, it can repeat the gradient of the probability of being real over the pixels

Â of an image, like vector propagate through the entire discriminator.

Â But this image is, if it's generated, is just an out [INAUDIBLE] generator,

Â meaning the whole generator is entirely differentiable.

Â So you propĂ gate the gradient through the discriminator, back to generator,

Â through this image.

Â And, what you get, is you get, kind of, gradients,

Â that tell the generator how to fool discriminator.

Â Now you tune this generator to [INAUDIBLE] then it produces better images.

Â But then the [INAUDIBLE] is no longer able to distinguish between real [INAUDIBLE].

Â It was easy when the images were just basically white noise but

Â once they are not it becomes slightly harder.

Â So lets train Discriminator again.

Â This time again I feeded the kind of fake examples from the new better Generator.

Â Again so this process makes Discriminator stronger.

Â So now let's [INAUDIBLE] Generator.

Â Let's propagate the gradient through the new Discriminator back into Generator and

Â through that again.

Â Now, for this image, it's weaker, so let's do an [INAUDIBLE] and so the loop goes

Â until, well, essentially Generator generates three [INAUDIBLE] good images.

Â And Discriminator is more or less kind of from image critic.

Â So, if you tried to apply and formulize this logic in a math formula,

Â discriminator is solving classification problem.

Â So he tries to optimize [INAUDIBLE], so what you have here is the variable form

Â of the logarithm of probability of real data being real,

Â and the probability of fake data being fake.

Â Which is what you usually optimize.

Â But for the generator, you do this particular thing.

Â You have the logarithm of disciminator, thinking that you are real.

Â And you compute the gradients over the parameter of generator, and

Â you tune the generator this way.

Â And those list functions, D, of course they can contradict one another.

Â You can see that the generator's objective is essentially contradicting

Â the discriminator's goal to try to make discriminator, to classify,

Â to discriminate fake images as fake.

Â This [INAUDIBLE] contradicts with one of the [INAUDIBLE] objectives.

Â You have this algorithm of probability of being fake given image from [INAUDIBLE].

Â [INAUDIBLE] tries to minimize it while generator tries to assign larger

Â probability.

Â This actually means that you'll have to train those two networks simultaneously

Â and you'll have to train them to make a few alterations on each of them in this

Â training loop.

Â To make sure that the [INAUDIBLE] is useful and

Â the generator actually gets better full the [INAUDIBLE] from this [INAUDIBLE].

Â So the [INAUDIBLE] becomes the [INAUDIBLE].

Â You sample Of those noise, maybe some kind of task descriptions for

Â your what kind of face, what kind of chair you want to generate?

Â We sample the actual chairs, the reference images.

Â Then you spend a few generations training the discriminator.

Â You have actual images and you wanted to maximize

Â the probability of being real for those images.

Â So minimize the probability of being fake.

Â While minimizing this probability of maximizing fakeness for

Â the upload generator.

Â Then comes the second stage where you actually want to tune the generator so

Â that it fakes the discriminator.

Â Again, for one or more apples, you take this kind of task and maybe random noise.

Â Then you train your generator in a way that follows the gradients from

Â discriminator and tries to fake it.

Â And so the process go.

Â Now, if you're experienced enough with what kind of training curricula and

Â how you actually get deep learning models to train efficiently.

Â This particular fashion of trained algorithm should

Â probably [INAUDIBLE] because there is all those K, M, all those [INAUDIBLE].

Â Now this is your yield problem with generative models,

Â they are really unstable.

Â Because you generally have two networks that hate each other, and

Â you have to somehow accent profit from them hating each other.

Â And if one of them wins, basically this means that you have to start

Â the the entire process all over again.

Â For example, if discriminator wins, then it's kind of [INAUDIBLE] probability

Â estimate of [INAUDIBLE] of being fake or real.

Â It's already near 0 or near 1, which means that the gradients vanish.

Â Basically, the [INAUDIBLE] has very small variance near the activation of 1 or 0.

Â If you've now followed those screens with the generator, this won't get you anywhere

Â because the gradients are super small, negatable, in fact.

Â If generator wins, I mean if generator is constantly able to train faster than

Â discriminator, then you have an even worse situation.

Â It doesn't only stop training, but it starts learning the wrong things.

Â Because right now, the generator is fast enough to fool discriminator.

Â Discriminator is not able to give it clear gradients of how to improve.

Â Therefore you can expect your generator to learn basically nonsensical stuff diverge.

Â Basically you have to find some kind of equilibrium here where

Â the two models have more or less equal chances.

Â And ideally, mathematically, this whole process should terminate as a situation

Â where Generator wins but after a large number of steps.

Â So ideally generator should perfectly mimic the data distribution.

Â Should be indistinguishable.

Â That is discriminator should surrender.

Â But in real life, in most cases you wont see this thing happening.

Â You terminate.

Â I mean you get old faster than this happens So as the training loop iterates

Â your generator gets progressively better at well, generating stuff.

Â And this gives it more and more accurate data because it trains as well.

Â [INAUDIBLE] If you stop this process after some large amount of iterations

Â to see the final product you'll actually notice that the generated images here are.

Â Somewhat qualitatively different to the images that [INAUDIBLE] decoder generates

Â if you try to feed decoder with random noise.

Â Previously, [INAUDIBLE] decoder had to generate kind of average blind

Â images because this is how it could produce mean squared error.

Â Now, the generator won't be able to pull that trick.

Â The reason here is that if you generate a blurry image.

Â It may be good for mean squared error, but

Â a discriminator will easily differentiate between blurry and non-blurry images.

Â So if the generated images are, of course, non-perfect,

Â we can see some of them here having flaws, minor ones.

Â But they, of course, non-ideal, but they are non-ideal in different ways.

Â They are more or less, they are plausible.

Â Each image is in its way kind of trying to resemble the actual sample,

Â not trying to average over them.

Â And this is the core kind of advantage of generative adversarial networks.

Â They try to mimic a data set,

Â not to just try to learn probability distribution over it.

Â In fact, they do generate [INAUDIBLE] on the probability, but

Â instead of learning the distribution itself, it learns the sample,

Â which is kind of simpler in the case of images.

Â [MUSIC]

Â