0:00
[MUSIC]
Okay, so I mentioned that this time, you only have the brain scans, but
no classification problem in mind.
So you just got the data.
You don't have the particular cancer versus no cancer problem here.
Actually, you have no labels at all.
But what you want to do is take the, well, the multitude of your brain scans.
You want to understand whether there is maybe some kind of structure to them.
Maybe there's some kind of anomaly that some of the patients share.
Or maybe there are one or several groups that share some explainable traits.
It's kind of hard to do if you have 2D or
3D images that are poorly interpretable by humans.
But there are special methods like, for example, Stochastic Neighbor Embedding
that can embed the higher-dimensional data into a small-dimensional space in order
to kind of visualize it for you, so that you can plot a 2D or 3D scatter.
Now, the only problem is that if you apply t-SNE to, say,
a 100 by 100 pixel image, you could wait forever before it converges.
And it's kind of not going to work, because, basically,
the computational complexity of t-SNE is just too large for this.
Now, what you can do instead is use the autoencoder as a preliminary step.
So you take the pixels and convert them into the hidden representation,
where there are only maybe 100 or 256
hidden components, and those can then be embedded with t-SNE.
Now, this is a much easier problem from the computational perspective.
And you can use this to visualize even low-level data like images and sounds.
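To make that pipeline concrete, here is a minimal sketch in Python. It assumes you already have a trained Keras-style `encoder` model and an array `images` of 100 by 100 scans; the names `encoder` and `images` are illustrative assumptions, not names from the course code.

```python
# Minimal sketch: autoencoder codes first, then t-SNE for visualization.
# Assumes a trained Keras-style `encoder` and an `images` array of shape
# (n_samples, 100, 100); both names are illustrative assumptions.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Step 1: compress each 10,000-pixel scan into a short code vector
# (say, 256 numbers) with the trained encoder.
flat = images.reshape(len(images), -1)
codes = encoder.predict(flat)

# Step 2: run t-SNE on the short codes instead of the raw pixels --
# a far cheaper problem -- and plot the resulting 2D scatter.
points = TSNE(n_components=2, perplexity=30).fit_transform(codes)
plt.scatter(points[:, 0], points[:, 1], s=5)
plt.show()
```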
Now, if this t-distributed Stochastic Neighbor Embedding doesn't sound familiar,
it's one of the so-called manifold learning methods.
There is a lot of math and practical application behind it.
But for now, let's just consider that it's one of the cool ways you can
take your encoded 100-dimensional or 256-dimensional vectors and
map them into a super low dimensionality, like a 2D point or a 3D point, in a way
that your original images, if they resemble one another, will be mapped to
similar points, while different images will end up more distant from one another.
There is a neat picture on the slide, so I encourage you to pause the, well,
presentation here and see if you can spot the semantic coherence
between similar images.
But we'll cover more about the math of these embedding methods,
the manifold learning methods, later.
So this more or less covers the compression and
the feature representation finding properties of unsupervised learning.
Now, let's talk about generating data.
So previously, we used the encoder only to generate features.
Now, by construction, the decoder is the part of the autoencoder that was trained
to make the reverse transform: it took the higher-level features and
constructed an image that would correspond to those features.
So it would take, for example, a particular description of a face in
some high-level dense representation, and it would produce the face itself.
What this allows you to do is, say you have two images where the first
person has an, well, enormously large nose, for example.
And the second one has an enormously small, okay, not enormously, a tiny nose.
Then you could find the latent vectors of those two images.
And if you average them out,
you have every chance of ending up with an image that has an average-sized nose.
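As a rough sketch of that averaging trick, assuming the same illustrative `encoder` and `decoder` models and two face images `img_a` and `img_b` (all names are assumptions for illustration):

```python
# Encode both faces into their latent vectors; img_a[None] adds a
# batch dimension of size 1, as Keras-style models expect.
z_a = encoder.predict(img_a[None])  # face with the large nose
z_b = encoder.predict(img_b[None])  # face with the tiny nose

# Average the two latent codes and decode; the result tends to show
# intermediate traits, e.g. an average-sized nose.
z_mid = (z_a + z_b) / 2.0
img_mid = decoder.predict(z_mid)[0]
```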
This is a toy example, but in general,
this opens up the domain of so-called image morphing.
You can take your image in this hidden representation and
find some vector that applies some, well, meaningful transformation to it,
like adding a smile or removing a smile if it's a face.
Or if it's music, maybe adding some kind of, well, sound effect, or
maybe even converting one instrument into another.
Now, here on the slide, you can see this trick applied to 100 handwritten digits.
It's the MNIST dataset.
You're probably familiar with it.
So if you train an autoencoder on MNIST,
you can expect the hidden representation of the autoencoder
to correspond to image classes and handwriting styles.
Well, it is remarkable, but also quite natural:
even without being shown that there are 10 labels, 10 classes, the model
learns to map each hidden representation to a particular class label.
So this can be used to discover the classification problem for
you, which you can then solve with supervised learning.
The idea here is that if you have a decoder that was trained on such a dataset,
it was originally trained to convert the hidden representation back into an image.
So if you take some hidden representation, then you can use it to generate an image.
And if you alter it slightly, the image will be altered as well.
So if you start, for example, from a 7, it's in the upper-left corner,
and then you alter one feature by slight increments,
you'll get a kind of incremental change from 7 to 1.
And if you choose a different direction, it would go from 7 to 0.
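A hedged sketch of that incremental walk, again with illustrative `encoder`/`decoder` models and a `seven_image` array (28 by 28, as in MNIST); these names are assumptions, not the course's actual code:

```python
import matplotlib.pyplot as plt

# Latent code of a handwritten 7 (batch dimension added with [None]).
z = encoder.predict(seven_image[None])

# Nudge one latent coordinate in small steps and decode each time;
# the digit should gradually morph, e.g. from a 7 towards a 1.
fig, axes = plt.subplots(1, 10, figsize=(10, 1))
for i, ax in enumerate(axes):
    z_step = z.copy()
    z_step[0, 0] += 0.5 * i          # walk along one latent direction
    digit = decoder.predict(z_step)[0].reshape(28, 28)
    ax.imshow(digit, cmap='gray')
    ax.axis('off')
plt.show()
```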
It's a small example, but
it kind of gives you the flavor of what can be done with autoencoders.
Now, when applied to images, the same kind of vector arithmetic,
the same incremental changes, can lead you from, say, a particular
face of a male person to a corresponding female person, with all other
features preserved as much as possible.
Now the idea here is that you first want to understand whether
such a direction exists.
So you want to find out whether there is maybe a bunch of male people who have
some particular feature at a negative value, while
female people have it at a positive value.
And if you take a male person and
apply the female transformation to him, you'll get a corresponding female person.
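One way to sketch that direction search in code, assuming the same illustrative `encoder`/`decoder` models plus two labeled batches of face images, `male_images` and `female_images` (all hypothetical names):

```python
# Encode both groups and take the difference of their mean codes:
# the latent direction that separates the two groups on average.
z_male = encoder.predict(male_images)
z_female = encoder.predict(female_images)
direction = z_female.mean(axis=0) - z_male.mean(axis=0)

# Apply the "female transformation" to one male face and decode it.
z = encoder.predict(one_male_image[None])
morphed = decoder.predict(z + direction[None])[0]
```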
Here on the image, the same is applied with a vector that kind of increases age, so
this is one way you can do this:
artificial intelligence predicts what your future appearance is going
to look like at, say, 80.
So there are a few neat demos using this; here is a set of faces and
the corresponding results of using autoencoders to add or
remove smiles or eyeglasses.
Now, there's of course more than one way you can train a neural network to generate,
especially to generate images.
There's a special class of models called Generative Adversarial Networks; those
are trained specifically to generate, and they are kind of thought to be the state of the art here.
So they produce images that are slightly sharper, slightly more plausible.
Or so it is thought by the experts at this particular point
in time, because image generation is hard to judge.
But the beauty of autoencoders is that they are able to solve so
many tasks, and being able to generate images is only an artifact here.
They were not originally intended to do so.
So basically, you can think of an autoencoder as a very specific mathematical model, with
a very specific loss function, that grasps a very wide range of practical problems.
We'll elaborate on those image generation methods and
image morphing approaches later this week.
[MUSIC]