We will start with SIFT.
SIFT stands for Scale-Invariant Feature Transform.
This is one of the first feature detection schemes that was proposed.
It uses image transformations in the feature detection and matching process.
One characteristic of SIFT is that it's highly accurate, which is wonderful.
However, its high computational complexity limits its use in real-time applications,
and this is an issue.
Let's look at the process that you see right here.
The table that you saw in the previous lecture,
as well as the figures that I'm showing you right here, are from
the PhD dissertation at Yonsei University of Dr. Young Son Pak,
who is my grad student.
We worked together on this research for
several years, and I am proudly showing you some of
the results that were accomplished and are shown in his PhD dissertation.
I am very proud of my grad students who are doing research with me, and who have
done research with me in the past, to accomplish new technologies.
So, I'm using this proudly, and here we go.
In the overall SIFT feature extraction process,
you can see the DoG generation on the top.
DoG stands for Difference of Gaussian,
and we will talk about this on the following page.
It's based on Gaussian techniques, and we are extracting the differences between them.
After that, the process goes into key point detection mechanisms.
As you can see here,
the key points are based upon
different levels of an image that are produced at different scales,
and on finding the maxima or minima points, the extrema points.
This is followed by orientation assignment,
which uses what you see here, and which
I will show you soon in further detail.
Then there is descriptor generation.
Then it goes down into downsampling of the image,
and then looking at the octaves.
You will see the details of this on the following page.
Difference of Gaussian generation, DoG generation.
Build a scale space using an approximation based on DoG techniques.
So, if you look at SIFT, and at SURF which follows,
scale space is used as the core technology to start the augmented reality process.
Local extrema of the DoG images at varying scales are selected as interest points.
These local extrema, what is an extremum?
Local means that within an image there are
local ranges that you can divide it into, based on characteristics of the image.
Then, in each image there are extreme points,
which are the extrema,
in terms of being either a maximum or a minimum point in that local range.
That's what we mean by local extrema.
These will be the key points that we are looking for
using the DoG images at various scales,
and these will become our interest points.
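To make the idea of a local extremum concrete, here is a minimal sketch in Python.
It assumes the commonly described SIFT formulation, in which a pixel in one DoG image is compared against its 26 neighbors:
8 in the same image, 9 in the scale below, and 9 in the scale above.
The function name find_local_extrema and the tuple format it returns are my own choices for illustration, not something from the dissertation.

```python
def find_local_extrema(dog_below, dog_current, dog_above):
    """Return (x, y, kind) for pixels of dog_current that are strictly larger
    or strictly smaller than all 26 neighbors across three adjacent scales."""
    layers = (dog_below, dog_current, dog_above)
    height, width = len(dog_current), len(dog_current[0])
    keypoints = []
    # Border pixels are skipped because they lack a full 3x3 neighborhood.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = dog_current[y][x]
            neighbors = [
                layer[y + dy][x + dx]
                for index, layer in enumerate(layers)
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if not (index == 1 and dy == 0 and dx == 0)  # skip the center itself
            ]
            if center > max(neighbors):
                keypoints.append((x, y, "max"))
            elif center < min(neighbors):
                keypoints.append((x, y, "min"))
    return keypoints
```

A real detector would also reject weak or edge-like candidates afterwards. This sketch only shows the maximum or minimum test itself.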
DoG images are produced by convolving, or blurring, the image with
Gaussians and taking the differences, at each octave of the scale space in the Gaussian pyramid.
So, we are going to use the scale space and Gaussian pyramid, in which
the scale space goes from the abstract scale space down to the full scale space.
Stacking them up, it will look like a pyramid of images,
based upon Gaussian processing.
The processing involves convolution or blurring techniques.
These Gaussian images, based on the number of
octaves, are down-sampled after each iteration.
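The blur, subtract, and down-sample steps just described can be sketched as follows.
This is only an illustration under my own assumptions: the parameter names (num_octaves, scales_per_octave, sigma0, k) and their default values are hypothetical,
the image is a plain list of rows, and the next octave simply starts from the most blurred image of the current one, which is a simplification.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian weights out to about 3 sigma, normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, repeating edge pixels at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[i] * row[min(max(x + i - radius, 0), width - 1)]
             for i in range(len(kernel)))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def subtract(a, b):
    return [[pa - pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def downsample(image):
    """Keep every second pixel in both directions."""
    return [row[::2] for row in image[::2]]

def build_dog_pyramid(image, num_octaves=2, scales_per_octave=4,
                      sigma0=1.0, k=math.sqrt(2)):
    """Return a list of octaves; each octave is a list of DoG images."""
    pyramid = []
    base = image
    for _ in range(num_octaves):
        blurred = [gaussian_blur(base, sigma0 * k ** i) for i in range(scales_per_octave)]
        # Each DoG image is the difference of two adjacent blur levels.
        pyramid.append([subtract(blurred[i + 1], blurred[i])
                        for i in range(scales_per_octave - 1)])
        base = downsample(blurred[-1])  # simplification: start the next octave here
    return pyramid
```

With these defaults, each octave has four blurred images and therefore three DoG images, and each octave is half the width and height of the one before it.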
Why use DoG?
What is this doing?
Well, DoG is a method that approximates LoG,
the Laplacian of Gaussian.
So, what we're really saying is that we would want to use a Laplacian of Gaussian,
but instead of that we're using a simpler, less computationally burdensome DoG technique.
A Laplacian of Gaussian.
Why do you want to use a Laplacian?
What is the benefit of it?
Hang in there, I'll explain it soon.
DoG has low computational complexity compared to LoG.
In addition, DoG does not need partial derivative computation like LoG does.
So, automatically from this statement you see that LoG is very computationally
burdensome, and it is burdensome because it uses partial derivatives.
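To see how closely a plain subtraction can stand in for the derivative-based operator, here is a small numerical check.
It relies on the well-known relation G(x, y, k·sigma) − G(x, y, sigma) ≈ (k − 1)·sigma²·∇²G, which holds when k is close to 1.
The function names and the values of sigma, k, and radius are my own choices for illustration.

```python
import math

def gaussian(x, y, sigma):
    """2-D Gaussian with unit volume."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """Closed form of the sum of the second partial derivatives of the Gaussian."""
    r2 = x * x + y * y
    return (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def difference_of_gaussian(x, y, sigma, k):
    """No derivatives needed: just subtract two Gaussians of nearby widths."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def max_relative_gap(sigma=1.5, k=1.01, radius=6):
    """Largest gap between DoG and the scaled LoG on a grid,
    relative to the peak magnitude of the scaled LoG."""
    points = [(x, y) for x in range(-radius, radius + 1)
              for y in range(-radius, radius + 1)]
    scaled_log = [(k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)
                  for x, y in points]
    dog = [difference_of_gaussian(x, y, sigma, k) for x, y in points]
    peak = max(abs(v) for v in scaled_log)
    return max(abs(d - l) for d, l in zip(dog, scaled_log)) / peak
```

Calling max_relative_gap with a k near 1 gives a small gap, and the gap grows as k grows, so the match is an approximation, not an identity.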
DoG obtains local extrema of images with difference of Gaussian techniques.
Classical interest point detection methods commonly used LoG technology.
LoG is a very popular approach for interest point detection because
LoG is scale invariant when it is applied to multiple image scales.
The characteristics of a Gaussian scale-space pyramid
and kernel techniques are frequently used.
This is just a conceptual example of a pyramid of images at different levels.
This is not exactly what you will see as a result.
However, it gives you an idea of what we mean
by a pyramid of images at different scales.
Approximation of Laplacian of Gaussian.
What is the Laplacian process doing?
It is a differential operator of a function on Euclidean space.
Once again, remember that SIFT and SURF are based
upon the Euclidean space, the Euclidean distance,
whereas the other algorithms that were faster
and more appropriate for real-time use were based on the Hamming distance,
because they are based on binary operations.
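As a quick reminder of the difference between the two distances, here is a minimal sketch.
The function names are my own, and the inputs are toy values, not real descriptors:
a list of numbers for the Euclidean case and an integer bit pattern for the Hamming case.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences of two real-valued vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of bit positions in which two binary patterns differ (XOR, then count ones)."""
    return bin(a ^ b).count("1")
```

The Hamming distance needs only an XOR and a bit count, which is why the binary schemes are cheaper to match.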
The LoG scheme here is based on a second-order derivative of the Gaussian scale space.
It is sensitive to noise and requires a lot of computation.
This is one of the reasons that we want to avoid the Laplacian process,
and that is why we want to have approximation methods.
That is why DoG technology is used instead of LoG technology in SIFT.
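To illustrate what a second-order derivative operator does, and why it reacts strongly to noise, here is a minimal sketch.
It assumes the common 5-point finite-difference approximation of the Laplacian on a pixel grid.
The function name discrete_laplacian is my own, and this is not the LoG operator itself, since no Gaussian smoothing is applied.

```python
def discrete_laplacian(image):
    """5-point finite-difference Laplacian; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Sum of the four direct neighbors minus four times the center.
            out[y][x] = (image[y - 1][x] + image[y + 1][x]
                         + image[y][x - 1] + image[y][x + 1]
                         - 4 * image[y][x])
    return out
```

On a perfectly flat image the result is zero everywhere, but if a single pixel is off by 1, the output at that pixel is −4.
A small disturbance is amplified, which is the noise sensitivity mentioned above.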