[MUSIC]
>> Hello and welcome back.
In this lecture, let's look at indexes: what they are and
how they impact our queries.
So basically indexes are needed for efficient execution of queries in MongoDB.
So what happens without indexes is that MongoDB will
basically perform what we call a collection scan which is nothing but
scanning every single document in a collection.
So let's say you have 30,000 documents in a collection, and
when you look for data, a piece of data, say you're looking for
the second document, right, or the third document.
Whatever document you're looking for, MongoDB actually goes and
does a scan of every single document in the collection.
So it goes through all 30,000. It's fairly fast, but still,
it has to go through every single document, right?
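To make the collection scan concrete, here is a minimal sketch in plain Python. It is not MongoDB code: the list `collection`, the made-up `city` values, and the helper `collection_scan` are all assumptions for illustration. It only shows that, without an index, every document has to be checked, because the scan cannot know whether a later document also matches.

```python
# Hypothetical stand-in for a collection of 30,000 documents (made-up values).
collection = [{"_id": i, "city": f"CITY{i}"} for i in range(30_000)]


def collection_scan(docs, field, value):
    """Check every document against the condition, like a query with no index."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1  # count each document we had to look at
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


# Even though the match is the third document, all 30,000 are examined.
matches, examined = collection_scan(collection, "city", "CITY2")
print(len(matches), examined)
```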
So this is where indexes come in handy.
Indexes actually store a small portion of the collection's data set
in an easy-to-traverse form, right?
So an index is really a smaller subset of a larger data set, and we'll look
at an example that should help you understand indexes much more clearly.
So when you actually create an index,
all it does is store the values of a specific field.
So if you have ten fields in a document, and you are trying to index, say,
the third field, right, say a state or a city in our zips example,
then it'll actually go and create an index that
stores only the values of that specific field.
And you can create a multi-field index as well which we'll talk about in
the upcoming lectures.
So it'll create the index based on the one or many fields that you pass, and
it also stores the data
in an orderly fashion, so in ascending or descending order, right?
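As a rough picture of what such an index holds, the sketch below builds one in plain Python. This is only an illustration, not how MongoDB implements it (real MongoDB indexes are B-tree structures). The sample documents and the helper `build_index` are hypothetical; the `1` and `-1` direction values mirror the ascending and descending convention.

```python
# Made-up zips-style documents; only the indexed field ends up in the index.
docs = [
    {"city": "RESTON", "state": "VA"},
    {"city": "BETHESDA", "state": "MD"},
    {"city": "ANNAPOLIS", "state": "MD"},
    {"city": "DOVER", "state": "DE"},
]


def build_index(docs, field, direction=1):
    """Return (field value, document position) pairs kept in sorted order.

    direction=1 sorts ascending, direction=-1 sorts descending.
    """
    entries = [(doc[field], pos) for pos, doc in enumerate(docs)]
    entries.sort(reverse=(direction == -1))
    return entries


ascending = build_index(docs, "city", 1)
descending = build_index(docs, "city", -1)
print(ascending)
print(descending)
```

Each entry keeps just the field value plus a pointer back to the full document, which is why the index is much smaller than the collection itself.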
So let's see an example.
So here what we have is the list of scores.
This is an example taken directly from the Mongo site, right?
So we have say, a bunch of scores.
So here I think we have approximately nine documents in a collection.
But this could be thousands and thousands of scores of students in a university.
So there's a bunch of other information, name, first name, last name, and
all that, let's say.
But really all we care about is this one field called score,
which has a numeric value between a min and a max.
Let's say the max is 100, so 0 to 100.
So this student has 25, this student has 56, 45, 75, sort of
random order in the way the documents came into the collection or however these
documents were added to the collection so we have a bunch of values here.
Now when you create an index, so this is saying go and
create an index on score.
Now the actual creation of an index, single-field or multi-field,
is something we will do in a few minutes here.
But I'm just telling you that this is the syntax for you to create an index, right?
One is ascending, and minus one is descending.
Basically this is saying, hey, go and create an index on this field, score,
in ascending order.
So what Mongo does is that it actually goes and creates an
index with the scores, just on the score, in an ascending fashion.
So you have the minimum here, you have the maximum here.
Notice how this first pointer is actually five,
right, that's the minimum of all these scores here.
Five is the lowest, right?
Followed by 18, and then 30, and 45, and so on.
So it's actually building it in an ascending fashion.
Now, why is this important? Because the next time somebody makes a query,
like less than 30, right?
When somebody makes a call, hey give me all the scores less than 30,
this by the way is a Mongo syntax.
Just pure Mongo syntax.
Of course you can do the same thing in Ruby, which we will do later on.
So what this does is, since we already have
the data in a nice ascending fashion, it goes and pulls the scores less than 30.
In our case we will get five and 18 and whatever else we have.
Maybe just five and 18 and 25, of course, so we will return three documents.
So notice how quick and fast this is,
because you don't have to go through every single document in the collection.
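The less-than-30 lookup can be sketched in plain Python with a binary search over a sorted index. This is an illustration under assumptions, not MongoDB's implementation: the sorted list stands in for the real index structure, the helper `find_less_than` is hypothetical, and the score list mixes values mentioned in the lecture with two made-up ones (90 and 62) to reach nine documents.

```python
import bisect

# Nine scores in insertion order; 90 and 62 are made up for illustration.
scores = [25, 56, 45, 75, 5, 18, 30, 90, 62]
docs = [{"_id": i, "score": s} for i, s in enumerate(scores)]

# The "index": (score, document position) pairs in ascending order.
index = sorted((doc["score"], pos) for pos, doc in enumerate(docs))


def find_less_than(index, docs, limit):
    """Return documents whose score is below limit, using the sorted index."""
    # (limit,) sorts before every (limit, pos) entry, so cut is the position
    # of the first index entry whose score is >= limit.
    cut = bisect.bisect_left(index, (limit,))
    return [docs[pos] for _, pos in index[:cut]]


result = find_less_than(index, docs, 30)
print([doc["score"] for doc in result])  # prints [5, 18, 25]
```

The binary search finds the cut-off point in a handful of steps, and only the three matching documents are ever fetched.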
So later on, in an example, I'll show you that by using a command called explain,
you can actually see the difference between a call that you make
before indexes and the same call that you make with indexes.
So let's apply the same concept to our zips example, right?
So we have 30,000 documents in our collection, and so say,
I'm just going to take a sample of some data.
And we have four cities that belong to one state:
MD, MD, MD, and MD here, and then you have a couple of other states in the mix.
I'm picking this on purpose just to show you that we have multiple, or
duplicate, entries for the same state.
So we know the problem: every time you make a call,
whether you look for Bethesda, Maryland, or you look for Reston, Virginia,
it has to go and look at all 30,000 documents
in the collection to give you back the result you are looking for.
Data on the disk is stored in no particular order, right?
There is no guarantee that all the Maryland entries are
all together, because it just depends on how the data was inserted.
And it's obviously inefficient because it has to process a large volume of data.
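As a sketch of how an index on state would help here, the plain-Python example below groups the scattered MD entries together in a sorted index and looks them up without touching the other documents. Everything in it is an assumption for illustration: the six sample documents (only Bethesda and Reston come from the lecture), the helper `find_by_state`, and the sorted list standing in for MongoDB's real index structure.

```python
import bisect

# Made-up sample in insertion order: the four MD entries are not adjacent.
zips = [
    {"city": "BETHESDA", "state": "MD"},
    {"city": "RESTON", "state": "VA"},
    {"city": "ANNAPOLIS", "state": "MD"},
    {"city": "DOVER", "state": "DE"},
    {"city": "BALTIMORE", "state": "MD"},
    {"city": "ROCKVILLE", "state": "MD"},
]

# The "index" on state: duplicates of the same state end up side by side.
state_index = sorted((doc["state"], pos) for pos, doc in enumerate(zips))


def find_by_state(index, docs, state):
    """Return all documents for one state via two binary searches."""
    lo = bisect.bisect_left(index, (state,))             # first entry for state
    hi = bisect.bisect_right(index, (state, len(docs)))  # just past the last one
    return [docs[pos] for _, pos in index[lo:hi]]


md_cities = [doc["city"] for doc in find_by_state(state_index, zips, "MD")]
print(md_cities)  # prints ['BETHESDA', 'ANNAPOLIS', 'BALTIMORE', 'ROCKVILLE']
```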