The final topic of this module on Eigenproblems, as well as the final topic
of this course as a whole, will focus on an algorithm called PageRank.
This algorithm was famously published by, and named after,
Google founder Larry Page and colleagues in 1998,
and was used by Google to decide the order in which to display
websites in its search results.
The central assumption underpinning PageRank
is that the importance of a website is related to its links to and
from other websites, and somehow eigen theory comes up.
This bubble diagram represents a model mini Internet,
where each bubble is a webpage and each arrow from A, B, C,
and D represents a link on that webpage which takes you to one of the others.
We're trying to build an expression that tells us, based on this network structure,
which of these webpages is most relevant to the person who made the search.
As such, we're going to use the concept of Procrastinating Pat,
an imaginary person who goes on the Internet and
just randomly clicks links to avoid doing their work.
By mapping all the possible links, we can build a model to estimate the amount of
time we would expect Pat to spend on each webpage.
We can describe the links present on page A as a vector, where each row is either
a one or a zero depending on whether there is a link to the corresponding page,
and then normalise the vector by the total number of links,
so that it describes a click probability distribution for that page.
For example, the vector of links from page A will be (0, 1, 1, 1),
because page A has links to sites B, C, and D, but
it doesn't have a link to itself.
Also, because there are three links on this page in total,
we normalise by a factor of one third,
so that the total click probability sums to one.
So we can write L_A = (0, 1/3, 1/3, 1/3).
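To make this concrete, here is a minimal sketch in Python (NumPy is my choice here, not something the lecture prescribes) of turning page A's raw links into a normalised link vector:

```python
import numpy as np

# Outward links from page A: no self-link, then links to B, C, and D.
links_from_A = np.array([0, 1, 1, 1], dtype=float)

# Normalise by the total number of links so the entries become
# click probabilities that sum to one.
L_A = links_from_A / links_from_A.sum()

print(L_A)  # [0.         0.33333333 0.33333333 0.33333333]
```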
Following the same logic, the link vectors for the next two sites are shown here.
And finally, for page D: D is connected to B and C,
but not to A, two sites in total, so we can write
L_D = (0, 1/2, 1/2, 0).
We can now build our link matrix L by using each of our
link vectors as a column, which you can see will form a square matrix.
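Assembling the columns in code might look like the sketch below. The lecture only spells out L_A and L_D; the columns for B and C are illustrative assumptions (B linking to A and D, C linking only to D) that are consistent with the clicks described next:

```python
import numpy as np

# Each column is one page's normalised link vector: column 0 is L_A,
# column 1 is L_B, and so on.  L_A and L_D come straight from the
# lecture; L_B and L_C are illustrative assumptions (B linking to
# A and D, C linking only to D).
L = np.array([
    #  A    B    C    D
    [0.0, 0.5, 0.0, 0.0],   # links INTO A
    [1/3, 0.0, 0.0, 0.5],   # links INTO B
    [1/3, 0.0, 0.0, 0.5],   # links INTO C
    [1/3, 0.5, 1.0, 0.0],   # links INTO D
])

# Each column is a probability vector, so every column must sum to one.
assert np.allclose(L.sum(axis=0), 1.0)
```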
What we're trying to represent here with our matrix L
is the probability of ending up on each of the pages.
For example, the only way to get to A is by being at B.
So you then need to know the probability of being at B,
which you could have reached from either A or D.
As you can see, this problem is self-referential,
as the ranks of all the pages depend on all the others.
Although we built our matrix from columns of outward links, we can see that
the rows describe inward links, normalised with respect to their page of origin.
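The lecture goes on to resolve this self-reference with eigen theory; as a preview, one common way to see it numerically, not something this transcript has introduced yet, is to start Pat on a random page and repeatedly apply L until the probabilities settle down:

```python
import numpy as np

# Link matrix from the sketch above (columns are normalised link
# vectors; columns B and C remain illustrative assumptions).
L = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [1/3, 0.0, 0.0, 0.5],
    [1/3, 0.0, 0.0, 0.5],
    [1/3, 0.5, 1.0, 0.0],
])

# Start Pat with equal probability of being on any page, then
# repeatedly redistribute that probability along the links.
r = np.ones(4) / 4
for _ in range(100):
    r = L @ r

print(r)  # the long-run fraction of time Pat spends on each page
```

The vector this settles to satisfies r = L r, which is exactly an eigenvector of L with eigenvalue one, and that is where the eigen theory promised at the start comes in.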