0:07

Hello. Today, we're going to talk about how to find important nodes in a network.

Recall these friendship network among 34 people in a karate club.

Based on the structure of this network,

which would you say are the five most important nodes in this network?

Of course, there are many ways of thinking about this question and we're going to start

thinking about different ways in which you could

imagine answering this particular question.

So, one way to answer the question would be to say,

well, nodes who have a very high degree,

nodes who have lots of friends are important nodes.

And if we use that definition then we'll find that

the five most important nodes are nodes 34,

1, 33, 3 and 2.

These nodes right here.

There are other ways in which you can imagine answering this question.

Another way would be to say that nodes who are important

are nodes who are very close to other nodes and network,

nodes who have high proximity to other nodes and network.

And if we use that definition,

then the five most important nodes in the network would be notes 1,

3, 34, 32 and 9.

So, instead of having node 33 we'll have node 9 and then instead of having

node 2 we'll have node 32 and all the other ones stay the same.

Yet, another way of thinking about importance would be to say that nodes who are

important are nodes who tend to connect other nodes into network.

And so, we could imagine measuring importance by

the fraction of shortest paths that pass through a particular node.

And if we do that,

if we define in that way,

we find that the five most important nodes in the network are nodes 1,34,

33, 3 and 32.

So, instead of having node number 9,

we'll have node number 33 in

the top five and every all the other nodes will stay the same.

So, at this point,

these three definitions of importance are very informal and the goal of

this video and the following videos is going to be to get

a more precise definitions of how to measure importance in a network.

This general topic is called Network Centrality and these

centrality measures are the ones that allows us to

find the most important nodes in a particular network.

And the cases in which we want to use these techniques is when for example,

sometimes we're interested in finding who the influential nodes in a social network are,

which are the nodes that are very good at disseminating information to many other nodes,

or on the other hand,

which are the nodes that are good at preventing

sort of bad behaviors or epidemics from spreading on a social network?

These can be used in other networks as well,

so we can use a centrality measures to find

hubs in a transportation network or important pages on

the web or more generally these measure

allows us to find nodes that prevent the network from breaking up.

Nodes that if we were to remove them,

they would cause the network from they would cause the network

to fracture and break up into different components.

Here's a list of centrality measures that are commonly used.

We're going to be talking about some of these in this course and in this video,

we're going to be focusing on degree centrality and closeness centrality.

So, degree centrality makes the assumption that important nodes have many connections.

This is the most basic way in which you could define importance or centrality.

Simply, that nodes who have lots of neighbors,

lots of friends are very important,

and it makes sense right?

Of course, depending on the type of network you have

you would have to use different types of degree.

So, if you have an undirected network you can simply use that degree,

but if you have a directed network then you have to choose between using in-degree,

or out-degree, or a combination of both.

So, let's get into exactly how we would define this.

So, we'll say that the degree centrality of a node V is going to be the ratio

between its degree and the number of nodes in the graph minus one.

So, in this case a node would have a centrality of one if it's connected to

every single node in the network and

a centrality of zero if it's connected to no node in the network.

And so, this measure goes between zero and one,

with one being the case where you're most connected.

Let's see how we can use network X to

find the degree centrality of nodes in this network.

So, first let me load up the karate club graph and then let me

convert the labels of the nodes so that they match what we see here in this figure.

And now, we can use the function degree centrality

to measure the centrality of all the nodes in the network.

And so here, these returns a dictionary

of centralities of every node and we can for example,

look at the degree centrality of node number 34 which is 0.515 and that's

because node 34 has 17 connections and there are 34 nodes in the network,

so that is 17 or 33.

The Degree Centrality of note 33 is 0.364 and that is 12 over 33.

Now, like I said in directed networks,

we have the choice of using the in-degree centrality or the out-degree centrality

of a node and everything else is defined in the same way.

So, the in-degree centrality of a node V is going to

be its in-degree divided by the number of nodes in the graph minus one.

And we can use the function in-degree centrality network X to find

the in-degree centrality of all the nodes in a directed network.

And so in this case, A will have in-degree of 0.143 which is two or 14.

There are 15 nodes in this network and L will have a in-degree centrality

of 0.214 which is three over 14 and that's because node L has an in-degree of three.

And the very same way we can define not using the out-degree instead of

the in-degree centrality using the function out-degree centrality

and the out-degree centrality of A is 0.214,

because it has an out-degree of three and the out-degree centrality of node L is 0.071,

because L has an out-degree of one.

Another way of measuring centrality is what we call closeness centrality.

And the assumption here is that nodes that are important are going to

be a short distance away from all other nodes in the network.

Recall that we measure distance between two nodes by looking at the shortest path,

the length of the shortest path between them.

And so, the way we're going to define the closeness centrality of

node V is going to be by taking the ratio of the number of

nodes in the network minus one divided

by the sum over all the other nodes in the network,

and the distance between node V and those nodes.

So, that's the sum and the denominator in the definition of centrality.

Now, so let's say that we want to use network X to find

the closeness centrality of this node 32.

We can use the function closeness centrality which returns the dictionary

of the centrality of the closeness centrality of all the nodes.

And here, we find node 32 has a closeness centrality of 0.541.

So, using the definition of closeness centrality let's see how this 0.541 comes about.

So, first of all let me look at the sum of the length of

the shortest path from node number 32 to all the other nodes in the network.

I'm using here the shortest path length function,

which you've seen before,

which gives you the length of all the shortest path from

the node number 32 to all the other nodes and that sum here is 61.

And so, then if we take the number of nodes in the graph minus

one divided by 61 that's how we get this 0.541.

Of course, we're making

the implicit assumption that all the nodes can actually reach all the other nodes,

but of course, this is not always the case.

In particular, when we have directed graphs,

this is often not the case and even for undirected graphs you can have

multiple connected components and you

can have it so that some nodes cannot reach other nodes.

And so, what happens, how do we measure closeness centrality

when a node cannot actually reach all the other nodes?

Let's take this example, right.

So, node L here can only reach node M. So,

how do we measure its closeness centrality?

We have a couple of options here.

And the first option we can simply only consider the nodes

that L can actually reach in order to measure its closeness centrality.

So, the way this would work is that we define this set RL to

be the set of nodes that L can actually reach and we define

the closeness centrality of L to be the ratio of the number of nodes that L can

reach divided by the sum of

the distances from L to all the nodes that L can actually reach.

And so, for node L here,

this would be simply one,

because L can only reach node M so RL here is just

the set M is just the node M and L can reach M in just one step.

So, the closeness centrality of L here,

would be one over one,

which is one and this is the highest possible centrality a node can have in a network,

which seems a little bit problematic because as node L can only reach

one node and we're saying that it has

the highest possible centrality than any node can have,

this seems an intuitive, right?

So, here's where option 2 comes in.

In option 2, we again only consider the nodes that L can reach,

but then, we normalize by the fraction of nodes that L can reach.

So, the way this looks here is that when we compute the closeness centrality of L,

we have the same ratio of RL over the sum.

But now, we're going to multiply that ratio,

the fraction of nodes that L can reach,

RL, divided by the number of nodes in the graph minus one.

So basically, we're normalizing by the fraction of nodes that L can reach.

And so, if L cannot reach many nodes we're going to be

multiplying these other fraction by a very small number.

And so, in this case if we do that we find that L has a closeness centrality of 0.

071 which is more reasonable than defined to be one.

One thing to note here is that in this new definition when we're normalizing,

we're not changing the definition of closeness centrality when the graph is connected,

where in every node can reach every other node.

That's because when that's the case RL for node L would equal M minus

one and this formula that you see

here would be the exact same formula that we had before.

So, we're not changing the definition.

This definition can still apply even if the graph is completely connected.

Okay. So, how can we use network X to find the closeness centrality?

Well, we can use the function closeness centrality.

And here, you get the option of normalizing or not normalizing.

And so for example, if we choose not to

normalize then the closeness centrality of node L would be one,

as we saw before,

and if we choose to normalize then it's closeness centrality would be 0.071.

So, in summary, today we talked about

centrality measures and how they aim to find the most important nodes in the network.

We said that there are many many different ways of defining centrality

and today we talked about two specific ones.

A very basic definition of degree centrality which makes

the assumption that important nodes are those who have many many connections,

and we use this formula to measure the centrality of a node and it's simply

the ratio between the degree of the node and the number of nodes in the graph minus one.

And depending on whether we have a direct or undirected graph,

we can use the degree of the node or the in-degree or the out-degree,

and these are the different functions that you can use in network X to apply them.

The second centrality measure that we looked at was closeness centrality

and the assumption that it makes is that important nodes are close to other nodes.

And here is the general formula that we use to compute it.

And here, we can choose to normalize or not normalize as we

discussed and the function that we use in

network X to compute it is the function closeness centrality.

This is it for this video and I hope to see you next time. Thanks.