Hi, it's Matt again. And now, we're going to be talking a
little bit about the diameter of a network.
So again, we're still in our basic definitions, trying to understand how we
represent networks, things we can talk about in terms of their characteristics
and diameter is a very important one. Diameter and average path length refer to
how close nodes are to each other. So, how long will it take to get from one
node to another in some sense? How fast will information spread?
How, how quickly can we get from one part of the graph to another?
These are going to be questions that are going to be answered by things like
diameter and average path length. So, when we look at the diameter, the
definition here is the largest geodesic in the network.
So, the largest shortest path. so if, if we have a network which is not
connected, often when people talk about the diameter, they'll talk about the
diameter of the largest component. It get's a little iffy spent a lot of
time's, you're going to be stuck with a few isolated nodes in a network.
And so diameter often will then just to re, refer to the component largest
component. one thing about diameter is it can be
prone to outliers. It could be that there's, that happen to
be one pair of nodes which are very far from each other, but a lot of the other
ones are, are relatively well connected to each other.
So, average path length is going to be a different measure than the diameter
inside of the maximum geodesic. It's going to be the average geodesic in
the network. And if we look at different network
structures, diameter begins to tell us things about it.
So, for instance, if we look here at the network on the left.
We've got a ring or a circle of different nodes connected to each other.
And in that situation, when we look at these nodes, we end up seeing that they
are some of them can be quite far from each other.
And the diameter in this case is on the order of half the number of nodes.
So, either n over 2 or n minus 1 over 2, depending on whether we've got even or
odd number of nodes. Whereas, if we look at a tree, the
situation on the right here in this picture, what we have is a much more
efficient way of getting from one node to another.
And so, if it, if we think about the number of levels that this tree has.
So, in this case it has three different sets of, of links.
So, one, two, three depth. so in this case, we're going to end up
or, or we're, we're going to end up covering a number of nodes where n is
equal 2 to the K plus 1 minus 1. So, you can just go through and, and keep
track of that yourself. when you look at the relationship between
how many levels you have and how many nodes you have, you end up, in this case,
having K is equal to log base 2 of n plus 1 minus 1.
So, a very simple formula. And with that in, in this case the
diameter is 6. It's twice what this K was.
So you, getting from node 25 here to node 30.
it takes a path link of 6 to go up 3, and then down 3.
So, it's twice times K is going to be the, the longest shortest path in this
network. And what this tells us is that the order
of, of the diameter is going to be roughly 2 times this log of n plus 1.
So, that's giving us sort of how quickly we reach out.
And the idea here is a tree is a much more efficient way of connecting nodes in
term of short paths than we saw in terms of this circle.
Because now we, we end up connecting nodes to each other in a manner which is,
is quite efficient in terms of the branching process.
And so, even with a small number of, of links, we're able to more efficiently
connect things. And so, instead of scaling linearly with
the number of nodes, this is going to scale with the log of the number of
nodes, which is going to tend to be a much smaller number.
Now, in, in terms of small average path lengths and diameters, one thing that's
been found quite extensively in terms of looking at social networks in the real
world is that they tend to have short average path lengths and short diameters.
And what do we mean by short? It means intuitively that we connect a
large number of nodes with a, a fairly small number of connections.
The famous most famous example of this was Stanley Milgram's experiment where he
had people sending letters from one part of the US to another.
I believe it was part of the Midwest, say, Kansas and Nebraska, and you were
trying to get a letter to somebody on the East Coast, say, in Massachusetts.
he would give them say, a person's name and the rough location of where this
person was and maybe some basic background information about the person.
So, you know, I'm supposed to get something to John Smith who lives in, in
Massachusetts and who is a lawyer in Amherst.
And I live in Lincoln, Nebraska. And the, the idea was that I could,
should send the letter to somebody I know.
And then, the instructions in the letters tell them to send it to somebody they
know and so forth, with the intent of eventually getting to, the letter to the
person that was named. And the amazing thing that Milgram found
was that out of the letters that were originally mailed from the initial points
the median number of hops that it took to get to the end point was 5.
And in fact, 25% of the letters made it. So that, that's first of all, fairly high
in terms of response rate for this kind of participation in an experiment.
And secondly, the median of five seem to be quite small given that you're starting
with somebody in, in one part of the country who had no knowledge of the other
individual, in another part of the country and it took them only five steps
to get to that individual. this kind of experiment has been
replicated in, in more, more advanced settings, we're working with e-mail and
other kinds of things. one thing that's sort of interesting
about this experiment is that when, you know, if we think about the median of 5,
we have to be a little bit careful about how we interpret that result.
Why? Well out of the 75% that didn't make it,
the longer a path is, that, that would actually take to, to go from one person
to another, if we think that there's some probability that people won't send it on,
the longer the path, the more likely it, it becomes that it won't appear as having
made it in, in the experiment. So the, the letter will never reach the
final point where Stanley Milgram could then collect it and get all the names.
So, the ones that actually reach are going to tend to be shorter than the
others. So in, in experiments that have followed
up on this, you can begin to try and estimate, you know, how many didn't make
it. And, and those are going to be biased
upwards. But it's still sort of an important
number because people are getting it in only five steps and they didn't get to
see the network. It's not as if they saw the network of
all the connections in the US and we're able to figure out what the most
efficient path was. They were just trying to send it to
somebody they thought might know somebody.
You know, if I, If this person's a lawyer then, if I know a lawyer and I know a
lawyer in Massachusetts, maybe I send it to them or if I know a lawyer in Chicago
who might have a friend in Massachusetts. So, I'm, I'm trying to navigate the
network without actually knowing the explicit structure of the network.