0:02

Okay, so let's talk a little bit about the distributions of degrees in a

network, and why, why do we care about this?

Well obviously, when we are trying to characterize networks in terms of basic

properties that they have knowing whether the most nodes have very similar degrees

or very different degrees can give us a lot of information.

So for instance you know, whether everybody has one or two links or some of

the nodes have eight and others have one, that's going to give us a very different

kind of network. And that different kind of network's

going to have different properties in terms of its difusion properties, its

path links and other kinds of things. So, you know, what we looked at in terms

of the GMP graph had, most nodes had order of magnitude, similar degrees to

each other. let's look at this in, in a little bit

more detail. So when we look at say these Erdos-Renyi

GNP random graphs. well the, the chance that a given node

has exactly d links, is just follows what's known as a binomial distribution.

Right? So the chance that I have exactly d links

present and d the rest of them not present, well I have chance p of a given

link being present raised to the dth power, then 1 minus p raised to the n

minus 1 minus d power. And how many different ways could I have

actually picked the, the other d nodes that I'm connected to?

well there's n minus 1 factorial over d factorial times n minus 1 minus d

factorial. So basically this is what's known as n

minus 1 choose d. that's the combinatorics of how many

different ways I could have exactly d neighbors out of a population of n minus

1 possible other neighbors. Okay, so that, that's a binomial

distribution. And these networks are also known

sometimes as Bernoulli because you know, you're flipping coins, binomial or, or

Poisson random graphs. Why Poisson?

Well, if you want to approximate this if you want to approximate this formula for

large n and relatively small p. the, when you look at 1 minus p raised to

the n minus 1 minus d power, this looks roughly like e to the minus n minus 1, p.

And this combinatoric choose the expression looks roughly like n minus 1

to the d over d facotiral. So when you, when you bring these two

factors together it's roughly n minus 1 to the d that you're left with, and this

is well approximated by e to the minus, n minus 1 to the p.

So, you know, the binomial approximation here, by a Poisson distribution means

that we can represent approximately the probability that a given node has exactly

d neighbors by, this formula here which is known as a Poisson distribution.

Okay, so lets have a look at some of these random networks, lets, this is

actually a random network I drew on 50 nodes with a probability 0.02 of having

any given pair of nodes connected. And this is one realization that was

drawn in what's known as UCINET, you can find that on the web if you want a

network package. It's one of several that we'll talk about

on the course. so I drew this using UCINET.

random graph 50 nodes 0.02. So let's have a look at the actual degree

distribution, compared to a Poisson approximation.

So if we have a Poisson approximation, where you had an expected probability of

links of 0.02. Then you would get the square dots here,

the realized frequency is by the diamonds.

And actually, these two are very difficult to tell apart.

So the frequency distribution of how many nodes have degree 1, degree 2, degree 3

and so forth. There's a couple of to few nodes that,

that had degree two, but everywhere else we're pretty much on the graph.

you know, a couple things to note, out of our picture that we had, when we go back

here there's a bunch of isolated nodes. it looks pretty much, the rest looks

sort of tree like, there's no cycles in it.

so we see several components, no component has more then a small fraction,

we're just starting to see one large component emerge.

if we go and instead bump this probability up to 0.08 on 50 nodes, then

we get a much more connected graph. We still have one isolated node, and we

end up with y'know, more nodes having many more connections.

What's the actual distribution look like? Well, now when we look at the actual, the

Poisson approximation, we have a curve here.

And then when we look at the actual realized frequency, we end up with

something which is not exactly on target. the piece is a little larger, so the

approximation's not quite as great, but it's still fairly close.

And if you had more nodes and looked at a random graph on a larger set, then you

know, these things would tend to coincide even more.

But again, the poisson gives us a reasonable approximation of what what we

would have expected to be realized. so this is a poisson distribution.

What we'd see if we saw links formed indipendently at random.

6:26

what thing, what sort of interesting is that when you look at some networks they

tend to have what are known as Fat tails. And one of the early looks at this in the

network context was by Price in 1969. And he was looking at citation networks,

and so, look, looking at, at papers that cited other papers, and articles that

cited other scientific articles, and then looking at how many sites given articles

have, and then you, you know, you could just put down sites at random, or you see

what you, you actually saw in the data. And he saw what were known as Fat tails.

And [COUGH], what that means is, effectively you saw too many articles

that had lots of citations and too many that had very few citations, so if you

look at the extremes you saw those being over represented.

And then numbers that were close to the average, you saw fewer having sort of a

middle amount of of citations, you either got very highly cited or very low

citation. this is related to a lot of other things.

There, there's a lot of things that have Fat tails.

these distributions go back to Pareto in the 1890s so things like wealth

distributions distribution of cities, populations around the world, words

usage. There's a whole series of different

things that have these kinds of Fat tails.

And in particular there's a well know paper by Albert, Jeong and Barabasi in

the late 1990s where what they looked at was webpages and connections to webpages

in the Notre Dame part of the world wide web at that point in time.

And what is plotted here, is a comparison between the in, in the blue are the

actual data of the world wide web connectedness, and in the pink color here

is what we would see if, if the links had been formed uniformly at random.

And the idea of a of a Fat tails here are the idea that when we look at the actual

representation, we've got, so this is log of frequency, log of degree.

we have a higher frequency of really high degree nodes, and a higher frequency of

really low degree nodes, and then the middle degree nodes are under represented

relative to things. So we have this Fat tails, and in fact

the distribution in this log log plot is almost linear.

so, you know, a line is a reasonably good match for what's going on in these data.

we'll talk about that in more detail as the course pers goes along.

But the important thing here, is that there are the distribution is a little

bit different than what we'd have gotten if we had just thrown down the lengths

uniformly at random. Kay, so this is known, roughly is what's

known as a as scale free distribution. the probability that a, given link has

degree d, can be thought of as proportional to the degree raised to some

power. And, part of the reason that this is

known as scale free is if we, you know, double the degree, then we, we, we scale

up, the relative probabilities, scale up so that.

the we'll still get the likelihoods, will be proportional to each other as, as we

are increasing degrees. And in particular if you take log of each

side of this what do we end up with? We end up with the log of, of the

probability, the frequency of having a given degree is going to be proportional

to, some factor times the log of the degree.

and so it, it, if it was fitting this, this distribution exactly, we should have

a line. And indeed, when we look at what we saw

here and, and you know, we get a reasonable approximation by a line in

those data. Okay, let's have a look at a completely

different kind of network and see whether we also see Fat tails or not.

So this is, Bearman, Moody, and Stovel's data, the high school romance data, that

we talked about just, a few moments ago. and here, we're looking at one high

school, nodes are colored by, gender, male, female.

And a link between two nodes means that they had a romantic relationship during a

time period that was captured in, in the surveys in, in this high school.

And, in particular, you know, here we see a different kind of structure.

We've got a bunch of, of, you know, 63 different diads, so just two people

connected to each other. We see one fairly large component which

has a couple of cycles in it, but looks almost like a tree.

you see some other different shapes here, a number of, of different configurations.

So, a bunch of small components, one larger component.

And actually, there's some isolated nodes that are omitted from the picture.

let's look at the deg, degree distribution of this one, and compare it

to the uniformly at random. So, here again, log of degree, log of the

frequency. And in this case when we look at the

actual data compared to the uniform at random, the fit of, if we actually try

and match these up, the fit in terms of matching up the data is a very close

match from just having. The uniform at random compared to a power

distribution, doesn't fit this as well, so this doesn't seem to have the fat

tails that the other one did. What does that tell us?

It tells us that degree distribution's of different networks can have different

properties. And if we're looking at some, they might

have something that looks much more uniform at random, what this tells us

about romance in high schools. but it it's saying that this is looking

much more like a a purely at random graph whereas the other looked like it had

something else going on in terms of its degree distribution looking more like a

power distribution. So, that gives us a a brief look at

degree distributions. We're going to be visiting those a lot

more in the course, they're [UNKNOWN] a very important way of characterizing a

network. And understanding how dense the network

is and also what the sort of variation is in degrees across the population.

so let's have a look at some other definitions that are going to be

important in networks.