0:02

Okay, so let's take a look at marrying

the strategic formation models we have been looking

at with some of the earlier types of

models that we had for estimating networks, random networks.

And in particular we'll look at subgraph generation models (SUGMs), and try and figure

out how we might fit in some of

the utility based calculations we've been looking at.

Okay so we've got utility from forming subgraphs,

links, triangles, et cetera, but what we're going to do is

noise that up by putting some randomness into the utility.

So let's have a look at how we might do that,

and let's do this in the context of a specific example.

So what we are going to do is try and

ascertain whether, when we look at caste relationships,

there is some sort of social pressure operating on

them, and not surprisingly we might find that there is.

And in particular,

when we look at cross-caste relationships,

relationships that go

across caste boundaries, are they more likely to occur in private,

when people have no friends in common? Or do they occur with the same frequency

when people have a friend in common as when they don't have a friend in common?

Okay, so that's one of the questions we might ask.

So let's look back at some of the data we had from this work

with Abhijit Banerjee, Arun Chandrasekhar, and Esther Duflo.

So this is

1:24

village 26 again, kerosene-rice sharing, and what we're

looking at here is that, again, we've colored the nodes by

just a dichotomous caste split, so scheduled castes and scheduled tribes are blue.

General and other backward castes are red.

So what we see is that

there are fewer relationships going across the

boundaries of this designation than within.

So we saw that the probability of a link going

across was 0.006, and the probability within was 0.009.

But we can look at a different one; here's village 48.

A different sub-network:

who visits which other household socially?

It's a denser network, but we see similar patterns in terms of the segregation.

And so what we want to ask here is, let's look

at, say, somebody from the red and somebody from the blue categorization.

Do we see this kind of relationship, where they have a friend

in common, relatively less frequently

than we see this kind of relationship, where they don't?

So, do people prefer to form these things in private

rather than in situations where there's going to be some sort of witness

to the interaction?

Okay, so one difficulty in beginning to estimate this kind of thing is

the fact that triangles are going to take

three people to agree to form, whereas links are going to take only

two. And so naturally it's going to be more difficult to get triangles to form.

And so we're going to get a bias that makes triangles

relatively less likely, and then if we make any particular

link less likely across castes, then cross-caste triangles might have a

lower likelihood just because we're working with threes rather than twos.

And so what we want to do is account for preferences explicitly.

Otherwise we're naturally going to find that the ratio of

less desired triads to more desired

triads looks smaller than the ratio of less

desired dyads to more desired dyads.

So there's going to be a bias there unless we account for this carefully.

So how are we going to do this?

So let's build

preferences into this.

And then look at a subgraph generation model, and

then try and figure out how the

probability of a link forming depends on the likelihood that

the pair meets and that both wish to form it.

Okay?

So generally we can think of this as saying there are

characteristics that i has, say in this case their caste designation.

And there's going to be utility that they get from forming a link,

based on their characteristics and

the other person's characteristics, and then

there is something either unobserved,

or some personality or something else, which then also affects that utility.

So we'll put in an error term.

We subtract off something,

which could be negative or could be positive, so maybe

it's a boost, but there is some random element here.

4:16

And i benefits from the link

if and only if this error is less than this utility.

Okay, so if the error is

less than the utility, then the net benefit, utility minus error, is going to be positive.

And you're going to want to form that link, and otherwise you're not going to.
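
As a minimal sketch of this preference rule, here is one way to write it down. The function names are made up for illustration, and the standard normal error is an assumption; the lecture leaves the error distribution open.

```python
import random

def benefits_from_link(utility, error):
    # i wants the link if and only if the error draw is below the utility.
    return error < utility

def share_preferring(utility, draws=100_000, seed=0):
    # Assumption: errors are standard normal draws (not specified in the lecture).
    rng = random.Random(seed)
    wanted = sum(benefits_from_link(utility, rng.gauss(0.0, 1.0))
                 for _ in range(draws))
    return wanted / draws
```

With a higher utility, a larger share of error draws falls below it, so the person wants the link more often.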

Okay, so we have a very simple preference based model.

Now we're going to try and fit that into a subgraph generation model.

So how are we going to do that?

Well, under pairwise stability a link is going to form if and only if both of

these two prefer it, assuming that

the chance of their getting exactly zero utility

is zero.

So now we've got that links form if and only if i

prefers to form a link and j prefers to form a link.

So the error that j gets from forming a link with i is less than

the utility that j gets from forming a link with i, and likewise for i, okay?

So links are going to form when both of these things are true, and so if we have

some distribution of what the error terms look

like, then the probability that a given link forms

is going to be proportional to the probability

that i's error is less than i's

utility, times the probability

that j's error is less than j's utility.

So it has to be that both of them prefer it.

So when we take this product, that will give us the

chance that both of these people prefer it.

The chance that both prefer it is the product of the two, okay,

5:49

provided the noise in whether j likes i

is independent of the noise that i gets from the same relationship.

Okay.
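
To see the product rule at work, here is a small sketch that compares the product of the two preference probabilities with a simulated frequency. The standard normal error distribution and all names are assumptions for illustration; the key ingredient from the lecture is only that the two errors are drawn independently.

```python
import random
from statistics import NormalDist

# Assumption: both errors are standard normal, with CDF F.
F = NormalDist().cdf

def link_preference_probability(u_i, u_j):
    # Chance that both i and j prefer the link, given independent errors.
    return F(u_i) * F(u_j)

def simulated_link_frequency(u_i, u_j, draws=200_000, seed=1):
    rng = random.Random(seed)
    formed = 0
    for _ in range(draws):
        eps_i = rng.gauss(0.0, 1.0)  # i's error, drawn independently
        eps_j = rng.gauss(0.0, 1.0)  # j's error, drawn independently
        if eps_i < u_i and eps_j < u_j:
            formed += 1
    return formed / draws
```

If the two errors were correlated, the simulated frequency would drift away from the product, which is why independence matters here.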

Now you can do the same thing with triangles.

What's going to happen is now we're going to have triangles depending on

the three people's characteristics, and then we'll multiply three terms,

one each for i, j and k, okay?

So it's exactly the same kind of ideas and principles, so we could generate any

kind of subgraph by doing the same technique.

Right?

Putting in utilities for different subgraph forms,

depending on the characteristics of the individuals involved,

and then probabilities that people are actually going

to have errors that are less than those utilities.

And that gives us some distribution here.

Okay.
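
The same idea for any subgraph can be sketched in one function: one preference term per member, multiplied together, scaled by the chance the group meets. The `meet_prob` parameter, the function name, and the standard normal error CDF are assumptions for illustration.

```python
from math import prod
from statistics import NormalDist

def subgraph_probability(utilities, meet_prob=1.0, error_cdf=NormalDist().cdf):
    # utilities: one entry per member (2 for a link, 3 for a triangle, ...).
    # Each member agrees with probability error_cdf(utility), independently.
    return meet_prob * prod(error_cdf(u) for u in utilities)

# A link takes two agreements, a triangle takes three.
link_prob = subgraph_probability([0.0, 0.0])
triangle_prob = subgraph_probability([0.0, 0.0, 0.0])
```

With the same utility for everyone, the triangle probability is lower than the link probability simply because there is one more person who has to agree.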

So now let's go ahead and try and look at

how we would use this kind of model to estimate something.

So what's

the null hypothesis?

So if we think that there's no social

pressure, then we think that a given person's

preference for being involved

in a cross-caste triad, compared to a within-caste triad,

6:54

is the same as their preference for a cross-caste

link compared to a within-caste link, okay?

So what we're allowing them is to care about caste, but what we're saying is

that their relative preference for something across caste in terms

of a triad is the same as their relative preference for that in a link

instead of a triangle.

Okay, so that's the null hypothesis that we have.

So now we can just go in and say, okay, well,

our model said that the frequency of cross-caste triads compared to within-caste

triads is going to look like this ratio

of utilities, if we just assume now that

everybody has a similar utility function that only varies with

whether I am going across or within, and then we

just get idiosyncratic noise on the particular relationships.

So then we have got a cube on the cross-caste triads

compared to within, and a square on the cross links compared to within.

So now

we are correcting for the fact that triads are harder to form.

8:09

So what's the probability that I prefer this?

Well, the triad ratio is going to be a cubic,

right, so we'll just take a cube root of the relative frequencies.

Right, so we can then just correct: the

relative probability of preferring to form across is just going to

be the triad frequency ratio to the one third; for cross links it is going to be the link frequency ratio to the one half.

Okay.

So now, under the null hypothesis, these

two things should be the same, and that tells

us that these corrected frequencies in the data should

be the same, if the null hypothesis is correct.

Okay?
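
Here is a small sketch of that correction. The link frequencies 0.006 and 0.009 are the village 26 numbers mentioned earlier; the triangle frequencies are hypothetical placeholders, not values from the data, and the function name is made up for illustration.

```python
def implied_preference_ratio(cross_freq, within_freq, members):
    # Undo the "members agreements" penalty: take the members-th root
    # of the cross-to-within frequency ratio (2 for links, 3 for triangles).
    return (cross_freq / within_freq) ** (1.0 / members)

# Link frequencies quoted for village 26 earlier in the lecture.
link_pref = implied_preference_ratio(0.006, 0.009, members=2)

# Hypothetical triangle frequencies, for illustration only.
triangle_pref = implied_preference_ratio(0.0002, 0.0005, members=3)

# Under the null of no social pressure the gap should be about zero;
# a negative gap points toward discouragement of cross-caste triangles.
gap = triangle_pref - link_pref
```

Comparing the two roots is the same as comparing the raw triangle ratio with the link ratio raised to the three halves, so either form of the check gives the same sign.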

So what we can do is plot the frequency of cross-caste

triads compared to within, to the one third power, and look

at that compared to the link ratio to the one half power.

And these things should be the same

under the hypothesis that social pressure doesn't matter.

And if they're different, then we can figure out which way it goes:

does social pressure encourage it or discourage it?

So if the top

number is less, then we're seeing discouragement based

on the social pressure.

And if it's more, then we would see encouragement.

Okay? So let's plot these out.

Here's links down here. This ratio.

9:27

And this should be on the 45 degree line under

that null hypothesis, so this is the ratio of triangles.

This is the link ratio raised to the

three halves to correct for the three versus two.

And now when we look at these, they should all line up on the 45

degree line, or fall half above and half below,

and these are for the 75 different villages.

And indeed, we see that there are more winding up below.

And you can do a statistical test of just that.

So one conservative test in this world

is that if the null hypothesis were true, then you ought to have a coin flip

as to whether a village ends up on one side or the other side of this line.

In fact, when you do that, the preponderance of villages end up below the

line, and this is statistically significant at the 99.99% level or more.
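
The coin-flip test described here is a sign test, and a minimal sketch of it needs only a binomial tail. The function name is made up for illustration, and the count of villages below the line used in the usage line is a hypothetical placeholder, since the lecture does not state it.

```python
from math import comb

def sign_test_p_value(below, total):
    # One-sided p-value: chance of seeing at least `below` villages under
    # the line if each village were a fair coin flip (Binomial(total, 0.5)).
    tail = sum(comb(total, k) for k in range(below, total + 1))
    return tail / 2 ** total

# 75 villages as in the lecture; 55 below the line is a hypothetical count.
p_value = sign_test_p_value(below=55, total=75)
```

The test is conservative because it only uses which side of the line each village falls on, not how far from the line it is.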

10:15

One interesting thing you can do here is

you can actually also subdivide these villages by how integrated

they are, or how balanced they are, in terms of the caste designations.

So, some of these villages would be 50% red, 50% blue, in terms of

those different measures we had of the

scheduled caste and scheduled tribe versus general and other

backward castes.

So some of them split right down the middle,

so you have two groups evenly matched.

Others are, say, 90% to 10% or 95% to 5%.

So there'll be a big majority of one

caste group and a small minority of another caste group.

And so what we can do is look at how balanced the groups are.

So let's look at the relative size,

how big the minority is compared to the majority.

And if the minority is above median size, then that gets a light blue.

So these ones down here.

We can see that most of them end up pretty far below.

There's only a couple of them that end up anywhere above the 45 degree line.

Most of them are ending up below.

Whereas the reds

are the ones where there's a little more imbalance,

so that the smaller caste group is more of a minority.

And now you find that the reds

actually are a bit closer to the 45 degree line.

So the more skewed the village is,

the less of the pressure you actually find,

11:53

according to the statistics, whereas here, if you've

got a very well-balanced village, then the castes

seem to separate more; in particular, among the

triangles you see even more pressure to separate.
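
The split into more balanced and more skewed villages can be sketched as a median cut on the minority-to-majority size ratio. The village names and group counts below are hypothetical, and the function names are made up for illustration.

```python
from statistics import median

def minority_ratio(n_a, n_b):
    # Size of the smaller group relative to the larger group.
    return min(n_a, n_b) / max(n_a, n_b)

def split_by_balance(villages):
    # villages: dict mapping a village name to (count in group A, count in group B).
    ratios = {name: minority_ratio(a, b) for name, (a, b) in villages.items()}
    cutoff = median(ratios.values())
    balanced = [name for name, r in ratios.items() if r > cutoff]
    skewed = [name for name, r in ratios.items() if r <= cutoff]
    return balanced, skewed

# Hypothetical group counts, for illustration only.
example = {"v1": (50, 50), "v2": (90, 10), "v3": (95, 5), "v4": (60, 40)}
balanced, skewed = split_by_balance(example)
```

One could then run the comparison against the 45 degree line separately on each of the two groups of villages.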

So this is actually something that you see in

different data sets: the more balanced things are,

the more tension there can be in forming cross-group ties. And

in particular here, when we look at the relative ratio of triangles

in cross-caste relationships compared to links, you see that it's more

often that you get links compared to triangles in this kind of setting.

So this is just, you know, one illustration of

how we might begin to marry these kinds of models.

But what it does show us is that we can

use preferences together with other kinds of statistical

models to begin to estimate some of these models

and see what's going on in some of the data.

We get a little bit of a lens;

it's hard to interpret this closely.

But at least we can figure out whether there are certain patterns in

the data, and here there are patterns among the triangles and the links.

So we reject the null hypothesis: based on the model, people

show a significantly stronger preference against cross-caste ties in triangles than in links, in terms of what we estimate.

Now whether or not they truly have those preferences depends on whether the model is

correct; the model is a little bit simple

here, more for the purposes of illustration.

But we can begin to build richer

models that take more into account and see whether this finding holds up to those.