Hi. This video is on causal graphs.
In this video, we begin to learn about what causal graphs actually are.
We'll become familiar with the terminology,
and we'll also begin to identify what we mean by paths between variables.
The motivation here is that causal graphs are useful for causal inference.
Causal graphs are also referred to as directed acyclic graphs;
at least in the causal inference literature, directed acyclic graphs
are the ones that are most commonly used.
So you could think of those as really a special case of causal graphs in general.
These are going to be helpful for identifying which variables to control for.
So the causal graphs in general will help us
identify the variables that we need to control for.
Causal graphs will also make our assumptions explicit.
So what I mean by that is in
any particular study you'll have a treatment or exposure of interest,
you have an outcome of interest,
and you have a lot of other variables.
A causal graph will depict
whatever assumptions you're
making about the relationships between these variables.
So it will be very easy for people to see what you're
assuming and also to critique your assumptions.
Here, we're just going to present some basic information about graphical models.
So we begin with some simple graphs to get the main ideas.
Here, we're showing two variables or nodes,
A and Y, and you see that there's an arrow between them.
So this arrow is indicating a direction.
You could think of it here as A affecting Y.
This is a directed graph because of the arrow.
There's also something known as undirected graphs.
So here, we have the same two variables or nodes,
A and Y, but between them is just a line.
So this is considered an undirected graph.
It basically implies that there's an association
between A and Y, but the causal direction is unknown.
As an overview of graphical models,
one thing to note is that they all encode
assumptions about relationships among variables.
So the graphs will tell us whether variables are independent of each other,
which variables depend on which,
whether there's conditional independence, and so on.
Importantly, they can also be used to derive non-parametric causal effect estimators.
And I'll talk a little more about what that means shortly.
Next, we'll begin to introduce some terminology.
So here's a directed graph where A's affecting Y,
and A and Y are known as nodes or vertices.
We can also think of them as variables,
but you could really think of them as a collection of variables potentially.
So A is not necessarily a single variable, although it could be.
You could think of it as all of the variables that directly affect Y.
So you could call A and Y either nodes or
vertices, or you could loosely think of them as variables.
And you'll see that there's a link between them, in this case,
the arrow and that's also known as an edge.
So the link between A and Y in
this case is an arrow, and that means that there's a direction,
and so therefore it's a directed path.
And, this is also
a directed graph because all of the links between variables are directed.
Here, there are only two nodes and one path,
but you can imagine that graphs can get more complicated and as long
as all of the edges are directed,
then it's a directed graph.
Also, variables connected by an edge are adjacent,
so here, A and Y are adjacent.
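As a minimal sketch of this terminology, here is one way the two-node example could be written down in Python. The representation (a set of `(tail, head)` pairs for directed edges, a `frozenset` for an undirected edge) and the helper names `nodes` and `are_adjacent` are assumptions made for illustration; they are not from the lecture.

```python
# Hypothetical representation: each directed edge is a (tail, head) pair,
# so ("A", "Y") stands for the arrow A -> Y from the example.
directed_edges = {("A", "Y")}

# An undirected edge has no orientation, so an unordered pair is used instead.
undirected_edges = {frozenset({"A", "Y"})}


def nodes(edges):
    """Collect every node (vertex) that appears in at least one edge."""
    return {v for edge in edges for v in edge}


def are_adjacent(edges, u, v):
    """Two nodes are adjacent if an edge connects them, in either direction."""
    return (u, v) in edges or (v, u) in edges


print(sorted(nodes(directed_edges)))           # the two nodes of the graph
print(are_adjacent(directed_edges, "Y", "A"))  # adjacency ignores direction
```

Note that adjacency only asks whether an edge exists, so the order of the two arguments does not matter even though the edge itself is directed.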
Here's a slightly more complicated graph,
where we have four nodes or vertices W,
Z, B and A,
and you'll notice that there's arrows between them.
So a path is a way to get from one vertex to another, traveling along edges.
Imagine that we want to go from W to B,
there are basically two ways you could do that.
So one is you could go from W to Z to B.
So you could go right along this path,
but you can also take a different route where you go from W to Z,
and then over to A and then back to B.
So basically, here
there are two ways to get from W to B,
but there's only one path from Z to W,
and we could write that as we have here, with Z arrow W.
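The path counting just described can be sketched in a few lines of Python. The edge list below is an assumption reconstructed from the routes mentioned (W to Z, Z to B, Z to A, A to B); the slide itself is not available here, and arrow directions are ignored because a path only needs to travel along edges. The function name `simple_paths` is hypothetical.

```python
# Assumed edges of the four-node graph, with arrow direction ignored.
edges = [("W", "Z"), ("Z", "B"), ("Z", "A"), ("A", "B")]


def neighbors(edges):
    """Build a lookup from each node to the nodes it shares an edge with."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def simple_paths(edges, start, end):
    """List every path from start to end that visits no node twice."""
    adj = neighbors(edges)
    found = []

    def walk(path):
        current = path[-1]
        if current == end:
            found.append(path)
            return
        for nxt in sorted(adj.get(current, ())):
            if nxt not in path:  # never revisit a node on the same path
                walk(path + [nxt])

    walk([start])
    return found


for p in simple_paths(edges, "W", "B"):
    print(" - ".join(p))
```

Under these assumed edges the search finds the two routes from W to B described above, and only one path between Z and W.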
Now, what is a directed acyclic graph?
So previously, I was talking about causal graphs in general.
But what is a directed acyclic graph?
Well the name kind of gives it away.
So first it means that there's no undirected paths.
So you'll see the graph I have depicted here shows
an undirected path between A and Z, but that's not allowed.
If this is going to be a DAG or directed acyclic graph,
we can't have any undirected paths.
We also can't have any cycles,
so that's where the acyclic word comes from — no cycles.
Here would be an example of a cycle, where A affects B,
B affects Z, and Z affects A and it cycles around over and over,
so we can't have that.
And here would be an example of a DAG.
You see that everything is directed, there's arrows everywhere.
There's no cycles.
So Z here affects A and B,
A affects B, but nothing sort of cycles back to Z.
So there's no cycles here,
so this is a proper DAG.
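To make the "no cycles" requirement concrete, here is a small sketch that checks the two three-node examples just described. It uses Kahn's algorithm (repeatedly removing nodes that have no incoming arrows), which is one standard way to test for cycles and is not something the lecture itself prescribes. The function name `is_acyclic` is hypothetical, and every edge is assumed to be directed, so the "no undirected paths" requirement is already built into the representation.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for tail, head in edges:
        children[tail].append(head)
        indegree[head] += 1

    # Start from nodes that nothing points into.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    # If a cycle exists, its nodes never reach indegree 0 and are never removed.
    return removed == len(nodes)


nodes = ["A", "B", "Z"]
cycle = [("A", "B"), ("B", "Z"), ("Z", "A")]  # A -> B -> Z -> A
dag = [("Z", "A"), ("Z", "B"), ("A", "B")]    # Z -> A, Z -> B, A -> B

print(is_acyclic(nodes, cycle))
print(is_acyclic(nodes, dag))
```

The first graph fails the check because no node is free of incoming arrows, while in the second Z has no incoming arrow and the nodes can be removed one at a time.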
And I should note that I introduced DAGs because for
the rest of the course that is all we will consider when we talk about causal graphs.
So from here on out,
we're going to strictly think about DAGs.
So next, I'll introduce some more terminology,
so the terms that I want to introduce here are parents,
children, ancestors and descendants.
And these should be pretty self-explanatory.
So for example in this DAG,
A is Z's parent.
So you see that there's an arrow from A to Z.
You could think of A as causing Z or A is affecting Z.
So we'll think of A as happening first and it affects Z,
so we'll think of A as Z's parent.
Similarly, you could think of, in this case,
B as being a child of Z,
so Z is affecting B.
You can think of Z as kind of happening first and affecting B,
so we'll think of B as a child of Z.
You could also use ancestor and descendant terminology.
So in this example,
you'll see that D is a descendant of A,
so A was Z's parent,
and then Z was D's parent — therefore,
D is a descendant of A.
And similarly, we can flip things around and say
that Z is an ancestor of D. And returning to the parents,
I just want to note that
in DAGs there can be more than one parent.
So in this case, D has two parents, B and Z.
But you're not actually limited to two parents;
any particular variable or vertex could have multiple parents,
more than two potentially.
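The family terminology maps directly onto simple graph lookups. The sketch below uses the edges that the description implies (A to Z, Z to B, Z to D, B to D); the slide may contain details not mentioned in the audio, so treat the edge list as an assumption. The function names are hypothetical.

```python
# Assumed edges of the example DAG, each written as (parent, child).
edges = [("A", "Z"), ("Z", "B"), ("Z", "D"), ("B", "D")]


def parents(edges, node):
    """Nodes with an arrow pointing directly into `node`."""
    return {tail for tail, head in edges if head == node}


def children(edges, node):
    """Nodes that `node` points directly into."""
    return {head for tail, head in edges if tail == node}


def descendants(edges, node):
    """Everything reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children(edges, stack.pop()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def ancestors(edges, node):
    """Everything that reaches `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents(edges, stack.pop()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(parents(edges, "D")))      # D's two parents
print(sorted(descendants(edges, "A")))  # includes D, via Z
print(sorted(ancestors(edges, "D")))    # includes Z and A
```

Parents and children are one arrow away, while ancestors and descendants follow any number of arrows, which is why D counts as a descendant of A even though no arrow connects them directly.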
So as was mentioned previously,
we're going to use DAGs to help us determine the set of
variables that we need to control for to address confounding,
which is equivalent to saying that it's
the set of variables that we would need to achieve ignorability.
However, before we get there, we're going to have to think about
the relationship between DAGs and probability distributions.