So, in that specific example, we looked at what's the probability that the random variable was larger than six. But we might want to look at what's the probability the random variable's larger than seven, or smaller than six, or smaller than five, or smaller than 4.3, and so on.

So if you take a random variable, you could construct a function that, when you plug in a value, returns the probability that the random variable is less than that value. And you could construct a function that, when you plug in a value, returns the probability that the random variable is larger than that value. And these things are so inherently useful that we give them names. So the cumulative distribution function, CDF, is simply a function that takes any specific value and returns the probability that the random variable is less than that value.

And again, if it's continuous, it doesn't matter whether it's less than or equal to, or strictly less than. But the cumulative distribution function is defined for both continuous and discrete random variables, so let's be specific and say that it's less than or equal to.

The survival function is the opposite: it is exactly the probability that the random variable is larger than any specific value.

So if you plug in x into the survival function, it returns the probability that the random variable is larger than x. So in the figure on our previous slide, imagine if on the horizontal axis, instead of looking at six, I was looking at some arbitrary value x. The gray area would be S(x), and the white area between the vertical axis and x would be F(x). Notice in this case that F is the probability of being less than or equal to x, and S is the probability of being strictly greater than x, so that S(x) and F(x) have to add up to one.

That's because they are the probabilities of complementary events: the probability that X is less than or equal to x, and the probability that X is strictly greater than x. So if you've calculated the cumulative distribution function, you've also calculated the survival function, because all you have to do is take one minus it. Conversely, if you've calculated the survival function, then you've calculated the cumulative distribution function.
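That complementary relationship can be sketched in a few lines of code. This is an illustrative Python sketch, not from the lecture, and it assumes the running example, an exponential density with mean five:

```python
import math

# Illustrative sketch (an assumption, not the lecture's code): the CDF and
# the survival function are complements. Uses the running example, an
# exponential density f(t) = exp(-t/5)/5 for t >= 0.

def cdf(x):
    """F(x) = P(X <= x), the cumulative distribution function."""
    return 1 - math.exp(-x / 5) if x > 0 else 0.0

def survival(x):
    """S(x) = P(X > x), the survival function: just one minus the CDF."""
    return 1 - cdf(x)

# S(x) + F(x) = 1 at every point x.
for x in [0.5, 1.0, 4.3, 6.0, 7.0]:
    assert abs(cdf(x) + survival(x) - 1) < 1e-12
```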

Next we'll just go through our previous example and calculate exactly the survival function and the CDF. So let's calculate the survival function and cumulative distribution function from the exponential density that we considered before.

Let's calculate the survival function first. Recall that the survival function is the probability that the random variable is strictly greater than the value lowercase x. Further recall that to calculate probabilities, we need to calculate areas under the probability density function. In this case we want the probability of being x or larger, so let's take the integral from x to infinity of the probability density function.

Here I used the dummy variable of t for integration.

The antiderivative of e^(-t/5)/5 is just -e^(-t/5). Evaluating at the limits: as t goes to infinity, -e^(-t/5) goes to zero, and subtracting the value at the lower limit x, which is -e^(-x/5), we just wind up with S(x) = e^(-x/5).

Now we could also go through the example of calculating the cumulative distribution function: instead of calculating the integral from x to infinity, we would calculate the integral from zero up to x. But because we have already calculated the survival function, we know that the CDF is one minus the survival function, so it's F(x) = 1 - e^(-x/5). In general, the cumulative distribution function is the integral from minus infinity to x of the probability density function.

Again, here, we start the integral at zero because the integral from minus infinity to zero is zero. We can then apply the fundamental theorem of calculus, and note that the derivative of the CDF is exactly the density again. To go through our specific example: if we take 1 - e^(-x/5) and take the derivative of that, we get exactly e^(-x/5)/5. So we get the PDF back. Derivatives of the cumulative distribution function exactly yield the probability density function.

Quantiles are properties of distributions or, equivalently, of density functions. When I talk about the distribution or density in general, I'll maybe just say the word distribution; so if I want to talk about the bell curve or the associated distribution, I will just talk about the Gaussian distribution or the normal distribution, and so on. And when we are talking about the mathematics, I will be more specific.

The alpha quantile of a distribution is the point such that the probability of being less than that point is exactly alpha. So if x_alpha is the alpha-th quantile of the distribution, we want the probability of being less than or equal to x_alpha to be exactly alpha. Let's just take a specific example: if alpha is 0.25, then x_0.25 is that point such that the probability of being less than it is 25%. So, for example, in our cancer survival example, the 0.25 quantile of that distribution is the survival time such that 25 percent of the people survive less than that time.

The percentile is merely the quantile expressed as a percent, so the twenty-fifth percentile is the 0.25 quantile. And then the median, the population median, is exactly the fiftieth percentile.

Let's just go through these concepts again with the density that we've been looking at, this exponential density.

Suppose we wanted to find the twenty-fifth percentile of the exponential survival distribution. What we want is to find the point x on the horizontal axis such that the white area to the left of it is 0.25. So let's actually go through this calculation. In order to find the point such that the area to the left of it is 0.25, we just want to solve the equation 0.25 = F(x). You'll recall that a couple of slides ago we solved that F(x) = 1 - e^(-x/5). And if you simply solve that for x, you wind up with the solution x = -log(0.75) × 5, which is about 1.44.

How is that 1.44 interpreted? About 25 percent of the subjects from this population live less than 1.44 years.
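The algebra above can be sketched in code. Here `quantile` is a hypothetical helper that inverts the exponential CDF by hand (in R, `qexp(0.25, rate = 1/5)` computes the same number):

```python
import math

# Sketch of the quantile calculation: solving p = F(x) = 1 - exp(-x/5)
# for x gives x = -5 * log(1 - p). `quantile` is a hypothetical helper
# doing exactly that inversion.

def quantile(p, mean=5):
    """p-th quantile of the exponential distribution with the given mean."""
    return -mean * math.log(1 - p)

x_25 = quantile(0.25)  # the 25th percentile
print(round(x_25, 2))  # prints 1.44

# Plugging the quantile back into the CDF returns the probability 0.25.
assert abs((1 - math.exp(-x_25 / 5)) - 0.25) < 1e-12
```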

You can get quantiles directly from R with the q functions, qexp in this case, because we are talking about the exponential distribution. So qexp gives you the quantiles from the exponential, pexp gives you CDF or survival probabilities from the exponential, and dexp gives you the density itself, and R follows that rule for most of the common distributions.

The median, to remind you, is the 0.5 quantile, the fiftieth percentile. And the quantile that we just figured out, the 0.25 quantile, is generally called the lower quartile. And you might have said: oh, I have heard of the median before, and maybe I've heard of what a lower quartile is before. And what those things are to me is the middle of the data, the point in the data so that 50 percent of the observations are lower than it; or, for the lower quartile, that point in the data so that 25 percent of the observations are below it.

What in the world is Bryan talking about at this point? Well, when we talk about the median in this lecture, we are talking about the population median.

And when you collect data and take a sample median, that's an estimate of something, so we should talk about what it's an estimator of. Right, it's an estimator, and it has to have an estimand. In the same way, if we take a sample mean of data, that's an estimator of something, and it has to have an estimand. And so what we're talking about in this lecture is one way to construct estimands for these quantities.

In this case, if you take the sample median, it is hopefully trying to estimate the population median: that point in the population so that the probability of being less than it is 50%. And you'll find in this class that there's this simple rule: sample things tend to estimate population things. So sample medians estimate population medians, sample variances estimate population variances, sample means estimate population means, and so on. And what we are going to see is that this probability modelling and the associated assumptions are the things that connect our data to the population, so that we can actually have estimands. If we didn't go through this exercise, we would still be able to take a median, but it would just be an entity in a sample.

The whole point of probability modelling is that it connects your sample to the population, so that your sample median now has a population median that it's trying to estimate.

Now, this is kind of a difficult concept. I think the sample median is a very easy concept: you have a list of observations, and you take the middle one after you order them. The population median is a much more difficult concept. It's saying: I have described a population via this distribution, and this distribution has a point so that 50 percent of observations lie below it, and that's the population median.
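The idea that the sample median estimates the population median can be illustrated with a small simulation sketch, again using the exponential with mean five; the seed and sample size here are arbitrary choices, not from the lecture:

```python
import math
import random

# Simulation sketch: the sample median estimates the population median.
# For the exponential with mean 5, the population median m solves
# F(m) = 0.5, so m = -5 * log(0.5) = 5 * log(2), about 3.47.
random.seed(42)
population_median = 5 * math.log(2)

# Draw a large sample and take its middle ordered value.
draws = [random.expovariate(1 / 5) for _ in range(100_000)]
sample_median = sorted(draws)[len(draws) // 2]

# With this many draws, the sample median lands close to the population median.
assert abs(sample_median - population_median) < 0.1
```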

And I think it's a good idea, whenever you're talking in this class, to put the word population or sample in front of these terms to remind yourself. Now, people who work in statistics do this so much that they just kind of forget about these distinctions. Even though they know them, they forget about them because they become sort of second nature. But when you're first learning this, it seems quite odd. And I also want to mention:

The sample median is a well-defined quantity that doesn't require tons of assumptions. It's the probability modelling that's the delicate part, the part that requires assumptions. So if you are going to say that the sample median estimates the population median, there are assumptions that need to be taken into account for that to be true, especially when you want to do inference with your sample median or evaluate its uncertainty. And that's basically what we are going to spend nearly all of this class discussing: how we connect these probability and population concepts to sample data.

Thanks, recruits. This was mathematical statistics boot camp, lecture two.

In the next lecture, we're going to expand on probability modeling and defining

characteristics of probabilities, and I look forward to seeing you.