Hello again everyone and welcome back.

In the past lesson,

I listed and described five important steps that are usually necessary

when performing risk stratification to solve health care problems.

In this lesson, I describe technical steps with which statisticians and

data scientists usually have extensive knowledge and experience.

The general steps involve creating the model and evaluating it.

The final step is often considered the most important:

the actual scoring of new data and deployment of the analytics to solve problems.

At the end of the lesson,

you will be able to list and articulate the meaning of

the last five steps when performing risk stratification,

create a model, evaluate the model,

score new data, rank and stratify,

and deploy the analytical output.

Let me show you what I mean.

In our prior lesson, we covered steps one through five to be: one,

decide on algorithms, whether to buy or to build,

two was to select the target variables,

three was to consider groupers,

four was to specify the time periods,

and five was to select candidate predictors.

This lesson picks up where we left off with these additional five steps,

starting with number six,

which is model creation, or creating the final model,

seven is evaluation, eight is scoring new data,

nine is ranking and stratifying,

and step 10 is deployment of the analytical output.

Let's go a bit deeper into each of these in turn.

Let's get started with step six,

creating the final model.

This step involves deciding which of the variables from step five,

selecting candidate predictors, you want to include.

Recall what you have learned about model selection in your role as data scientist.

For example, A shows the principle of parsimony.

This principle holds that simple models often fit the data

as well as, or better than, complex models.

Next B, collinear variables should be avoided.

In other words, variables that are strongly

correlated with one another are often problematic.

C, test with holdout or validation datasets.

Analysts should pick the most important variables

or create an index that blends these into one variable.

After careful analysis, you'll arrive at D,

a final list of modeling variables that should be selected.

For example, you might have found that including

20 or 30 Hierarchical Condition Category groups for an outcome of interest,

creates a very high R-squared value.

However, your comparison of the training and

holdout data may reveal that you are at risk of overfitting the data.

Thus, you might smartly select a smaller subset of variables.

Finally, E, once you have a final model,

you should run the model to obtain the coefficients or the weights.

That gives you a picture of what takes place within step six or creating the final model.
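As a rough sketch of part of step six, the short Python example below flags pairs of strongly correlated candidate predictors. The variable names, the toy data, and the 0.8 correlation cutoff are all invented for illustration; real work would use many more observations and your own collinearity threshold.

```python
# Flag collinear candidate predictors (step six, point B).
# All names, values, and the 0.8 cutoff are illustrative assumptions.

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

predictors = {
    "prior_year_cost": [1200, 300, 4500, 800, 2600],
    "num_claims":      [14,   4,   52,   9,   30],
    "age":             [67,   45,  71,   52,  60],
}

names = list(predictors)
flagged = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson_r(predictors[names[i]], predictors[names[j]])
        if abs(r) > 0.8:  # strongly correlated pair: drop one or blend into an index
            flagged.append((names[i], names[j], round(r, 2)))

print(flagged)
```

Here the cost and claims-count variables move together almost perfectly, so the sketch would suggest dropping one of them or blending them into a single index, as the lesson describes.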

Step seven involves evaluating the model

to make sure the global fit of the model is adequate.

A global measure for the overall fit of an ordinary least

squared regression model is the R-squared value.

For logistic regression, you can use

the receiver operating characteristic or the ROC curve.

Evaluation of the model is likely to be

an iterative process that involves the previous steps.

Our model evaluation steps bring up the question,

did we really choose the correct patients or members to target?

To answer this question,

let's briefly review some common parameters to evaluate predictive models.

The R-squared evaluation method is common

with regression models that have continuous outcomes.

The R-squared value shows the percentage of

the total variance of the observed data that is explained by the model.

A value between 10 and 20 percent is typical for most models.

This is usually what we see with models that try to predict

costs using administrative healthcare data.

You might ask, is 10 to 20 percent really poor performance?

I would say no, because the measure describes how well the model predicts

each value rather than simply the rank of the observations.

In other words, the metric looks at how well

the model predicts costs for each observation.

But for stratification, we mainly care about ranking.

Thus an R-squared value of about 15 percent might be quite effective.
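The point about ranking versus exact values can be illustrated with a small sketch. The cost figures below are made up so that the model's predicted values are poor (R-squared near 15 percent) while its ordering of the members is exactly right, which is what stratification needs.

```python
# Made-up illustration: a modest R-squared can coexist with perfect ranking.

def r_squared(observed, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def ranks(values):
    """Rank of each element, lowest value = rank 0."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [order.index(i) for i in range(len(values))]

observed  = [100, 200, 400, 800, 5000]      # true annual costs (invented)
predicted = [1000, 1100, 1200, 1300, 1500]  # poor values, correct ordering

print(round(r_squared(observed, predicted), 2))  # about 0.15
print(ranks(observed) == ranks(predicted))       # True: identical ranking
```

Even though the predicted dollar amounts are far off, every member lands in the right position of the ranking, so the model would still stratify this tiny population correctly.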

Now, let's discuss some traditional parameters for dichotomous models.

Logistic regression models are within this category and are commonly used in health care.

This table summarizes the common yet powerful accuracy measures

used with logistic regression.

Sensitivity is the true positive rate,

or in the context of our confusion matrix shown here,

a divided by a plus c.

Specificity is the true negative rate, or d divided by b plus d.

Positive predictive value is

the percent of the predicted high cost that are indeed the true high cost.

The negative predictive value is not used very often for risk stratification.

This is because it is usually the positives or the high cost that we care the most about.
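These formulas are easy to sketch directly from the 2x2 confusion matrix. The cell counts below are invented for illustration, following the lesson's labeling: a is true positives, b false positives, c false negatives, and d true negatives.

```python
# Accuracy measures from the 2x2 confusion matrix (cell counts are illustrative).
# a = true positives, b = false positives, c = false negatives, d = true negatives
a, b, c, d = 40, 20, 10, 130

sensitivity = a / (a + c)  # true positive rate
specificity = d / (b + d)  # true negative rate
ppv = a / (a + b)          # share of predicted high cost that truly are high cost

print(round(sensitivity, 2), round(specificity, 2), round(ppv, 2))
```

With these toy counts, the model catches 80 percent of the true high-cost members, but only about two-thirds of the members it flags are actually high cost.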

We're still in step seven.

Now, let's move on to the various cut points that will

define the strata, or risk groups, as we evaluate the model.

Life is all about trade-offs, and we face

tough choices when assigning the cut points for strata.

This is important because different cut-points that define

the strata will impact accuracy measures in different ways.

If you maximize the true positive rate or the sensitivity,

the true negative rate, or specificity, will likely be reduced.

This figure illustrates how different cut points have

different numbers of observations flagged for intervention.

As one moves up and down the ranking,

the accuracy metrics can help evaluate

what it really means to get the high-value targets.

If the users of risk stratification information

are not willing to miss any high-value targets,

then they risk adding in extra low-value targets.

Thus, the real question is:

what are the costs and benefits associated with mixing lower and higher value targets?
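One way to see the trade-off is to sweep several candidate cut points over a ranked list of risk scores and watch sensitivity and specificity move in opposite directions. All of the scores, outcomes, and cut points below are invented for illustration.

```python
# Sweep candidate cut points over hypothetical risk scores (all numbers invented).

scores    = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # predicted risk, ranked
high_cost = [1,   1,   0,   1,   0,   0,   1,   0]    # actual outcome per member

def rates(cut):
    """Return (sensitivity, specificity) when flagging scores >= cut."""
    flagged = [s >= cut for s in scores]
    tp = sum(f and y for f, y in zip(flagged, high_cost))
    fp = sum(f and not y for f, y in zip(flagged, high_cost))
    fn = sum((not f) and y for f, y in zip(flagged, high_cost))
    tn = sum((not f) and (not y) for f, y in zip(flagged, high_cost))
    return tp / (tp + fn), tn / (tn + fp)

for cut in (0.75, 0.55, 0.25):
    sens, spec = rates(cut)
    print(cut, round(sens, 2), round(spec, 2))
```

As the cut point drops and more members are flagged, sensitivity climbs toward 1.0 while specificity falls, which is exactly the trade-off the figure illustrates: catching every high-value target means accepting extra low-value ones.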

Next, as we evaluate the model,

consider cost concentration, an alternative accuracy measure.

It is nicely described in an article titled

Accuracy of Predictive Models for Disease Management,

in which the authors illustrate the drawbacks of the traditional parameters just mentioned,

such as R-squared and receiver operating characteristic curves.

They illustrate how a measure called cost concentration

can help with risk stratification from the perspective of costs.

They define cost concentration as the percentage of the total population's true costs

that is concentrated among the sub-population that was predicted to be high cost.

As an example, imagine that five percent of the population was predicted to have

a disproportionate amount of costs after

following the stratification steps described earlier.

If the model had no predictive value,

then five percent of the costs would be expected in this group.

If the model is more accurate,

this group would have a larger fraction of the cost.

For example, maybe 25 percent or

more of the cost would be associated with this small group.

This measure is like Pareto's rule that was described earlier.

A small fraction of people are expected to have

a disproportionate share of the illness or the costs.
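The measure itself is simple to sketch: rank the population by predicted score, take the top slice, and ask what share of the total true costs that slice holds. The population size, costs, and scores below are invented so the numbers come out cleanly.

```python
# Illustrative sketch of the cost-concentration measure (all data invented).

def cost_concentration(true_costs, predicted_scores, top_fraction=0.05):
    """Share of total true costs held by the top-ranked fraction of members."""
    n = len(true_costs)
    k = max(1, int(n * top_fraction))
    # rank by predicted score, keep the predicted high-cost slice
    top = sorted(range(n), key=lambda i: predicted_scores[i], reverse=True)[:k]
    return sum(true_costs[i] for i in top) / sum(true_costs)

# Toy population of 20: one member holds a quarter of all costs,
# and the model scores that member highest.
true_costs = [1900] + [300] * 19
predicted  = [0.9] + [0.1] * 19

print(cost_concentration(true_costs, predicted))
```

Here the top five percent of the population (one member of twenty) holds 25 percent of the true costs, far above the five percent a model with no predictive value would capture, echoing the Pareto pattern described in the lesson.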

Let's move on to step eight,

which is about scoring a new dataset.

This assumes that we follow the guidelines of building models on training data sets,

but then deploying them on new data sets.

Models have structures and parameters that can easily be applied to new data sets.

The parameters of the model are the coefficients

or the weights that are fit to the training data.
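Applying those parameters to a new record is mechanical once the coefficients are in hand. In the sketch below, the variable names and weights are hypothetical stand-ins for whatever a fitted cost model would actually produce.

```python
# Minimal sketch of step eight: scoring a new record with a fitted model's
# coefficients. The names and weights are invented for illustration.

coefficients = {
    "intercept": 500.0,
    "age": 25.0,
    "num_chronic_conditions": 900.0,
}

def score(record):
    """Linear score: intercept plus each weight times its predictor value."""
    total = coefficients["intercept"]
    for name, weight in coefficients.items():
        if name != "intercept":
            total += weight * record.get(name, 0)
    return total

new_member = {"age": 70, "num_chronic_conditions": 3}
print(score(new_member))  # 500 + 25*70 + 900*3 = 4950.0
```

The same function can be mapped over an entire new dataset to produce the scores that feed the ranking and stratification of step nine.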