Welcome to Identifying Risk and Segmenting Populations, Predictive Analytics for Population Health. This is lecture b. In this lecture, we will get into more detail on some of the nuts and bolts of risk adjustment and predictive modeling in the population context. The objectives for this lecture are to discuss the inner workings of commonly used risk adjustment and predictive modeling tools, and to state some examples and use cases of how risk scores derived from this methodology are applied in different administrative or clinical contexts to help segment a population. Let's talk about the process of making risk adjustment and predictive modeling tools. How does one develop a model that can be used reliably and efficiently? In general, there are a couple of different approaches to model development, some coming more out of epidemiology, health economics, and statistics, others coming more out of computer science, data science, and informatics. We must balance between using epidemiologic or economic analysis to uncover understood relationships between risk factors, such as disease, and various outcomes of interest, versus trying to explore the natural relationships based just on observation. The former approach is often termed the rule-set approach, in contrast to the data mining or machine learning approach, which is based on observations of association without regard to whether or not there is a cogent rationale for the structure of the observed relationship. All of these methodologies apply statistical forecasting techniques that try to develop increased explanatory power by understanding relationships between risk variables and outcome variables. Again, as alluded to, the rule sets are based on logic models that are grounded clinically or administratively, and most of the commonly used models in the US today take a rule-set approach.
However, increasingly, more data mining is being used to identify naturally occurring relationships that may be missed using other statistical approaches, and now it's not uncommon to see a composite of the two development approaches. So when the term data mining or machine learning is used, one looks at the data, tries to mine the natural relationships, and lets the machine, that is the computer, learn about those relationships. This is sometimes called unsupervised learning, where a clinician or healthcare expert doesn't really guide it; the algorithm is just looking for relationships. The problem with that is you might have an elderly person who is hospitalized with a chronic disease and a young person with a serious acute condition in an outpatient setting. Both have similar empirical patterns of costs, let's say, but clinically or administratively they don't fit together so well in the same risk category. So again, the balance is how you get maximum cogency, logic for a clinician or health person, while at the same time maximizing explanatory or statistical power. One also needs to avoid working with one very specific type of population and developing a model that works perfectly for only that group. This situation is sometimes called over-fitting: you quote, fit, unquote the model so well that it works only for that population, and then when the tool is applied to a different population, it does not work well. So when developing a new model, one must try to balance the statistical with the common sense and logical. There are several dimensions needed to distinguish a good model from a bad model. Obviously, one of these factors has to do with statistical properties such as accuracy. That is, if the model predicts that someone is going to be high cost or high risk, is that in fact what happens? The approaches used to assess this are those commonly used for medical screening tests, such as sensitivity, specificity, and predictive values.
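The over-fitting idea just described can be illustrated with a minimal sketch. Everything here is synthetic and hypothetical, for illustration only: a "model" that simply memorizes its development population fits that population perfectly, but does worse on a new population drawn from the same underlying process.

```python
# A minimal sketch of over-fitting, using synthetic data. A model that
# memorizes its development population fits it perfectly but fails to
# generalize to a new population drawn from the same process.
import random

random.seed(1)

def make_population(n=50):
    # outcome = 2 * risk_factor + noise (purely synthetic)
    return [(x, 2 * x + random.gauss(0, 3)) for x in range(n)]

develop, new = make_population(), make_population()

# Over-fit "model": a lookup table that memorizes development outcomes.
memorized = dict(develop)

def mse(population, model):
    """Mean squared prediction error of `model` on `population`."""
    return sum((y - model(x)) ** 2 for x, y in population) / len(population)

print(mse(develop, lambda x: memorized[x]))  # 0.0: perfect on its own data
print(mse(new, lambda x: memorized[x]))      # positive: the noise it memorized does not transfer
```

This is why model developers hold out a separate population to check that a tool's accuracy is not an artifact of the group it was built on.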
All of these are metrics comparing how many cases you get right versus wrong, but one can have a pretty accurate statistical model that doesn't make sense. If you can't understand it, if it isn't transparent, well then, perhaps that's not so useful. Speaking of usefulness, you need a tool that ideally could be applied in many different applications. These applications or use cases can be financial, clinical, or analytic, and each of these, and more really, depends on the context, the timeframe, the target outcome of interest, and the type of population. As you get experience in the area of risk modeling and segmentation, you will see there's really no one-size-fits-all model, and that many of these widely used models really have multiple components that are each applied in different contexts. The methodology that we'll use as a case study here is one of the widely used methodologies. As mentioned, it's not the only tool, but it's one developed at Johns Hopkins, where this educational material is being developed, and one that is used across the US and the world, primarily in the population health domain. It is the so-called Johns Hopkins Adjusted Clinical Groups, ACG, method. One can learn more about it at the web page that is offered on this slide. This and other tools that come out of a clinical framework are sometimes also called case-mix methodologies. That's because early on, when clinical classification tools such as the ACGs were developed, they were used to categorize patients or visits or admissions to allow for an assessment of the mix of patient quote, cases, unquote when aggregated across different providers. The first case-mix tool was the Diagnosis-Related Group, DRG, which is used to this day to categorize hospital admissions. ACGs were originally developed at about the same time and were used at first to categorize all other quote, ambulatory, unquote care in the community not grouped by DRGs.
Today, diagnostic information derived from both ambulatory and inpatient care is input into ACGs, and the system is used to predict all types of services, including likelihood of hospitalization. However, unlike DRGs, the ACG system is not used to categorize the need for resources within each hospital admission. Many of these tools we've been discussing started out as ways to categorize the different diagnosis codes set up by the World Health Organization, WHO. That system is called the International Classification of Disease, ICD. We've recently moved to version ten, known as ICD-10; before that, it was ICD-9, and other countries will soon be going to ICD-11. There are tens of thousands, and in some cases hundreds of thousands, of different numeric codes that can be used to describe most every malady, injury, and even natural state, for example aging or pregnancy, known to humans. A person can have any or all combinations of these ICD codes. For those of you familiar with statistics, you'll appreciate that one can't model 100,000 factorial different combinations of risk, at least not easily. So figuring out how to design ways to efficiently input and categorize the diagnoses is a big part of how these tools were built. Increasingly, there are other types of information brought into the mix. As mentioned, the ACGs are used widely in many different settings for tens of millions of individuals. But again, we offer it as one well-established model, certainly not the only tool that one can use. Your organization will likely already have a methodology that's integrated into its electronic medical record or its administrative system, or a consultant may offer one to you. Or, if you are a researcher, you can make use of a variety of public domain versions, access some of the academic versions of methodologies, or purchase commercial versions. This slide presents a brief summary of the logic that goes into the Johns Hopkins ACG methodology.
It takes all of the information available on a person, and then the categorization for each person in a cohort can be used to assess the risk of entire populations. The ACG software can take demographic information, the diagnoses, and pharmaceutical information. ACGs and other systems can input prescribed medications as markers for patient risk. While there is some concern that such factors are sensitive to clinician prescribing patterns and patient access, they still offer important information, for example that a patient is on insulin or a drug for asthma. The common coding scheme used by these systems is the National Drug Code, NDC, from the US Food and Drug Administration, FDA. Also, the WHO's Anatomic Therapeutic Chemical class, ATC, is a drug coding system used in many countries, and it is also input by ACGs and some other methodologies. All of this input risk information can come from claims or other administrative data, or it can come from electronic health record data. ACGs and many of the tools have an array of categorization schemes embodied within the methodology. On the right you will see that ACGs have a range of different non-mutually exclusive diagnostic clusters as well as mutually exclusive categories, much like a DRG tree. Again, with thousands and thousands of diseases (there can be 50 or 100 different diagnoses just for diabetes), there is the need for a shorthand like this. In addition, there is an array of different non-medical markers, such as whether a person is frail, whether care is not coordinated, or whether hospitalizations are likely to be how care is delivered. Also, ACGs and a few other tools can be calculated either from diagnosis codes or from pharmacy information. It doesn't matter if a doctor provides a patient with a written prescription on a pad of prescription paper, or if the doctor electronically sends a prescription directly to the pharmacy.
It's common that a patient may be on a certain medication that's pretty clear cut, let's say insulin or hypertension medication, but there's no analogous diagnosis in the claims data. So pharmacy data, even with the concern voiced previously, are a useful additional source of risk information. Based on all of these inputs, the software will assign a risk level. That's usually done by looking at how each factor is associated with cost or other outcomes within a very large benchmark population. In other words, these different tools, ACGs for example, are developed and updated using a multi-million person database to allow for the calculation of weights or scores that represent the importance of each risk factor in estimating a person's risk. More on this in a moment. To give you a feel for how these various systems work, and to suggest to you that transparency is important, this table opens the black box a little bit to show how one of the markers of the ACG system reflects both epidemiologic and disease patterns. The original morbidity clusters were called Aggregated Diagnosis Groups, ADGs. The left side of the slide reflects some of these ADG categories, and on the right side are some common conditions that go into each one. In other words, if a person has one or more ICD codes that fall into these, or many other epidemiologically similar categories, they would get categorized as having this risk factor. For example, do they have a chronic condition that is stable? Do they have a chronic condition that is unstable? Do they have a serious psychiatric or psychosocial condition? Do they have a condition that is likely to become progressively worse? This chart shows that when a person has a high-profile specific condition, in this case diabetes, understanding the other morbidities is really what is key to understanding variation in healthcare costs. Here the healthcare costs over a period of time are broken down by medications and outpatient and inpatient services.
You can see that even when one knows that you're dealing with a specific disease cohort, say in a special program for persons with diabetes, risk segmentation within that cohort, in this case using ACG risk segmentation, is very important. Once the various methodologies develop categorizations based on the relationships between the different conditions, they then develop predictive models by showing the relationships between these variables and some outcome of interest. They usually develop a risk score, and that score represents, for those of you who might have statistical backgrounds, coefficients that show the strength of the association of each factor with some end point or outcome of interest. To develop a score to help forecast future use, the ACG methodology includes a variety of risk information derived from within the method's various disease and pharmacy categorizations. These are arrayed on this slide. We're not going to go into great detail here, but you can learn more if you wish. For ACGs and other models, one may choose to use prior use measures as a risk factor: for example, was the person hospitalized in some past period? There is one caveat worth mentioning related to including prior use markers in a risk model. In cases where you may want to try to predict future hospitalization, understanding past use makes sense. However, if one is determining a risk score for, say, capitation or payment adjustment, you would not necessarily want to reward an organization for admitting a patient, or penalize one for keeping patients out of the hospital. So, for example, if one organization is very efficient and tries to keep people out of hospitals, they would be penalized because their people would have a lower risk score, and they would receive a lower capitation fee than an inefficient organization that has a lot of patients hospitalized. So again, the choice of the model is very important, depending on its application.
As noted, the same goes for taking in pharmaceutical information. Yes, it's true that a patient with diabetes on insulin, for example, is in general at a more advanced stage than someone not on insulin, and that could be useful in understanding the risk of the population. However, if you're going to pay an organization more, or consider it to be more efficient, because it had all of its patients on insulin, without regard to whether or not the patients needed insulin, that could be problematic. So again, understanding the measurement tool, its applications, and its context is very important. This is an example of how an individual can receive a predictive risk score, in this case using ACGs, but most of the systems are similar. We calculate the score by assigning a weight to each of the risk factors based on past model development and then adding them up. So, we have, in this case, a female who's 65 years old. Compared to other persons in the benchmark population, holding everything else constant, that's a weight of 0.74. In terms of past utilization, which as noted may be optional, her relatively high use last year adds some risk to her predictive score. She also had a condition often treated in the hospital, but to avoid offering an incentive, the ACG scoring here does not give extra points for actually being hospitalized, only for having a condition that usually needs hospitalization. She had diabetes, and we know about it based on both the diagnosis code and the type of medication that she had. She also had peptic ulcer disease with some medication, a chronic liver disease, a psychosocial condition, and some other conditions, each of which adds to the score. You'll see, by the way, that other conditions not specified as specific diseases are also added; this extra weight of 2.71 is significant. This model suggests that this individual will have a predictive risk score of 7.99. But what is this compared to?
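The additive scoring just described can be sketched in a few lines. Only the 0.74 age-and-sex weight, the 2.71 "other conditions" weight, and the 7.99 total come from the lecture's example; the remaining weights are hypothetical values chosen so the example sums correctly.

```python
# Sketch of additive risk scoring as described for the 65-year-old female.
# Weights marked "hypothetical" are illustrative fillers, not ACG values.
risk_weights = {
    "female, age 65": 0.74,
    "high prior-year utilization": 0.90,        # hypothetical
    "hospital-dominant condition": 0.85,        # hypothetical
    "diabetes (diagnosis + medication)": 1.10,  # hypothetical
    "peptic ulcer disease": 0.55,               # hypothetical
    "chronic liver disease": 0.74,              # hypothetical
    "psychosocial condition": 0.40,             # hypothetical
    "other conditions": 2.71,
}

def predictive_risk_score(weights):
    """Add up the per-factor weights developed on a benchmark population."""
    return round(sum(weights.values()), 2)

print(predictive_risk_score(risk_weights))  # 7.99
```

The key design point is transparency: because the score is a simple sum of named factor weights, a clinician can see exactly which conditions drove an individual's score up.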
This score is based on some outcome, in this case next year's costs, that has been calibrated in a benchmark population, either embedded in the ACG software or by the user for some past period of time. Once a score is assigned to everyone in a cohort of interest, they can be arrayed from low to high, similar to the population pyramids depicted earlier. Once the persons in the population have been ordered from low to high, one can identify the subgroup who are, say, in the top 5% or 1% of scores. These high-risk persons can be targeted accordingly using population health interventions. The next few slides show how a risk adjustment measurement tool can be used and applied in a few different scenarios. Let's first look at how predictive modeling can be used to identify future hospitalization, a hospital readmission or an initial admission, for example, using a version of the ACG methodology. The reference for the population is in the footnote. A population is arrayed, and the tool identifies the likelihood that a person will be hospitalized. In this case, the population with the highest percentile risk means that their characteristics, diseases, and previous experience tended to be associated with hospitalization in the future, at least in the benchmark population. The pie chart shows the distribution of scores, that is, the percentile score that each person had, from the lowest to the highest. In the graph on the right, one can see the distribution of the percentiles, from those that are below a 0.1 likelihood to those that are at a 0.9 likelihood. You can see that, although not a perfect relationship, it's a pretty close relationship. One can calibrate the risk measurement tool on a variety of other endpoints. It doesn't have to be hospitalization; it could also be use of general services or use of pharmaceutical services, for example.
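The ordering-and-targeting step above can be sketched as follows. The scores here are synthetic stand-ins for output from a tool such as the ACG software; the cohort size and cutoff fractions are arbitrary.

```python
# A minimal sketch of risk segmentation: score everyone in a cohort, rank
# from low to high, and flag the highest-risk slice for intervention.
import random

random.seed(0)
# Synthetic, skewed risk scores for a cohort of 1,000 people.
scores = {f"person_{i}": random.lognormvariate(0, 1) for i in range(1000)}

def top_risk(scores, fraction=0.05):
    """Return IDs of the highest-scoring `fraction` of the cohort."""
    n = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])

high_risk = top_risk(scores, 0.05)  # top 5%: 50 of 1,000 people flagged
```

In practice the cutoff (top 5%, top 1%, and so on) is chosen to match the capacity of the care management program doing the outreach.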
This graphic represents a payment application in Medicaid here in Maryland, the first state to adopt risk adjustment for capitation payment of managed care plans, using the ACG methodology. As part of the ACA, the US Centers for Medicare and Medicaid Services, CMS, uses versions of the HCC methodology for its Medicare Advantage HMOs as well as its qualified health plans. Here, we have real data aggregated across various managed care organizations, MCOs, which are types of accountable care organizations in the Medicaid program. This graph shows the relative risk, that is, higher or lower than average, where average is 1.0, for two distinct populations in the Medicaid cohort, families and children and those who are disabled, across five MCOs. This shows that if you were paying health plans or providers a flat budget or capitation fee, you would want to ensure that those health plans that have sicker populations are paid more fairly, and those that have healthier populations get fewer resources. Sometimes we say, quote, let's slice up the pie fairly, unquote. The pie may be only a certain size, but you need to quote, slice, unquote the pie fairly. For example, plan C is attracting sicker-than-average patients compared to the other plans. Its ACG-rated risk, based on its scores relative to the entire program, is greater than average, so its share of the pie should be larger. Let's say that the capitation rate might be $500 a month. This organization would get 1.1 times that, a larger percentage, but the pie has to equal 1. So an organization, let's say plan A or plan B, would get a smaller capitation rate per month because its cohort is less sick than average. Again, this is how risk-adjusted payment works, something unique in the field of population health. Moreover, in a pay-for-performance reimbursement system, as is used in most ACOs, fee-for-service payments are still used to determine whether or not there is a bonus or a penalty.
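The pie-slicing arithmetic just described can be sketched briefly. The plan names, member counts, and relative-risk scores below are hypothetical; only the $500 base rate and the idea of scaling it by relative risk come from the lecture.

```python
# Sketch of budget-neutral risk-adjusted capitation ("slicing the pie").
# Plans with sicker-than-average members get more per member, healthier
# plans get less, and the total payout stays equal to the flat-rate pie.
base_rate = 500.0  # dollars per member per month

plans = {  # plan name -> (members, relative risk; 1.0 = program average)
    "Plan A": (10_000, 0.90),  # hypothetical
    "Plan B": (12_000, 0.95),  # hypothetical
    "Plan C": (8_000, 1.10),   # hypothetical
}

def adjusted_rates(plans, base_rate):
    """Scale each plan's rate by its relative risk, renormalized so the
    total payout equals what the flat rate would have paid overall."""
    total_members = sum(m for m, _ in plans.values())
    avg_risk = sum(m * r for m, r in plans.values()) / total_members
    return {name: base_rate * r / avg_risk for name, (m, r) in plans.items()}

rates = adjusted_rates(plans, base_rate)
```

The renormalization by the membership-weighted average risk is what keeps the pie a fixed size: sicker plans gain exactly what healthier plans give up.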
One develops an expected cost target, sometimes called a virtual budget, using similar approaches as described here, based on the patient cohort's HCC or ACG scores relative to the entire population. This next slide shows the importance of risk adjustment for performance assessment, in this case again using ACGs, from the Canadian province of British Columbia. Here, primary care doctors, the general practitioners, are monitored for their billing practices under a fee-for-service system, much like our US Medicare program. This graphic arrays the distribution of the observed-to-expected average fees over a year for each doctor's patient panel, using two adjustment methods to calculate the expected, that is, predicted cost. The lower blue line reflects the expected adjustment being done using the age and gender of the doctors' patient cohorts, while the top orange line calculates the expected rates using the average expectation for persons with the same ACG diagnostic mix. Historically, many organizations have used age and gender for risk adjustment like this. Yes, older patients are more likely to use care than younger patients, and in certain age groups men may have higher use than women, or women higher than men. But young, sick people tend to use more services than older, healthy people. When reviewing the difference between the two distributions of Os to Es, you can see that the ACG-adjusted top curve is more tightly focused around 100 than is the demographically adjusted curve. A value of 100 means a practice is using exactly the level of resources expected. The net result is that by considering the risk scores of the doctors' patients rather than just their age and gender, a much smaller number of doctors are considered outliers, that is, well below or well above average. The doctors in the province and the program administrators were pleased by this, as fewer doctors were flagged and visited as being potentially inappropriate low or high resource use outliers.
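The observed-to-expected profiling above reduces to a simple ratio on the 100-equals-expected scale. The dollar figures and the outlier band below are hypothetical, chosen only to illustrate the calculation.

```python
# A minimal sketch of observed-to-expected (O/E) provider profiling on the
# scale where 100 means a panel used exactly the resources expected.
def oe_ratio(observed_cost, expected_cost):
    """O/E ratio scaled so that 100 means exactly as expected."""
    return 100.0 * observed_cost / expected_cost

def is_outlier(observed_cost, expected_cost, band=20.0):
    """Flag a panel more than `band` points from 100 in either direction
    (the band width here is hypothetical)."""
    return abs(oe_ratio(observed_cost, expected_cost) - 100.0) > band

# A doctor billing $420k against a risk-adjusted expectation of $400k:
print(oe_ratio(420_000, 400_000))    # 105.0, i.e. 5% above expected
print(is_outlier(420_000, 400_000))  # False under a 20-point band
```

The better the risk adjustment used to compute the expected cost, the tighter the O/E distribution around 100, and the fewer doctors get flagged for review.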
This concludes lecture b, Identifying Risk and Segmenting Populations, Predictive Analytics for Population Health. We learned that predictive modeling tools are developed using large benchmark populations, using both analytic approaches and clinical logic. One of the widely used tools, the Johns Hopkins ACG System, was described in some detail to offer further insight into how the risk score mechanisms are constructed. We also examined several real-world examples of how risk segmentation of populations is important for case finding, provider payment, and provider performance assessment.