Welcome to Population Health: Population Health IT and Data Systems. This is Lecture a. This component, Population Health, discusses the application of informatics and informatics methods in population health management. This unit, Population Health IT and Data Systems, explains the challenges and opportunities of using different data types, data sources, and data systems for population health IT. The objective for this lecture of Population Health IT and Data Systems is to identify various data types and data sources used for population health management, including both traditional and nontraditional data sources. This lecture discusses the common data types used for population health and focuses on demographics as the first data type commonly used in population health analytics. Population health data types can be categorized into common data types and emerging data types. Common data types are conventional data types for population health analytics and management that are often extracted from traditional data sources. These data sources include medical insurance claims, which include a variety of data types, such as demographics, diagnoses, procedures, and cost; medication claims, which include data types such as medications and their associated cost; and surveys, such as the Health Risk Assessment, HRA, Activities of Daily Living, ADL, and other surveys that collect various data types for population health research and analytics. On the other hand, the emerging data types are often new types of data that are acquired from nontraditional sources of population health data. These data types are gradually becoming more available across large segments of the population, thus making them an important source of population health data in the near future. Some of these data sources and their associated data types include Electronic Health Record systems, EHRs, which contain multiple data types. EHRs include both common and emerging types of population health data. 
Some of the emerging data types that EHRs offer for population health analytics include history of diagnoses, lab values, and vital signs. Another nontraditional data source for population health is the Health Information Exchange, or HIE. HIEs offer new types of information, such as admission, discharge, and transfer notifications, that could be utilized in real-time population health analytics. Throughout this lecture, we will review a list of data specifications for each potential population health data type. These data type specifications are as follows. Variables will provide a sample list of common variables that represent that specific data type. Background offers a discussion on why that data type is important for population health analytics. The background discussion often includes an introduction on the trend of that specific data type in the general population and its relationship to healthcare utilization and cost. Analytic will identify if the data type should be treated as a dependent variable, an independent variable, or both in population health analytics. Derived variables will list the different methods that could be used to extract additional variables, or so-called derived variables, from the data type. For example, medication adherence rates can be derived from medication data and insurance claims. These derived variables are often important in population health analytics. Coding standards will review a list of coding and terminology standards that are commonly used to encode the data type under discussion. For example, while discussing medications as a potential data type for population health analytics, the following coding standards are included: RxNorm, which is the normalized list of medications developed by the National Library of Medicine; the National Drug Code, or NDC; the Anatomical Therapeutic Chemical Classification System, or ATC; and the Systematized Nomenclature of Medicine, or SNOMED. 
Knowing which data standard is applicable to which data type will help users recognize and prevent potential interoperability issues that may arise while developing population health data systems. The list of data specifications for each potential population health data type also includes the following items. Data sources will provide the list of common data sources that typically contain that data type. For example, for the demographics data type, claims, EHRs, and HIEs are listed as data sources that usually capture demographics in their underlying datasets. Data quality will discuss common and potential data quality issues with the data type that may adversely affect population health analytics. Data interoperability explains specific interoperability issues that may affect the collection, integration, and sharing of this particular data type, and how they limit the use of this data type in the development of population health data systems. And finally, legal considerations will discuss issues with privacy and security that may impact accessing or utilizing this data type for population health data systems. For example, the Health Insurance Portability and Accountability Act, HIPAA, limits the release of certain identifiable information from medical records, which may hinder the development of population health data systems. Before starting to go through the common data types used in population health analytics, it should be noted that all data types fall somewhere within the stack of health determinants. As depicted in this diagram, various health determinants, or health factors, map to specific data types of population health data systems. For example, the healthcare stack of health determinants matches most of the clinical data used for population health, such as claims, EHRs, and HIEs. 
This mapping of health determinants to population health data types continues with the matching of the individual behavior determinant with surveys, such as HRA and ADL; the mapping of the social environment factors to social data types, such as socio-economic status, SES; and finally, the attribution of physical environment to geographically bound information, such as census data. Genetic information is usually not used for population health analytics and thus is not covered in this unit. Note that the health determinants have a direct effect on outcomes, such as mortality and morbidity, which are also some of the main outcomes in population health research. Common data types used for population health data systems and analytics include demographic information, such as age, sex, and gender. Diagnostic information, such as the actual diagnosis and severity of a diagnosis. Medication information, such as prescribed, dispensed, and filled medications. Procedures, such as medical evaluations, anesthetic procedures, surgeries, medical imaging, and other procedures that occur in inpatient and outpatient settings. Data collected by patient surveys, such as HRA and the Patient Health Questionnaire version 9, PHQ-9, which is designed to measure the severity of depression. Utilization information, such as cost, hospitalization, admission to the emergency room, readmission, and so on. And finally, a set of derived variables that categorize other variables into meaningful groups. These grouper variables are often generated by a variety of commercial and noncommercial applications. There is a long list of grouper data types, including Adjusted Clinical Groups, ACGs, Diagnosis-Related Groups, DRGs, and others. In this lecture we will focus on the demographic data types. Demographic data types are commonly used in population health analytics. The common demographic variables include age and sex. Both age and sex have a direct effect on healthcare utilization. 
Indeed, the increasing life expectancy in the US is leading to an aging population and, hence, increasing healthcare costs. Thus, age is commonly used to predict population health outcomes, such as future cost. This means that age is an important independent variable in population health predictive models. Derived variables from demographics may include age ranges or rule sets to deduce eligibility for programs such as Medicare. There are a number of existing coding standards for age and sex, but they are not commonly adhered to. However, most age and sex datasets are coded in a recognizable way. For example, most age data fields are encoded as the date of birth or age in years, and sex is often encoded with numbers or characters representing sex as female, male, unknown, or none. Most patient-level or patient-centered data sources include demographic variables such as age and sex. For example, insurance claims, EHRs, and HIEs all capture demographic data. The quality of demographic data is often acceptable due to various mandates to collect them accurately. However, data quality is always affected by various factors, such as measurement, user mistakes, data conversion issues, and so on. There are no major interoperability issues with demographic data. There are legal limitations on sharing age if it contains the date of birth or includes ages above a certain limit, because this increases the probability of reverse engineering and finding an individual in the larger population. Sex is not protected under HIPAA by default. Note that demographic data are not only used for population health predictive modeling, but are also often used to match patient records across different data sources. Thus, legal limitations on sharing demographic data may hinder the development of multi-source population health data warehouses. This image shows a sample age and sex data table, which is generated based on synthetic data. 
The collective result shows the sex, date of birth, and age for a given population. Each row represents one patient with his or her collected sex, date of birth, age in years, rounded years, lower or upper rounding, and age band. As depicted, date of birth, DOB, is of HIPAA concern. Also, cases within certain age ranges are subject to HIPAA rules, such as the case with an age of 91. An important data quality issue with age and most other numeric variables is data conversion through rounding up or down, which may cause a borderline case to be categorized into two different age bands. For example, the 64.49 year old patient can be considered in the 35-65 age band if her age is rounded down, but will be grouped in the 65+ age group if it is rounded up. This diagram shows the growth in the aging population of both males, on the left, and females, on the right. The solid colored histogram shows the age of the population in 2010, while the hollowed histogram shows the distribution of age back in 2000. As shown, the solid histogram has moved upwards compared to the hollowed histogram, indicating an aging population across almost all ages. Arrow one shows the trend among males and arrow two shows the trend among females. Note that the increase of age in the elderly has a higher impact on healthcare utilization than in other age ranges. This diagram shows the progression of age bands from 1960 to 2010. For example, the percentage of the population over 65 years old was only 9% in the 1960s, while this increased to 9.8% in the 1970s, 11.3% in the 1980s, 12.6% in the 1990s, and finally 13% in 2010. In contrast, the less than 18 years old age band has consistently decreased from 1960 to 2010. These age changes, depicted by the increasing median age on the right side of the diagram and arrows one, two, and three, will directly affect healthcare costs in the future. 
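The rounding issue described for the 64.49 year old patient can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the band edges and labels, the treatment of 65 as the start of the 65+ band, and the helper names `age_in_years` and `age_band` are hypothetical and do not come from the synthetic table on the slide.

```python
import math
from datetime import date


def age_in_years(dob: date, as_of: date) -> float:
    """Approximate fractional age, assuming a 365.25-day year."""
    return (as_of - dob).days / 365.25


def age_band(age: int) -> str:
    """Assign a whole-year age to a band (hypothetical band edges)."""
    if age < 18:
        return "<18"
    if age < 35:
        return "18-34"
    if age < 65:
        return "35-65"  # label follows the lecture; 65 itself is assumed to fall in "65+"
    return "65+"


# The borderline case from the lecture: the same patient lands in
# two different bands depending on the rounding direction.
age = 64.49
band_if_rounded_down = age_band(math.floor(age))
band_if_rounded_up = age_band(math.ceil(age))
print(band_if_rounded_down, band_if_rounded_up)
```

Whichever rounding rule is chosen, applying the same rule to every data source avoids having one patient counted in two bands after records are merged.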
As Dunkin explains, this diagram shows the average cost index for various ages in the general population, based on a large commercial insurance claims database. In general, the diagram shows that healthcare costs increase with age, with the exception of the very youngest ages. The diagram shows the progression of cost in males, based on 2010 data, which has increased compared to the data on males from 2000. Also, the cost among females in 2010 is shown to have increased compared to the data on females from 2000. Arrow one shows that the cost, on average, is very high in the first year or two, and drops significantly by age five. From that point, costs increase modestly through the teen years. Arrow two shows that female costs then begin to accelerate more quickly during childbearing ages and flatten out in the 40s before increasing again. Male costs are relatively flat in the 20s and begin to accelerate after age 30, but remain lower on a per person basis than those of females in the same age group. Arrow 3 points at the crossover age, which is in the early 60s, when per capita spending for males exceeds that for females. Arrow 4 shows the general trend of increasing healthcare costs over the lifespan of the population. This diagram shows the average cost index for ages 65 and over in the general population, based on Medicare insurance claims, excluding private and Medicaid-financed long term care. In general, the diagram shows that healthcare costs increase with age, arrow one, with the exception of the 90 plus elderly females, arrow two. Males continue to have higher costs than females. This bar chart shows the percentage of healthcare expenses incurred by the top 5% of spenders within each age group. As shown, the largest share of the top 5% of utilizers is in the 65 to 79 year old group, with almost a third of the overall total cost. The smallest share is in the less than 18 years old group, which totals only 5% of the overall total cost. 
The Medical Expenditure Panel Survey, MEPS, of 2002 is used to generate this diagram. The Agency for Healthcare Research and Quality, AHRQ, describes MEPS as a set of large scale surveys of families and individuals, their medical providers, and employers across the United States. MEPS is the most complete source of data on the cost and use of healthcare and health insurance coverage. This bar chart shows the age distribution of low, high, and very high spending groups. As depicted by the left bar, the lowest 50% of spenders have a higher composition of lower age bands, such as 0 to 18 and 19 to 34 year olds. The middle bar shows that the top 5% of spenders have a larger share of higher age bands, such as 55 to 64 and 65 to 74. The last bar indicates that the top 1% of spenders are mainly from the 55 to 64, 65 to 74, and 75 plus age bands. Arrow one shows the increasing share of 75 plus age group members in the top utilization groups, while arrow two shows a reversed trend for the 0 to 18 age group. This chart is constructed from the 2009 Medical Expenditure Panel Survey data. This table shows the relative cost of each member per year by age and sex. The data are tabulated based on a large commercial claims database. Arrow 1 indicates that the cost of both males and females increases as age goes up. Arrow 2 shows the increased cost of females in certain age groups. The total average cost for the entire population, regardless of age and sex, is calculated as $3,090 per year. This table shows the relative risk of each member per year by age and sex. The data are tabulated based on a large commercial claims database. Note that the total average risk for each sex, male or female, is not the sum of the risks per age band, because the distribution of the population is not equal across those age bands. The total average risk for the entire population, regardless of age and sex, is one, referring to the $3,090 cost per year in the previous slide. 
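The relationship between the cost table and the relative risk table can be sketched as follows. This is a minimal illustration, assuming that each relative risk is the average cost of an age and sex cell divided by the overall average cost per member. The member counts, costs, band labels, and function names below are hypothetical and are not the values on the slides, so the overall average in this sketch is not $3,090.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[str, str]  # (sex, age band)

# Hypothetical data: (number of members, average annual cost per member).
cells: Dict[Cell, Tuple[int, float]] = {
    ("M", "<19"): (200, 1500.0),
    ("M", "19-64"): (500, 3000.0),
    ("M", "65+"): (100, 9000.0),
    ("F", "<19"): (200, 1400.0),
    ("F", "19-64"): (500, 3600.0),
    ("F", "65+"): (100, 8000.0),
}


def overall_average_cost(data: Dict[Cell, Tuple[int, float]]) -> float:
    """Member-weighted average cost across every cell."""
    total_cost = sum(n * cost for n, cost in data.values())
    total_members = sum(n for n, _ in data.values())
    return total_cost / total_members


def relative_risks(data: Dict[Cell, Tuple[int, float]]) -> Dict[Cell, float]:
    """Relative risk of a cell = its average cost / overall average cost."""
    average = overall_average_cost(data)
    return {cell: cost / average for cell, (_, cost) in data.items()}


def weighted_average_risk(
    data: Dict[Cell, Tuple[int, float]],
    risks: Dict[Cell, float],
    sex: Optional[str] = None,
) -> float:
    """Member-weighted average risk, optionally restricted to one sex."""
    selected = [cell for cell in data if sex is None or cell[0] == sex]
    members = sum(data[cell][0] for cell in selected)
    return sum(data[cell][0] * risks[cell] for cell in selected) / members


risks = relative_risks(cells)
population_risk = weighted_average_risk(cells, risks)       # one, by construction
male_risk = weighted_average_risk(cells, risks, sex="M")    # weighted by band size
male_sum = sum(r for (s, _), r in risks.items() if s == "M")  # not the same number
print(population_risk, male_risk, male_sum)
```

Because the bands hold different numbers of members, the average risk for a sex is a member-weighted average of its band risks and not their sum, which matches the note in the lecture.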
This table lists the population-wide risk factors for each age and sex combination, as calculated in the previous slide, and applies them to a given insurance plan, which in this case is a small employer plan of 88 males and 89 females. The table shows how custom weighted numbers are calculated for this specific plan, as the ratio of age bands and sexes is different from that of the larger database used to create the population-wide risk factors. For example, there are 4 males and 12 females in the less than 19 age group. Multiplying the number of males by the male risk factor, which was calculated in the previous slide, and then adding it to the similarly calculated number for females, will provide the overall weighted number for that age group, which in this case is 7.12. Again, remember that the actual risk factors for males and females were calculated in a previous slide and were based on a much larger commercial database with a fair representation of perhaps the entire US population, while the total number of males and females listed in this table only represents the distribution of sexes and ages in this smaller plan, which only has 88 males and 89 females. The total relative age and sex risk factor for this smaller plan is 0.94, which differs slightly from the total relative age and sex risk factor of the large population, which is set at 1. See the previous slide. As discussed, this difference is due to the slightly different ratio of patients in each age and sex combination in this plan compared to the larger commercial database, which is considered the population. An overall risk factor of 0.94, which is lower than one, means that this population will probably have a lower cost next year compared to the general population. However, this is just a probability that was calculated based on age and sex, and this probability can be inaccurate. Note that the calculations assume that nobody joined or left the plan throughout the year. 
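The weighting steps described for this plan can be sketched as below. Only the head counts for the less than 19 group (4 males, 12 females) and the plan totals (88 males, 89 females) come from the lecture. Every risk factor, the remaining band labels, and the remaining head counts are hypothetical placeholders, so the results are not expected to reproduce the 7.12 or 0.94 shown on the slide.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (sex, age band)

# Hypothetical population-wide risk factors (NOT the values from the slide).
risk_factor: Dict[Cell, float] = {
    ("M", "<19"): 0.5, ("F", "<19"): 0.4,
    ("M", "19-44"): 0.6, ("F", "19-44"): 1.0,
    ("M", "45-64"): 1.5, ("F", "45-64"): 1.4,
}

# Plan head counts: the "<19" row and the totals (88 M, 89 F) follow the
# lecture; the split across the other two bands is assumed.
plan_members: Dict[Cell, int] = {
    ("M", "<19"): 4, ("F", "<19"): 12,
    ("M", "19-44"): 50, ("F", "19-44"): 45,
    ("M", "45-64"): 34, ("F", "45-64"): 32,
}


def weighted_number(band: str) -> float:
    """Males x male factor plus females x female factor for one age band."""
    return sum(
        plan_members[(sex, band)] * risk_factor[(sex, band)] for sex in ("M", "F")
    )


def plan_risk_factor() -> float:
    """Sum of all weighted numbers divided by total plan membership."""
    bands = sorted({band for _, band in plan_members})
    total_weighted = sum(weighted_number(band) for band in bands)
    return total_weighted / sum(plan_members.values())


under_19 = weighted_number("<19")
overall = plan_risk_factor()
print(round(under_19, 2), round(overall, 3))
```

An overall value below one would suggest a lower expected cost than the reference population, and a value above one a higher expected cost, under the same assumption the lecture states: nobody joins or leaves the plan during the year.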
This table lists a number of employer-based healthcare plans and calculates the overall risk factor for each of them, using the method shown in the previous slide. For example, employer plan 1 has 73 employees and their current weighted risk factor ratio is calculated at 138%, which means that their average cost will be 138% of the average cost of the larger population. In other words, this is a 38% increase in the average cost compared to the larger population. The risk factor can be calculated for year zero and year one. The former is the base year and the latter is considered the future year, which in this case is simply next year. Based on the risk factor ratio, the predicted cost for the future year, which is year one, is determined to be close to $4,853. However, it turns out that the actual cost next year will be $23,902. This is a large underestimation of $19,049 by the model. This underestimation can be represented as a minus 392% difference between predicted and actual cost. Although this is a very high difference for employer 1, the difference becomes smaller in other cases, such as employer 4. Indeed, employer 4's difference in predicted versus actual cost is only 0.2%, which means the predicted cost is very close to the actual cost. More often, the larger differences occur in plans that have lower numbers of patients, such as employer one, employer three, and employer six. The more populated plans often represent a fairer population distribution, and thus age-sex models will be able to predict with higher precision. Note that these calculations assume that nobody joined or left any of the population groups in the employer-based plans. This concludes Lecture a of Population Health IT and Data Systems. This lecture introduced data types that are used for population health analysis and common data types for population health analytics. The lecture focused on demographics, mainly age and sex, as the first set of common data types. 
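The prediction error quoted for employer 1 can be checked with a short calculation. The sketch assumes that the percentage difference is the predicted cost minus the actual cost, divided by the predicted cost. The lecture does not state the denominator, but this choice is consistent with its figures of $19,049 and roughly minus 392%. The function name is hypothetical.

```python
from typing import Tuple


def prediction_difference(predicted: float, actual: float) -> Tuple[float, float]:
    """Return (dollar difference, difference as a percent of predicted cost).

    Dividing by the predicted cost is an assumption inferred from the
    lecture's numbers, not a stated definition.
    """
    difference = predicted - actual
    return difference, difference / predicted * 100


# Employer 1 figures from the lecture.
dollars, percent = prediction_difference(predicted=4853, actual=23902)
print(dollars, round(percent, 1))
```

A negative result means the model underestimated the cost. A small plan such as employer 1 can show a very large percentage because a few high-cost members dominate its actual spending.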
We also discussed distribution of healthcare costs based on age and sex. Various tables showed the basic prediction of cost in a smaller health plan based on the distribution of age, sex, and cost of a larger population.