Welcome to Population Health, Population Health IT and Data Systems. This is Lecture b. This component, Population Health, discusses the application of informatics and informatics methods in population health management. This unit, Population Health IT and Data Systems, explains the challenges and opportunities of using different data types, data sources, and data systems for population health IT. The objective for this lecture is to identify various data types and data sources used for population health management, including both traditional and nontraditional data sources. This lecture continues the discussion of the common data types used for population health and focuses on diagnoses, medications, and procedures as the next set of data types commonly used in population health analytics. Common data types used for population health data systems and analytics include demographic information, such as age, sex, and gender. Diagnostic information, such as the actual diagnosis and the severity of a diagnosis. Medication information, such as prescribed, dispensed, and filled medications. Procedures, such as medical evaluations, anesthetic procedures, surgeries, medical imaging, and other procedures performed in inpatient and outpatient settings. Data collected by patient surveys, such as the Health Risk Assessment, HRA, and the Patient Health Questionnaire version 9, PHQ-9, which is designed to measure the severity of depression. Utilization information, such as cost, hospitalizations, admissions, readmissions, and so on. And, finally, a set of derived variables that categorize other variables into meaningful groups. These grouper variables are often generated by a variety of commercial and non-commercial applications. There is a long list of grouper data types, including Adjusted Clinical Groups, ACGs, diagnosis-related groups, DRGs, and others. In this lecture, we will focus on diagnosis, medication, and procedure data types.
Diagnosis data types are commonly used in population health analytics. The common variables cover various aspects of diagnosis, including the underlying disease, signs, symptoms, injuries, factors influencing health status, and even family history. Certain diagnoses and various combinations of them, especially chronic conditions, correlate directly with healthcare utilization. Indeed, a higher number of co-occurring chronic conditions is associated with a higher probability of high healthcare utilization. Thus, diagnosis is commonly used as an independent variable to predict various population health outcomes, such as future cost. Variables derived from diagnoses may include the severity of a disease, the trajectory of a disease over time, and the history of a condition, which can be extracted from a problem list. Also, certain variables such as medications and lab values are sometimes used to impute missing diagnosis codes. For example, consistently high levels of hemoglobin A1c can indicate a diabetes diagnosis if the diagnosis is missing. There are a number of commonly used coding standards for diagnosis. These coding standards include, but are not limited to, the International Classification of Diseases, ICD, the International Classification of Primary Care, ICPC, the Systematized Nomenclature of Medicine, SNOMED, the Diagnostic and Statistical Manual of Mental Disorders, DSM, and Read codes. ICD is the most commonly used diagnostic coding system, and is used in both claims and electronic health records. Most patient-level or patient-centered data sources include diagnosis. For example, diagnosis codes are included in both insurance claims and electronic health records, EHRs. The quality of diagnosis data is often acceptable due to various mandates to collect them accurately. However, data quality differs across data sources.
For example, insurance claims transactions often limit the number of diagnoses that can be associated with certain procedures, while EHRs do not have such limitations. In addition, EHRs include problem lists that can be used to determine whether or not a diagnosis is still active, while this is not always possible with insurance claims. There are some interoperability issues with diagnosis data, specifically when mapping diagnostic data from one coding system to another. This challenge also exists when mapping a diagnosis across different versions of the same coding system, such as mapping diagnostic data from ICD version 9 to ICD version 10. Certain diagnostic codes, such as those for HIV status and mental illness, are protected by various federal and state laws. Thus, a population health database may not include those diagnosis codes. This table shows a list of patients and their diagnoses. The table includes synthetic data and is not normalized: each visit occupies a single row with a fixed set of diagnosis columns, meaning that only 9 diagnoses can be entered for each patient's visit. As shown, the diagnoses are encoded in ICD version 9. However, the decimal point used in ICD codes is omitted for data management purposes. For example, ICD code 997.49 is encoded as 99749. Arrow 1 points to one of the rows, which includes the patient's unique ID, a visit date for the patient, and up to 9 diagnostic codes related to that visit. The second diagnosis for this patient is an ICD code that starts with an E, which indicates an injury code. Arrow 2 indicates another row for another patient visit, which includes a V code as the sixth diagnosis. V codes refer to factors influencing health status. Arrow 3 shows that a visit does not always have 9 diagnoses, which leaves empty cells that are denoted by null in data sets. Arrow 4 shows a visit at which only three diagnoses were evident. There are interoperability challenges with diagnosis codes.
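The table layout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`format_icd9`, `code_family`, `visit_to_long`) and the example visit are hypothetical and not part of the lecture's data set. It restores the omitted decimal point, labels E and V codes, and skips the null cells, turning one wide visit row into one row per diagnosis.

```python
from typing import List, Optional, Tuple


def format_icd9(code: Optional[str]) -> Optional[str]:
    """Restore the decimal point in an ICD-9 diagnosis code stored without it."""
    if code is None or not code.strip():
        return None  # empty diagnosis slot (null in the data set)
    code = code.strip().upper()
    # E codes keep four characters before the decimal point (E8490 -> E849.0);
    # numeric codes and V codes keep three (99749 -> 997.49, V5869 -> V58.69).
    head = 4 if code.startswith("E") else 3
    return code if len(code) <= head else f"{code[:head]}.{code[head:]}"


def code_family(code: str) -> str:
    """Label a formatted code by its leading character."""
    if code.startswith("E"):
        return "injury (E code)"
    if code.startswith("V"):
        return "factors influencing health status (V code)"
    return "disease or condition"


def visit_to_long(
    patient_id: str, visit_date: str, dx_columns: List[Optional[str]]
) -> List[Tuple[str, str, int, str, str]]:
    """Turn one wide visit row (up to 9 diagnosis columns) into one row per diagnosis."""
    rows = []
    for position, raw in enumerate(dx_columns, start=1):
        code = format_icd9(raw)
        if code is None:
            continue  # skip null cells
        rows.append((patient_id, visit_date, position, code, code_family(code)))
    return rows


# Made-up visit with three recorded diagnoses and six null cells.
example = visit_to_long("P001", "2013-01-15", ["99749", "E8490", "V5869"] + [None] * 6)
for row in example:
    print(row)
```

A long layout like this avoids the fixed limit of nine diagnoses per visit, at the cost of more rows per visit.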
Mapping diagnosis codes from one coding system to another is a major effort. Even mapping diagnosis codes from one version of a coding system to another version of the same system can be complex. For example, as shown in this diagram, converting ICD-9 codes to ICD-10 might generate a one-to-one mapping or a one-to-many mapping. The top row shows a one-to-one mapping, where an ICD-9 code is mapped to a single ICD-10 code. However, the second and third rows show ICD-9 codes that are mapped to 3 and 16 ICD-10 codes, respectively. And finally, the last row shows an ICD-9 code that could be mapped to more than 2,530 ICD-10 codes. These coding and interoperability issues limit the ability of data managers and population health analysts to merge population health data sources that use different diagnosis coding systems, or even different versions of the same coding system. The trend of chronic diseases is upward. The left diagram shows the increasing prevalence of chronic diseases in the US. As shown, the number of Americans with a chronic disease increased from 118 million in 1995 to 141 million in 2010. It is expected that this number will go as high as 171 million by 2030. The right chart shows the percentage of Americans who suffer from multiple chronic conditions. As depicted, more than 14% of patients with a chronic disease have more than 3 chronic co-morbidities. Note that the number of chronic co-morbidities often correlates closely with future healthcare costs. Thus, the patients who fall on the right side of this chart are often considered high-risk for utilization. This bar chart shows the economic impact of various chronic diseases in the US. The numbers are based on 2003 expenses and are in billions of dollars. Each bar shows the total treatment expenditure, calculated based on insurance claims, and the total lost economic output, which extrapolates the effect of these chronic conditions on other aspects of life.
For example, the total treatment expenditure for heart disease in 2003 was $65 billion, while the lost economic output was close to $105 billion. Note that the treatment expenditures for individuals in nursing homes and prisons, or under other institutional care, are not included in these calculations. Treatment expenditures for co-morbidities and secondary effects of the listed diseases are also excluded. Population health analytics often focus on the total treatment expenditures, as these are directly measured using insurance claims. This table shows the impact of adding diagnosis to age and sex to better predict future healthcare utilization. The table includes the data from nine patients with different ages, sexes, and diagnoses. Arrow one points to three cases: a 45-year-old male with diabetes, a 40-year-old male with diabetes and also a heart condition, and a 40-year-old female with a heart condition. Arrows two and three show that if we use the age-and-sex-only model, the predictions will be considerably lower or higher than the actual cost. For example, the predicted future cost of the first patient, which is calculated from the age-and-sex-only model, is set at $2,547, while the actual cost has turned out to be $5,024. The actual cost is therefore about 197% of the predicted cost ($5,024 divided by $2,547 is roughly 1.97), which shows that other factors, and perhaps most importantly the diagnosis of the patient, are causing this difference. The second patient has a much higher ratio of actual cost to predicted cost. This large difference might be due to the fact that the patient has two chronic conditions, diabetes and a heart condition. The third patient also has a high ratio of actual to predicted cost. This ratio is considerably higher than that of the first patient, despite the fact that this patient also has only one chronic condition, thus suggesting that different chronic conditions may affect cost at different rates. This table displays the application of condition-based modeling in predicting healthcare utilization.
A total of nine employer-based plans are shown in the table. Each employer plan covers a number of lives. For example, employer plan 1 includes 73 members, while employer plan 2 includes 478 members. Each employer plan also includes an overall condition-based relative risk, which shows the effects of the mix of conditions in that particular population of members. By using the condition-based relative risk factors calculated and assigned to each employer plan, the predicted cost gets much closer to the actual cost in year one. Circle one shows that the ratio of actual versus predicted cost is getting closer to zero and is much better than with age-and-sex-only models. Note that the closer this ratio is to zero, the better the model is for predicting the actual cost. Circle two shows the total difference between actual and predicted dollars across all employer plans, which is also lower than with the age-and-sex-only model described earlier. Note that the condition-based relative risk factors are calculated using commercial software. This table shows the impact of severity on diagnosis-based models. The table lists a number of ICD codes that are related to diabetes. The 250 range of ICD codes indicates a case of diabetes, while the others indicate another condition caused by diabetes. As shown in the table, the average cost per year differs across diabetes diagnoses due to different complications, severity levels, and potential outcomes. Circle one marks a diagnosis of diabetes with other specified manifestations, which has a high relative cost of almost 250% compared to the rest of the patients with diabetes. On the other hand, circle two points to the diabetes without complication category, which has an 85% relative cost. Circle three shows the overall cost of patients with diabetes, which is much higher than the cost of the general population in the analysis, and is set at $3,090.
This means that the average member with diabetes incurs costs about four times the average for all members of the population. And among the members with diabetes, the ones who have diabetes with other specified manifestations cost about ten times the general population. As discussed, various ICD codes may inherently represent a higher or lower severity level, which may affect the relative cost of members with such diagnoses. These disease severity levels can also be grouped into larger categories for ease of use. For example, this table shows five levels of diabetes severity. Each level includes a number of diagnosis codes and an associated average and relative cost. Often, to better differentiate diagnosis severity levels in a given population, the analytic models use algorithmic approaches to find specific cohorts of the population with the same risk level. This process is sometimes called phenotyping. This table shows an algorithmic approach to identifying patients with diabetes mellitus who have undergone certain procedures, while excluding a certain category of patients with diabetes. Most commercial population health analytic programs have internal rule sets that define population cohorts when analyzing and predicting healthcare utilization rates. Sometimes these algorithmic rules can be tweaked and customized by the end user. Medication data types are commonly used in population health analytics. The common variables in this data type include prescribed and filled medications. Prescription information is often extracted from EHRs, while fill information is usually found in insurance claims. Certain medications, and sometimes various combinations of them, correlate directly with healthcare utilization. Indeed, a higher number of medications used by a patient is often associated with a higher probability of high healthcare utilization.
And sometimes, certain medications have an overall higher cost, such as chemotherapy drugs or personalized medications. The medication data type is commonly used as an independent variable to predict population health outcomes, such as future healthcare cost. A number of variables can be derived from medication data, such as medication adherence and reconciliation rates. Medication use and adherence can be measured simply as polypharmacy counts, or with more complex measures such as the medication regimen complexity index, MRCI, and the medication possession ratio, MPR. Sometimes, analysts use medication data to impute a missing diagnosis or the severity level of certain diseases. For example, a change from oral medication to insulin injections may reveal an increase in the severity of diabetes for a given patient. There are a number of commonly used coding standards for medications. These coding standards include, but are not limited to, National Drug Codes, NDCs, which are unique ten-digit, three-segment numeric identifiers assigned to each medication listed under the US Federal Food, Drug, and Cosmetic Act. NDCs are maintained by the FDA and are updated frequently. RxNorm, which is produced and maintained by the National Library of Medicine, NLM, is described as a normalized naming system for generic and branded drugs. As NLM explains in its overview of RxNorm, it is also used as a, quote, tool for supporting semantic interoperation between drug terminologies and pharmacy knowledge base systems, end quote. SNOMED's Chemical axis, C-axis, which includes a list of drugs. The Anatomical Therapeutic Chemical Classification System, ATC, which is used to classify the active ingredients of drugs according to the organ or system on which they act, and is controlled by the World Health Organization. And a number of commercial drug codes such as Medi-Span, Multum, Generic Product Identifier or GPI, First Databank or FDB, and others. Most patient-level or patient-centered data sources include medication data.
For example, medication codes are included in both insurance claims and EHRs. However, medication data can also be extracted from other sources, such as pharmacy benefit management, PBM, data, commercial electronic medication order systems, such as Surescripts, or specialized health information exchange data sets, such as prescription drug monitoring programs, PDMPs. The quality of medication data is often acceptable due to various mandates to collect medication data accurately. However, data quality differs across data sources. For example, insurance claims transactions often capture whether a medication is filled or not, while EHRs collect prescription information only. There are some interoperability issues with medication data, specifically when mapping medication data from one coding system to another. For example, a single RxNorm concept may map to various NDC codes, and vice versa. Certain medications, such as those used to treat HIV or mental illnesses, are protected by various federal and state laws. Therefore, a population health database may not include those medication codes. This table shows a list of patients and their medications based on claims data. The table includes synthetic data and is transactional, meaning that each patient has a separate row for each medication that is filled. As shown, the medications are encoded in NDC codes, which carry the product name, medication form, route, strength, unit, and even package information. Arrow 1 points to the column that includes the product's name. As noted by the box, there are different NDC codes for the same product name due to packaging differences or other differences such as strength, form, or unit. Arrow 2 indicates the package description of the medication that has been dispensed. This information is unique to NDCs, and might not be available in other coding terminologies. Mapping medication codes is a complex process, and can be done at various levels or concepts.
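One practical step before mapping NDCs across data sources is to bring them into a single format. The sketch below assumes the three hyphenated 10-digit layouts (4-4-2, 5-3-2, and 5-4-1) and converts them to the zero-padded 11-digit 5-4-2 form that is widely used in claims. The 11-digit convention is not covered in this lecture, the function name is hypothetical, and the codes are made-up placeholders rather than real products.

```python
def ndc10_to_ndc11(ndc: str) -> str:
    """Convert a hyphenated 10-digit NDC to the zero-padded 11-digit 5-4-2 form."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric segments, got {ndc!r}")
    labeler, product, package = parts
    # The three accepted 10-digit layouts: labeler-product-package segment lengths.
    if (len(labeler), len(product), len(package)) not in {(4, 4, 2), (5, 3, 2), (5, 4, 1)}:
        raise ValueError(f"unrecognized 10-digit NDC layout: {ndc!r}")
    # Pad whichever segment is short with a leading zero.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Placeholder codes, one per layout (not real products).
for code in ["1234-5678-90", "12345-678-90", "12345-6789-0"]:
    print(code, "->", ndc10_to_ndc11(code))
```

Keeping the hyphenated original alongside the normalized value is usually wise, because the hyphen positions cannot be recovered from the 11-digit form alone.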
This diagram shows the potential use of the RxNorm coding standard to map various classification systems with different source codes, as described earlier. The medication coding systems include various terminologies such as NDC, SNOMED, ATC, GPI, FDB, and others. This table presents the top ten medications billed for Medicare patients in 2013. Note that this table only shows the filled medications and does not include medications that are not covered by Medicare, such as most over-the-counter, OTC, medications. As shown by arrow 1, Lisinopril had the highest number of claims, with a total claim count of more than 36 million dispenses for more than 7 million Medicare patients. The total drug cost, however, does not always correlate directly with the number of dispenses due to different pricing for medications. For example, arrow 2 shows that Atorvastatin Calcium had lower total counts compared to Lisinopril, but its total drug cost was close to $1 billion, which is triple the amount for Lisinopril. This table shows the top ten drugs with the highest total cost paid by the Centers for Medicare and Medicaid Services, CMS, for Medicare beneficiaries in 2013. For example, Nexium's total cost was slightly more than $2.5 billion. Note that some medications, such as Revlimid, have a much higher per-claim cost. Indeed, Revlimid's total cost of $1.3 billion comes from only 153,000 claims, thus indicating the high price of the medication. These high-priced medications can help better predict individual members' costs, while the medications with higher claim counts can be more useful for predicting certain costs over a larger population. This diagram shows that various medication data sources provide different derived variables. At the point of care, when a physician prescribes a medication for a patient, the prescription information is stored in an EHR system. The medication prescription information can then be used to calculate the MRCI for individual patients, or a subgroup of the population.
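Among the derived variables in this diagram, the medication possession ratio lends itself to a short worked example. The sketch below assumes one common definition, total days' supply dispensed during an observation period divided by the number of days in that period, capped at 1.0. Other definitions exist, the lecture does not specify one, and the function name and fill records are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple


def medication_possession_ratio(
    fills: Iterable[Tuple[date, int]],
    period_start: date,
    period_end: date,
    cap: bool = True,
) -> float:
    """MPR under the assumed definition: days' supply filled / days in the period."""
    period_days = (period_end - period_start).days + 1  # inclusive of both end dates
    if period_days <= 0:
        raise ValueError("period_end must not be before period_start")
    # Sum the days' supply of every fill dated inside the observation period.
    supplied = sum(
        days_supply
        for fill_date, days_supply in fills
        if period_start <= fill_date <= period_end
    )
    ratio = supplied / period_days
    return min(ratio, 1.0) if cap else ratio


# Made-up claims: three 30-day fills during a 120-day observation period.
fills = [(date(2013, 1, 1), 30), (date(2013, 2, 5), 30), (date(2013, 3, 12), 30)]
mpr = medication_possession_ratio(fills, date(2013, 1, 1), date(2013, 4, 30))
print(f"MPR = {mpr:.2f}")  # 90 days supplied / 120 days in period = 0.75
```

Because this calculation needs fill dates and days' supply, it relies on claims or pharmacy data rather than on EHR prescription records alone.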
When the patient fills the prescription at the pharmacy, an insurance claim is generated. The filled medication information can then be used to measure the MPR for individual patients or a subgroup of the population. And when a clinician reconciles the medication records with a patient, the medication reconciliation record is often stored in an EHR system. The medication reconciliation information can be used to measure the actual adherence of the patient to the medication regimen. Procedure data types are often used in population health analytics. The main variables in this data type are clinical and administrative procedures, such as evaluation and management procedures, as well as various procedures for surgery, radiology, pathology, laboratory, and other clinical domains. Certain procedures have an overall higher cost associated with them, thus creating a direct correlation with healthcare utilization. In addition, certain procedures may indicate a potential for higher cost in the future. The procedure data type is commonly used as an independent variable to predict population health outcomes, such as future healthcare cost. A number of variables can be derived from procedure data, such as a missing diagnosis or the severity level of a given condition. There are a number of commonly used coding standards for procedures. These coding standards include, but are not limited to, the International Classification of Diseases Clinical Modification, ICD-CM, version 9. The International Classification of Diseases Procedure Coding System, ICD-PCS, version 10. Current Procedural Terminology, CPT, which is copyrighted by the American Medical Association, AMA. And the Healthcare Common Procedure Coding System, HCPCS, which is managed by CMS. Most patient-level or patient-centered data sources include procedure data. For example, procedure codes are included in both insurance claims and EHRs.
There are, however, some differences in how certain procedures are encoded in each data source. The quality of procedure data is often acceptable due to various mandates to collect them accurately. However, data quality differs across data sources. There are some interoperability issues with procedure data, specifically when mapping procedure data from one coding system to another. Certain procedures, such as procedures used to treat mental illnesses, are protected by various federal and state laws. Therefore, a population health database may not include those procedure codes. This table shows a list of patients and their procedures based on claims data. The table includes synthetic data and is transactional, meaning that each patient has a separate row for each of the rendered procedures. The procedures are encoded in CPT. This table is sorted by patient ID to show the list of procedures for a given patient. As shown by arrow one, one of the patients has had five different procedures over a span of almost six months. This concludes Lecture b of Population Health IT and Data Systems. This lecture covered common data types for population health analytics, focusing on diagnosis, medication, and procedure data types. The diagnosis data type included a discussion of the trends of chronic diseases, and the relationship of diagnosis and severity with healthcare costs. The medication data type included a discussion of the cost of major prescribed and filled medications and the variation of medication data sources.