Welcome to Population Health, Population Health IT and Data Systems. This is lecture d. This component, Population Health, discusses the applications of informatics and informatics methods in population health management. This unit, Population Health IT and Data Systems, explains the challenges and opportunities of using different data types, data sources, and data systems for population health IT. The objective for this lecture is to identify various data types and data sources used for population health management, including both traditional and nontraditional data sources. This lecture discusses emerging data types used for population health.

A number of variables have emerged as potential data types for population health analytics. These emerging data types include lab orders and values, such as hemoglobin A1c levels; vital signs, such as body mass index (BMI) and blood pressure levels; social data, such as income levels; patient-generated data, such as mobile health data and patient-reported outcomes (PROs); and other data, such as workflow data.

Lab data are one of the emerging data types used for population health analytics. Lab data can include both lab orders and lab results. Lab orders, sometimes lab results, and more importantly their trends can be indicative of future healthcare utilization. Thus, lab order and lab result trends are sometimes used as independent variables to better predict healthcare cost. A number of variables can be derived from lab data, such as severity of diagnosis, missing diagnoses, and even potentially missing medications, although such imputations should be conducted cautiously.

There are a number of well-developed coding standards for lab orders and lab results. These coding standards include, but are not limited to, the Logical Observation Identifiers Names and Codes (LOINC), which is a universal code system for tests, measurements, and observations.
Other standards include the Systematized Nomenclature of Medicine (SNOMED) and Current Procedural Terminology (CPT). These coding standards are still being adopted, and some lab data sources may not adhere to them, which makes interoperability a challenge for lab data derived from multiple sources. Note that the majority of electronic health records (EHRs) currently use internal coding systems for lab orders and results, which makes interoperability of lab data for population health even more challenging.

Patient-level clinical data sources often include lab data. Currently, the best sources of lab data are EHRs and the laboratory information systems used by various laboratories. The quality of lab data is often acceptable. However, issues with changing coding standards and units of measurement often affect data quality negatively. There are some interoperability issues with lab data, specifically when mapping lab data from one coding system to another. Certain lab orders and results, such as lab tests revealing HIV status, are protected by various federal and state laws. Therefore, a population health database may not include those lab orders or results.

This image shows a table returned by the LOINC explorer, which is an online tool for finding matching LOINC codes given free text entered by a user. However, as shown in this table, finding a matching lab test is more complex than expected and requires knowledge about the laboratory processes. Thus, finding the appropriate lab codes across multiple data sources often requires expertise, and inexperienced users may increase the potential for error in assigning and merging lab orders and results into larger population health databases.

This diagram depicts the challenges of mapping the CPT lab codes to the LOINC lab codes. As shown, there are cases of one-to-one matching. However, in most cases, one CPT code maps to more than one LOINC code. The first row of the diagram shows a one-to-one match between CPT code 82040 and LOINC code 1751-7.
The second row shows one toxoplasma CPT code matching three LOINC codes. The next row shows one CPT code matching 11 LOINC codes, and the last row shows one CPT code matching more than 1,017 LOINC codes. Note that part of the matching problem is due to the fact that CPT codes do not specify as much detail as the LOINC coding system does. These coding and interoperability issues limit the ability of data managers and population health analysts to merge various population health data sources that use different lab coding systems.

This table shows a list of patients and their lab orders and results, based on synthetic data. This is a transactional table, meaning that each lab order creates a new row for a patient. Each row includes a unique patient ID, the setting in which the lab test was ordered, the status of the lab order, the order date, the test date, the result date, the order CPT code, the lab test name, and the lab result. Box one shows that multiple lab tests are ordered for the same patient during three outpatient visits. Arrow two shows that the status of certain lab orders can be cancelled, which means those results need to be excluded or ignored. Arrow three indicates that lab results are often challenging to interpret, especially when they are not accompanied by units.

Vital signs are one of the emerging data types used for population health analytics. Vital sign data can include a number of physiological and other clinical variables, such as weight, height, body mass index, blood pressure, temperature, pulse rate, and respiratory rate. Certain long-term trends in vital signs may be predictive of higher healthcare utilization in the future. Vital signs are used as independent variables to better predict healthcare cost. Vital signs can also be used to impute missing diagnoses or measure the severity of a given diagnosis. However, such derived variables should be treated cautiously. LOINC is the de facto coding standard for vital signs.
However, most EHRs do not actively adhere to it. Indeed, most EHR users have developed internal coding lists for their vital signs. Patient-level clinical data sources often include vital signs. Currently, the best sources of vital signs are EHRs. The quality of vital signs is often acceptable. However, human errors and issues with units of measurement often affect data quality and thus require extensive data cleaning before use. There are some interoperability issues with vital signs, specifically when mismatched units are used in different data sources. Another interoperability issue may arise from missing extra information about vital signs that may change the clinical concept and use of the data, such as sitting versus standing for blood pressure measures.

Variation in vital signs should be interpreted with caution, as the variation might be due to multiple underlying reasons. Some of these causes can be fixed in the data cleaning and preparation phases, but some are simply part of the data and cannot be corrected. The variations in vital signs can be due to normal variations in human physiology, such as the different levels of normal blood pressure across the population; subjective bias in reporting the vital signs, such as the same level of pain reported differently by different patients; measurement errors, such as using unreliable measurement tools or instruments to measure blood pressure; data capture errors, such as making a mistake while entering blood pressure levels; and data interoperability issues, such as variations in the units and levels of accuracy of blood pressure.

This table shows a list of patients and their vital signs, based on synthetic data. This is a transactional table, meaning that each vital sign measure for a given patient is listed on a new row. Arrow one shows the extra information available for a vital sign measurement. Arrow two points to the vital sign value, which in this case shows a weight of 2,800 ounces.
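The unit problem behind a value such as 2,800 ounces can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the function name `weight_to_kg` and the unit labels are hypothetical and do not come from any particular EHR. It converts a weight to kilograms only when a recognized unit is recorded, and otherwise declines to guess.

```python
from typing import Optional

# Conversion factors to kilograms (international avoirdupois pound and ounce).
# The unit labels used as keys are assumptions made for this sketch.
KG_PER_UNIT = {
    "kg": 1.0,
    "lb": 0.45359237,
    "oz": 0.028349523125,
}


def weight_to_kg(value: float, unit: Optional[str]) -> Optional[float]:
    """Return the weight in kilograms, or None if the unit is missing or unknown."""
    if unit is None:
        # A bare number cannot be interpreted safely, so it is not converted.
        return None
    factor = KG_PER_UNIT.get(unit.strip().lower())
    if factor is None:
        return None
    return value * factor


# Synthetic examples: the same number with and without a recorded unit.
with_unit = weight_to_kg(2800, "oz")
without_unit = weight_to_kg(2800, None)
```

With a unit recorded, 2,800 ounces comes to roughly 79.4 kilograms. Without one, the sketch returns `None`, so the record can be flagged for data cleaning instead of being silently misread.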
Note that due to missing units, the values may be misleading in some cases. Arrow three shows that sometimes a combination of numbers might represent the value of a lab test. Box four shows a number of vital sign records for a given patient over an extended period. For example, in this case, the BMI of the patient has increased from 34.56 to 44.77 over a period of almost eight months. These vital sign changes may create risk factors for future healthcare utilization.

Social data are one of the emerging data types used for population health analytics. Social data can include a number of variables, such as smoking status, alcohol consumption, addictive behaviors, and socioeconomic status (SES). Some social data, such as smoking status, are indicative of potentially higher utilization in the future. In population health analytics, social data are usually used as independent variables to better predict healthcare cost. Social data can also be used to derive multiple variables, such as treatment affordability and medication adherence. Although a number of coding standards have been proposed to standardize social data, most of the existing social data sources use internally developed coding vocabularies. Person-level data sources, such as EHRs, often include some social data. Social data can also be acquired from non-health data sources, such as data systems used in social services organizations. Social data often suffer from lower data quality, which is caused by the incompleteness of survey responses and a higher possibility of subjective bias in responding to surveys. Data interoperability might become challenging if the same underlying survey has not been used to collect a specific social data type. Note that nonclinical data may also be subject to non-HIPAA rules, such as the Family Educational Rights and Privacy Act (FERPA).

This table shows a list of patients and a select list of their social data, based on synthetic data. This is a transactional table.
That is, social data captured in a new visit are listed in a new row. Arrow one shows the description of the smoking status. However, when the number of years is missing, it is complicated to measure the intensity of the smoking history. Arrow two shows the yes and no answers to alcohol consumption, without extra information on the intensity of alcohol consumption. Arrow three points at the cells that are often denoted as "not asked" for various social data types, especially sensitive ones such as sexual activity. These empty or "not asked" cells often make the use of social data for population health analytics complex. Box four shows that the trend of social data can also be useful. For example, this patient has become sexually active after a nonactive period.

Patient-generated data are infrequently used for population health analytics. Patient-generated data can include a number of variables, such as physical activity levels collected by wearable devices, patient-reported signs and symptoms, and other passive or active variables reported by patients. Certain trends of patient-generated data may improve short-term predictive models of utilization. In population health analytics, patient-generated data are usually used as independent variables to better predict healthcare costs. Patient-generated data can also be used to derive several variables, such as general fitness and levels of activities of daily living, that could be indicative of frailty. Standards are becoming increasingly available for mobile health and wearable devices, but research-grade standards are still not widely adopted. Patient-generated data are often collected via personal mobile health and wearable devices, personal health records, or centralized survey platforms. Patient-generated data types have a variety of data quality levels.
Mobile health and wearable device data often have acceptable data quality levels, but accuracy and comparability are still a challenge when data are collected across a variety of devices. Self-entered data via surveys and other means are subject to a variety of biases and errors. Data interoperability might become challenging as more nonstandardized devices enter the market, and for devices that are exempt from being monitored by federal agencies. A distributed consent process is often complex, and creating large population-wide data warehouses may be constrained by certain legal limitations.

As shown by the images on this slide, there is a wide variety of mobile health apps and wearable devices on the market. These apps and devices can generate data types that are deemed useful for population analysis. The challenge is to bring together population-wide data streams that can help improve the stratification of a given population. As shown by the images on this slide, a variety of personal health records and portals can also generate data types useful for population analysis.

Other potential data types can also be used for population health analytics. These data types include workflow data, such as care density, type of provider, place of admission or discharge, and other workflow-related data types; environmental data, such as geo-bound environmental data mixed with population health outcomes (for example, the ratio of fast food restaurants to fresh food grocery stores can be used for population health analytics); and marketing data, such as shopping behaviors, bankruptcy records, and credit scores. Note that ethical and legal questions are still unanswered as to whether certain data types can or should be used for population health stratification.

This concludes lecture d of Population Health IT and Data Systems. This lecture discussed emerging data types, such as lab orders and values, and vital signs, such as blood pressure and heart rate.
It also covered social data, such as smoking status and alcohol consumption; patient-generated data, such as physical activity levels generated by wearable devices; and other potential data types, such as workflow or marketing data.
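As a supplementary illustration of two lab data issues raised in this lecture, cancelled orders in a transactional table and one-to-many CPT-to-LOINC mapping, the following Python sketch may help. Everything in it is assumed for illustration: the row layout, the status strings, the patient IDs, the result values, and the function names are hypothetical. The mapping table holds only the one-to-one pair named in the lecture (CPT 82040 to LOINC 1751-7) plus a made-up placeholder entry standing in for a one-to-many case.

```python
from typing import Dict, List, NamedTuple, Optional


class LabRow(NamedTuple):
    """One row of a transactional lab table (hypothetical layout)."""
    patient_id: str
    status: str
    cpt_code: str
    result: Optional[str]


# CPT code -> candidate LOINC codes. Only the first pair comes from the lecture.
# "CPT-X" and its candidates are placeholders, not real codes.
CPT_TO_LOINC: Dict[str, List[str]] = {
    "82040": ["1751-7"],
    "CPT-X": ["LOINC-A", "LOINC-B", "LOINC-C"],
}


def active_rows(rows: List[LabRow]) -> List[LabRow]:
    """Drop cancelled orders, since their results should be excluded."""
    return [r for r in rows if r.status.strip().lower() != "cancelled"]


def map_to_loinc(cpt_code: str) -> Optional[str]:
    """Return a LOINC code only when exactly one candidate exists."""
    candidates = CPT_TO_LOINC.get(cpt_code, [])
    # Ambiguous or unknown codes return None so an expert can review them.
    return candidates[0] if len(candidates) == 1 else None


# Synthetic rows: one completed order, one cancelled order, one ambiguous code.
rows = [
    LabRow("P001", "Completed", "82040", "4.1"),
    LabRow("P001", "Cancelled", "82040", None),
    LabRow("P002", "Completed", "CPT-X", "Negative"),
]

kept = active_rows(rows)
mapped = [(r.patient_id, map_to_loinc(r.cpt_code)) for r in kept]
```

Returning `None` for an ambiguous CPT code, instead of picking the first candidate, is a deliberate choice in this sketch: it reflects the point made earlier that choosing among several LOINC codes requires knowledge of the laboratory process.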