Billing data, such as CPT procedure codes and ICD diagnosis codes, are some of the most common data types used in computational phenotyping. In this video, we will cover the relative benefits and drawbacks of using billing data for this purpose. Remember that billing codes are assigned for each healthcare encounter. Procedure codes describe what was done in the encounter, while diagnosis codes provide the reason for the procedure.

Let's start with diagnosis codes. In computational phenotyping, we typically want to identify patients with a particular disease. The structured nature of ICD codes makes these data easy to access and relatively straightforward to use in analytics. This is why they are the most common data type used in computational phenotyping.

But ICD codes have variable performance in phenotyping depending on the type of code. For example, codes for common diseases like diabetes tend to be very sensitive: they capture a lot of diabetic patients. But they're not particularly specific. You'll often see type 1 and type 2 diabetes codes used interchangeably, regardless of the type of diabetes the patient actually has. Other codes, like tobacco use disorder, are highly specific: only people who have used tobacco have the code. But they're not very sensitive. Only about 30 percent of patients who have used tobacco actually have the code. Some codes are likely both specific and sensitive. The ICD code for struck by turtle is likely one of these codes, but I don't imagine you'll use that much as a clinical data scientist.

False positives from ICD codes can occur for a variety of reasons. First, diagnoses evolve over time. As a provider is diagnosing the patient, they will bill for the suspected diagnosis and then revise it as testing identifies the real diagnosis. This is especially common in autoimmune traits that have similar symptoms. As patients get more and more instances of the same diagnosis code, you can be more confident the patient has that condition.
Second, diagnosis codes may be entered in error. Primary care providers assign their own billing codes. Historically, providers would have a few codes memorized that were most frequently used for their patients. Nowadays, we have a list that a provider can search to assign the code, but a wrong click or an autocomplete can trigger an incorrect code. Third and finally, the wrong diagnosis code may actually be entered on purpose. Remember that these codes are used for health care billing and reimbursement. There are a number of medications, tests, and procedures that are only allowed for certain diagnoses. Providers may assign one of these codes to justify a necessary procedure so that it is covered by insurance. One example that I have seen is men with a female-specific breast cancer code. These men had breast cancer and needed a particular type of radiology scan that was only allowed in combination with a female breast cancer code.

In addition to these false positives, diagnosis codes may also have false negatives, as with the tobacco use code. Outpatient visits are limited to only four codes, so only the primary reason for the visit will be recorded. This means that patients who have a number of health conditions may not always have all of their conditions recorded.

Another factor that can affect code accuracy is the impact of professional billers on inpatient diagnosis codes. These individuals understand which codes will get the highest reimbursement rate, and may choose codes based on that reimbursement value rather than the most accurate description. Note that these codes have to be supported by the medical documentation. So they aren't necessarily wrong, but they may be used differently than in outpatient encounters.

Procedure codes, on the other hand, are very accurate. They are rarely used improperly. In the context of computational phenotyping, broad codes like outpatient visits are not very useful.
However, if there is a specific procedure that is only used to treat a single condition, that procedure code can be a very specific way of identifying that patient population. A great example is intracranial aneurysm. If a patient has a code for the clipping of an aneurysm, they almost certainly have an aneurysm. However, procedure codes only apply to patients who received the procedure at your health care system, so they are not very sensitive. Also, these codes may be assigned by the billing service for the physician. At a number of hospitals, physician billing is actually handled by a third party, and the EHR never gets a record of these procedure codes.

In summary, billing codes, while extremely useful because of their structured format, have variable quality. You should always examine the specific performance of any diagnosis code before assuming that it accurately identifies the condition specified.
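
As a rough illustration of what examining the performance of a diagnosis code might look like, here is a minimal Python sketch. It is not taken from the lecture. It assumes you have one billing row per encounter and a small set of patients whose true status was established by manual chart review. The patient IDs, the rows, the chart review labels, and the helper names `code_counts` and `evaluate` are all hypothetical, and the ICD-10-CM codes are used only as examples. The `min_count` threshold reflects the earlier point that repeated instances of the same code increase confidence.

```python
from collections import Counter

# Hypothetical billing rows: (patient_id, diagnosis_code), one row per encounter.
billing_rows = [
    ("p1", "E11.9"), ("p1", "E11.9"), ("p1", "E11.9"),
    ("p2", "E11.9"),
    ("p3", "E11.9"), ("p3", "E11.9"),
    ("p4", "I10"),
    ("p5", "E11.9"),
]

# Hypothetical gold standard from manual chart review:
# True means the patient really has the condition of interest.
chart_review = {
    "p1": True, "p2": False, "p3": True,
    "p4": False, "p5": True, "p6": True,
}


def code_counts(rows, target_code):
    """Count how many times each patient was billed with target_code."""
    return Counter(pid for pid, code in rows if code == target_code)


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def evaluate(counts, gold, min_count):
    """Compare 'has at least min_count codes' against chart review labels."""
    tp = fp = fn = tn = 0
    for pid, has_condition in gold.items():
        flagged = counts.get(pid, 0) >= min_count
        if flagged and has_condition:
            tp += 1
        elif flagged and not has_condition:
            fp += 1
        elif not flagged and has_condition:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases the code captures
        "specificity": ratio(tn, tn + fp),  # non-cases correctly left out
        "ppv": ratio(tp, tp + fp),          # flagged patients who are true cases
    }


counts = code_counts(billing_rows, "E11.9")
for threshold in (1, 2):
    print(threshold, evaluate(counts, chart_review, threshold))
```

On this toy data, requiring two instances of the code instead of one lowers sensitivity and raises specificity, which is the trade-off described above. With real data, the chart review sample would need to be large enough and drawn carefully enough for these estimates to mean anything.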