So we've covered a lot of ground so far in our overview of clinical machine learning. You should feel like you have a solid understanding of the principles of machine learning, as well as how we build and evaluate models in the laboratory environment. Most of the conceptual foundations are not necessarily new, but the combined advances can finally translate theoretical models into usable technologies.

>> Though we've spent most of the time discussing the upside, the potential to improve healthcare with machine learning, it is equally important to recognize that, like all technologies, practical uses of machine learning in healthcare do not come without certain challenges, or as we like to call them, opportunities. Am I right? [SOUND] All right, so now we will cover some of the important concepts beyond the principles. What's more, we've curated and distilled some of the best practices and guidelines from experienced groups all over the world.

>> Now, let's talk about a simple but surprisingly challenging concept behind the technology of machine learning in healthcare: correlation and causation. Because machine learning methods, such as neural networks, simply learn associations between inputs and outputs to find the best fit, regardless of which features they use, they can sometimes be unreliable, or even dangerous, when used to drive medical decisions. Why is this the case? Well, we've learned that since they're not explicitly programmed, machine learning algorithms can and will use whatever signals are available to achieve the best possible performance on the data set used. So we must recognize that the model or function that's created can, in many instances, come from patterns in the data that are mere correlations, rather than causative truths about the label, or in other words, about the outcome or the disease.

>> So if a model fits the data based on useless correlations, due either to coincidence or to the presence of unforeseen variables, also referred to in machine learning lingo as common response variables, confounding variables, or lurking variables, it can be problematic to translate into clinical care. The intention, the hope, is that these algorithms will learn the most clinically useful and appropriate model features instead of learning false correlations. But you may have heard someone tell you before that hope is not a strategy. The issue that comes up in machine learning time and time again, especially with neural networks, is the situation where the model exploits unexpected or even unknown confounders that have no relevance to the task. And that will severely impair or invalidate the model's ability to generalize to new data sets.

>> And this happens because, for the most part, correlation does work well enough for machine learning models to accurately classify and predict outcomes in large datasets.

>> But without medical context, this can lead to attempts to treat the predictors as factors that can be manipulated to change outcomes, which will do nothing at best, and cause harm at worst. It certainly will not lead to the intended goal of using machine learning to improve healthcare.

>> And we've already mentioned the urban legend of the Russian tank problem, where an AI model recognized the weather in the pictures, and not the tanks. Now, granted, this story has been told in many forms and has never been proven true.
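As an aside, here is a minimal, purely illustrative sketch of that phenomenon on synthetic data (the feature names, numbers, and setup are our own assumptions, not from any study): when a model is offered both a weak genuine signal and a spurious cue that happens to track the label in the training set, it will lean on the cue, and its performance collapses on new data where the cue no longer tracks the label.

```python
# Toy illustration (synthetic data; all names and numbers are assumptions for this sketch).
# A "spurious_cue" tracks the label almost perfectly in the training data (like the
# weather in the tank legend) but carries no signal in data from a new setting.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_data(n, cue_strength):
    label = rng.binomial(1, 0.5, n)
    genuine_signal = label + rng.normal(0.0, 2.0, n)   # weak but truly related to the label
    spurious_cue = rng.binomial(1, np.where(label == 1, cue_strength, 1 - cue_strength))
    return np.column_stack([genuine_signal, spurious_cue]), label

X_train, y_train = make_data(10_000, cue_strength=0.95)   # cue nearly perfect in training
X_new, y_new = make_data(10_000, cue_strength=0.50)       # cue uninformative in new data

model = LogisticRegression().fit(X_train, y_train)
print("AUC on training-like data:", roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))
print("AUC on new data:          ", roc_auc_score(y_new, model.predict_proba(X_new)[:, 1]))
print("weights [genuine, spurious]:", model.coef_[0])     # the spurious cue dominates
```

In an imaging setting, that spurious cue might be something like a scanner type or a hospital-specific marker in the image rather than anything about the patient.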
But the concept is not far-fetched at all, even in medical applications, because we also showed you a real example of the same phenomenon in a chest x-ray imaging model. In that case, the model was considered very accurate, but like the tank problem, it was focusing on non-medical cues in the image to draw conclusions. And this issue doesn't come up only in imaging models, but also in models built with structured EHR data.

>> For example, one group trained a machine learning model using EHR data to predict the risk of death for pneumonia patients. The hope was that the model could automatically identify high-risk patients who could be treated more aggressively, and also find low-risk patients who could be safely treated at home. They built their model using historical EHR data, and it worked really well. However, they later found that the model had focused on a key correlation to make accurate predictions, and that correlation was problematic. Specifically, the model predicted that pneumonia patients who also had asthma were low risk, suggesting that asthma is somehow a good prognostic sign for pneumonia.

>> But this sounds wrong, of course, especially to our medical audience. It should be clear that there's some problem and that the data should be reexamined immediately, since the risk of bad pneumonia outcomes should be much higher for asthma patients than for non-asthma patients. So was the model wrong? Well, no, it wasn't wrong at all. It had learned a model that worked perfectly on the data used to train it, and it had passed all the metrics.

>> But when they looked into it more, they realized the model was trained on historical data from an institution that really did have better outcomes for asthma patients, so the correlation was true. It was true, however, because of a hospital policy: asthma patients with pneumonia were automatically admitted to the intensive care unit, where they received aggressive treatment by protocol, which, over time, improved pneumonia outcomes for asthma patients at that hospital. The model didn't know there was a policy that helped pneumonia patients with asthma get better. It just found a correlation in the data that pneumonia patients with asthma had better outcomes than those without asthma.

>> Fortunately, the research team noticed this confounded finding, and the model was never used in practice. But imagine that this model had been deployed. In that scenario, pneumonia patients at a much higher risk of bad outcomes, in particular patients with asthma, would have been flagged as low risk, and possibly given outpatient treatment or sent home, instead of being sent to the intensive care unit.

>> So in this case, historical electronic health record data sets, even though they were properly constructed, properly labeled, and passed all the metrics, led to a machine learning model that ultimately inferred pneumonia outcomes from the effects of a hospital policy, not from real-world physiologic risks. So was this just a fluke, an isolated incident? Actually, not at all. There are many examples from a wide variety of healthcare applications based on models that, if implemented, would be a disaster of epic proportions.

>> Imagine all the work that went into developing these models, only to realize that they're not clinically useful.
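To make the mechanism concrete, here is a small synthetic sketch (the effect sizes and prevalences are our own assumptions, not the study's data or code): asthma raises the underlying risk of death, but because asthma patients are always sent to the ICU for aggressive care, their observed mortality in the historical record ends up lower, and a model trained on the recorded outcomes learns that asthma predicts low risk.

```python
# Toy illustration (synthetic data; the numbers are assumptions, not from the study).
# Asthma raises the underlying risk of death from pneumonia, but a hospital policy sends
# every asthma patient to the ICU, where aggressive care lowers their observed mortality.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000

asthma = rng.binomial(1, 0.15, n)     # assume 15% of pneumonia patients have asthma
icu = asthma.copy()                   # policy: asthma -> automatic ICU admission

# Assumed ground truth: asthma is harmful, aggressive ICU treatment is strongly protective.
logit = -2.0 + 1.0 * asthma - 2.5 * icu
death = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# A risk model trained only on the recorded comorbidity, as an EHR model might be.
model = LogisticRegression().fit(asthma.reshape(-1, 1), death)
print("asthma coefficient:", model.coef_[0][0])                  # negative: asthma *looks* protective
print("observed mortality with asthma:   ", death[asthma == 1].mean())
print("observed mortality without asthma:", death[asthma == 0].mean())
```

Because the treatment that produced the good outcomes never appears as a feature, the model has no way to distinguish "asthma is protective" from "asthma patients were treated differently."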
This is a surprisingly common pitfall, despite being an easy concept to grasp in retrospect. We'll cover approaches that can help you identify and avoid these kinds of problems before they happen.