We've already started to discuss some of the basic concepts related to machine learning and how it differs from traditional computer programming. Now we're going to delve more deeply into that definition and introduce several techniques, as well as relevant data types and use cases, that will hopefully give you a working knowledge for subsequent discussions. In other words, it's now time to talk about the jargon of machine learning. It'll feel a bit like learning a new language, but even after only a short time working with the terms, concepts, and principles, it should be possible for you to confidently move forward with your machine learning and healthcare journey on your own. This means you'll be well prepared to identify areas of interest and seek out more resources on your own that will deepen your knowledge of these concepts and principles. Machine learning is endlessly fascinating and constantly evolving, just like healthcare, and both are lifelong learning endeavours for everyone. Don't worry if this seems like a lot coming at you fast, especially if you're new to this. We'll make sure that you hear these terms and principles many times and from lots of different angles throughout the next several conversations. The bottom line is: don't panic. As we discussed earlier, machine learning is a family of statistical and mathematical modelling techniques that uses a variety of approaches to automatically learn and improve the prediction of a target objective without explicit programming. An even more concise way of thinking about this is: systems that improve their performance in a given task through exposure to experience or data. People often talk about machine learning as having three paradigms: supervised, unsupervised, and reinforcement learning. Each of these approaches can address different needs within healthcare. We'll define these terms soon, but the important thing to know for now is that it's more accurate to describe machine learning problems as falling along a spectrum of supervision between these terms. For the sake of simplicity, we'll focus primarily on supervised learning, but it's good to recognize that other paradigms also exist and in some ways fall along that spectrum. >> A useful way to think about what we're going to use from computer science here boils down to three components: an input, some kind of processing, and an output. Using a simple example, we can take an equation; I told you about equations earlier. This one's easy: y equals x squared. In this case, the input is x and there's an output y. Another example might be abnormality detection, where the input is an ECG and the output is a medical diagnosis like ST elevation myocardial infarction. For both these examples, in between the input and the output there is something that processes the input to produce the output. In the first example, that simple equation, it is squaring the input x to arrive at the output y. In the ECG example, there is a visual analysis being performed on the ECG, leading to that output diagnosis. In both cases, the processing we are referring to is the thing that transforms the input into the output, and that's called a function. In a traditional computer programming approach, we deliberately write rules by hand to process the inputs so that they produce the desired outputs. This is why traditional computer programming is also referred to as a rules-based approach.
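To make the input, processing, output idea concrete, here is a minimal sketch in Python of the traditional, rules-based approach for both examples. The squaring function is exact; the ECG "rule" is entirely made up for illustration (the st_elevation_mm feature and its threshold are hypothetical simplifications, not real diagnostic criteria).

```python
# Traditional, rules-based approach: the programmer writes the function by hand.

def square(x):
    """Processing step for the simple example: input x -> output y = x**2."""
    return x ** 2


def diagnose_ecg(st_elevation_mm):
    """Toy hand-written ECG rule (hypothetical feature and threshold).

    Input: a single, simplified ECG-derived feature.
    Output: a diagnosis string. Real criteria are far more complex.
    """
    if st_elevation_mm >= 1.0:
        return "ST elevation myocardial infarction"
    return "no acute abnormality detected"


print(square(3))           # input 3 -> output 9
print(diagnose_ecg(2.5))   # input ECG feature -> output diagnosis
```

In both functions, a human decided exactly how the input becomes the output; that hand-written logic is the "function" in the rules-based sense.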
In other words, computer programmers write functions with specific rules, and the program that is written is the function: it is the thing doing the processing to achieve an output. So, if we were to write a program for the first example above, the program would ask for an input x, square it, and return the new output y. The function here is the part that squares the input. The same thing would need to happen if we wanted to take a traditional computer programming approach to the ECG example, right? But obviously, the function here would be much more complicated than simply squaring the input. You can imagine, even without programming experience, that you would need to somehow write a program that would process the ECG in a series of very specific steps and decisions so that an accurate analysis of the ECG can be performed. That resulting program would be the function that processes an ECG and provides an output, in this case maybe heart attack or no heart attack. And this, by the way, is exactly how the early automatic ECG reading machines used in hospitals all over the world were able to provide a preliminary diagnostic analysis. They relied on the traditional computer programming approach we're describing, which required that the function be coded through this laborious rules-based process. Due to the complexity, it required lots of coding by programming teams, as well as domain experts, to decide, from both the medical and the computer programming points of view, how to interpret the diagnostic decisions and turn them into a function. >> In contrast, in a machine learning, and in particular a supervised learning, approach, we don't know exactly how to process inputs to produce the right outputs, but we have a lot of data that links inputs to the known corresponding outputs. The program written in this type of approach searches for, or in other words learns or finds, a function that can accurately map our data inputs to the outputs. We can then use this function to process new inputs and produce new and hopefully correct outputs. In our earlier example of ECG diagnosis, we explained how difficult it would be to write a function that would process ECG input data to accurately give a diagnosis as an output. The inputs in this case are complex and vary from patient to patient, and for the most part, the traditional automated ECG diagnosis systems aren't terribly reliable, or at least they're not used to make final decisions on their own. This is because, for the most part, humans haven't been able to hand-write a set of rules that can give us human-level performance on these incredibly complex tasks. No matter how sophisticated the function we write is, in the end it's still too simplistic and typically too rigid to succeed broadly in the real world. After all, if it were possible to do this efficiently and accurately, there wouldn't be any need for us to be teaching this course. So, in contrast, let's think about how the machine learning approach would work here and why it might be an advantage. While the function from input to output can be difficult to write down, very often we do know what the right answer, the output, is supposed to be. If a cardiologist has already looked at the ECG, the input, and recorded a diagnosis, the output, then we already have two parts of the equation; all we need to do is figure out the function that links this input to this output.
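Here is a minimal sketch, using the simple squaring example, of what "figuring out the function from input-output pairs" can look like in code. Polynomial fitting with NumPy stands in for the more general search that supervised learning performs; the data points are toy examples, and the choice of a degree-2 fit is an assumption made for illustration.

```python
import numpy as np

# Supervised idea in miniature: we don't hand-code "square the input".
# Instead we give the program example input/output pairs and let it search
# for a function that maps one to the other.
inputs = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
outputs = inputs ** 2          # the "correct answers" a supervisor provides

# Fit y ~ a*x**2 + b*x + c to the example pairs.
# The coefficients a, b, c are what the program "learns".
a, b, c = np.polyfit(inputs, outputs, deg=2)

def learned_function(x):
    """The learned mapping from input to output."""
    return a * x ** 2 + b * x + c

print(learned_function(4.0))   # close to 16.0, even though 4.0 was never seen
```

The key shift is that the programmer supplied examples and a way to search, not the squaring rule itself.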
Supervised learning is the process through which a program takes input-output pairs and learns the function that links them together. Note that we call this supervised learning because we provide the input and output pairs; we're supervising the model by providing it with the right answers. The entity that undergoes supervised learning is commonly called a model, because it represents, or models, the relationship between the inputs and the outputs. Learning this relationship means learning a function, which in this case means adjusting a set of numbers known as parameters. A model is defined entirely by its parameters and the operations between them. The field of machine learning is largely dedicated to discovering algorithms and techniques for finding good values of parameters that allow models to match inputs closely with their corresponding outputs. Sometimes you might also hear someone call a model a function approximator, because that's exactly what it does: it approximates the function between the inputs and the outputs. Once the program learns a function that works well, we can then use it in place of software that would have been written through traditional computer programming. We can take new inputs, put them through the learned function, and produce new outputs. This is the ultimate goal of supervised learning. Now, I want to pause here and clarify one important point. In supervised learning, as in traditional computer programming, a program still has to be written. However, as we've discussed, the purpose of the program, to search for, or learn, an accurate mapping function instead of pre-specifying it, is fundamentally different. We'll spend some time now discussing what must go into supervised learning programs in order to create a working machine learning model. In our ECG example, we would expect that with enough inputs, or ECGs, and outputs, or diagnoses, the function can be learned by a model, and we wouldn't have to explicitly program each step of the function by hand. The model would learn to read ECGs like a cardiologist. Pretty cool, right? >> Because supervised learning relies on labels, the success of any supervised machine learning approach depends on the labels being correct. The labels are the outputs; they are the result of the function acting on the input. Makes sense? Again, we call it supervised learning because we provide the inputs and the correct output labels to the machine learning model, acting as a supervisor who teaches the learning algorithm. Through many examples of inputs and output labels, the machine learning model learns, in a supervised way, how to build a function that can take new data that doesn't have any labels and accurately reproduce the labels as output. The labels in this case can be anything: a medical condition like diabetes or Parkinson's disease, an event like readmission to the hospital or sepsis, an imaging diagnosis like skin cancer from photographs of lesions or pneumonia on chest x-rays, or, like our example earlier, a heart attack on an ECG. Basically anything that represents a desired output can be a label. The labels represent what our machine learning model should output, or tell us, given the respective inputs. If we teach the model with examples of what it should do given a particular input, the idea is that the function will be learned.
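As a toy illustration of labels and learned parameters, here is a sketch that fits a simple classifier to a handful of made-up labeled examples. The two "ECG-derived features", their values, and the use of scikit-learn's logistic regression are all illustrative assumptions, not how a real ECG model would be built.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny, entirely synthetic illustration of "inputs + labels -> learned model".
# Each row holds two made-up ECG-derived features; each label is 1 for
# "heart attack" and 0 for "no heart attack". Real systems use raw waveforms
# and far more data.
X = np.array([[2.4, 110], [1.8, 105], [2.1, 98],    # label 1 examples
              [0.1,  72], [0.2,  80], [0.0,  66]])  # label 0 examples
y = np.array([1, 1, 1, 0, 0, 0])

# Learning = finding good values for the model's parameters.
model = LogisticRegression(max_iter=1000).fit(X, y)

# The learned parameters define the model (the "function approximator").
print(model.coef_, model.intercept_)

# A new, unlabeled input -> a predicted label.
print(model.predict([[2.0, 100]]))   # expected: array([1])
```

The supervision is the label column y: the model is only as good as those labels, which is why correct labels matter so much.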
Since a lot of data is needed to do this very well with these models, we have benefited from this approach now that the availability of medical data is rapidly expanding, and there have been many successful applications of supervised learning to healthcare. This method of training a model is popular in settings with a clear outcome, and in particular in settings that have large amounts of labeled data. As an example, a group from Google and UCSF worked together on a machine learning project that used supervised learning to predict mortality, readmission, and diagnosis labels at discharge from EHR data. The resulting model was then able to take new data and predict the correct labels from it. In another example, Stanford researchers developed an AI system that classified 14 conditions on a chest x-ray at a performance comparable to that of practicing radiologists. Based on tens of thousands of input-output example pairs, the system learnt which image features were most closely associated with the differential diagnosis, just like a human would. So, over time and hundreds of thousands of examples of images, which were the inputs, and diagnoses, which were the output labels, the AI model was able to learn an accurate function that performed as well as human experts at reading chest x-rays. Again, these sorts of examples did not require any of the laborious rules-based programming that we would see in a traditional approach, which would take decades to achieve the same kind of performance. Pretty cool, right?