In that extended example we saw the value of doing good analytics with these performance measures. What first appeared to be significant differences in skill turned out to be pure chance. And that really emphasizes the importance of persistence: you need to find ways of testing for persistence. In that case, we just looked at a split sample, how performance varied in one year and then how it varied in the next year. Finding ways to do that is one of the most fundamental ways you can parse signal from noise. We're going to focus on four additional issues for the rest of the module: regression to the mean, sample size, signal independence, and process versus outcome. These are all important concepts to have in mind as you dig into your data, and they are also issues that trip us up when we reason about data only intuitively. So these are problems that data and analytics can improve on, but analytics aren't a cure-all; you can still make these mistakes even with data. So let's dig into them.

The first is regression to the mean, and I want to start with a very simple model of performance, where you can think of performance in terms of real tendency plus luck. We've been talking about this a little bit, and we can formalize it. Don't get too put off by the baby math here, but in formal terms you can think of performance y as a function of x, true ability, and e, some error that is randomly distributed around zero: y = x + e.

Now what does that mean for when we sample on extreme performance? What underlies extreme success and failure? If that's the model of the world, and everything we've said so far says it is, that there's some noise in these performance measures, what does it mean when we sample on extreme performance? Well, it means that extreme success suggests the person might in fact have superior ability, or tried very hard, but also that they got lucky: the error was positive. And conversely, extreme failure perhaps means inferior ability, or that they did not try very hard, but also negative error, that they got unlucky. We can be sure that when we sample very extremely on a performance measure, a noisy performance measure, and they're all noisy, we get extreme error as well.

So what are the consequences of that? There's one very important consequence, and that is that in subsequent periods, the error won't be as extreme again. It will regress to the mean. You'd expect it to be zero, because the error is zero on average by definition. If you've got very positive error in one period, you would expect less error in the following period. This is a notion called regression to the mean, and it's one of the most important notions in performance evaluation.
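To make that concrete, here is a minimal simulation sketch of the y = x + e model. The specific choices here, standard-normal ability and noise and the sample size, are illustrative assumptions, not anything from the studies we discuss:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(0, 1, n)   # true ability
e = rng.normal(0, 1, n)   # luck: error with mean zero
y = x + e                 # observed performance

# Sample on extreme success: the top 1% of observed performance.
top = y > np.quantile(y, 0.99)
print(x[top].mean())  # high, ~1.9: top performers do tend to be able...
print(e[top].mean())  # ...but also ~1.9: they were also, on average, lucky
```

Because ability and luck contribute equally in this sketch, the extreme performers owe about half of their edge to luck, and that half is exactly the part you should not expect to repeat.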
So, an example. There was a study a few years ago of mutual fund performance in the 1990s. The study divided the 1990s into two halves, 1990-1994 and 1995-1999, and looked at the top ten performing funds from the first half of the decade. I'll show them to you anonymized, just as funds A through J, with their performance in the early 1990s. There were 283 funds in this study; these are only the top ten. Then the researchers did two things. They asked how these funds performed in subsequent years, and they did an interesting thing in between: they asked people to predict what happened in the next few years, what performance they thought would be realized in the second half of the decade.

So here are the predictions, the estimations, from the people they asked. They didn't think the top performing fund, A, would again be the top performing fund; they thought maybe 10th, and so on down the list. Fund E, the 5th best performer, they thought would come in around 44th. So you can see they didn't expect the funds to be as good; they expected some regression to the mean. Then the researchers looked at what actually happened. It ranged widely: 129th, 21st, 54th. The interesting thing is that, on average, the second-period rank of these funds was 142.5. What is the significance of 142.5? It's half of the total number of funds in the study. In other words, the average performance of the top ten funds in the second half of the 90s was perfectly average for this sample. They regressed entirely. The top ten mutual funds of the early 90s regressed entirely to the mean in the second half of the 90s. If that's the case, what does it say about how much skill versus luck was involved in how those funds did in the first half of the 90s? If they regress all the way to the mean in the second period, it suggests that there was no skill: that the differences we saw, and there are huge consequences to those differences because we know that new money flows to successful funds, were in fact entirely based on luck.
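You can check that this is exactly what pure luck would produce. Here is a hedged sketch, assuming 283 hypothetical funds whose performance in each half-decade is independent noise with no skill differences at all:

```python
import numpy as np

rng = np.random.default_rng(1)
n_funds, n_sims = 283, 2_000
avg_second_half_rank = []
for _ in range(n_sims):
    r1 = rng.normal(size=n_funds)   # first-half "performance": pure luck
    r2 = rng.normal(size=n_funds)   # second-half performance: independent luck
    top10 = np.argsort(r1)[-10:]    # the ten best funds of the first half
    rank2 = n_funds - np.argsort(np.argsort(r2))  # second-half rank, 1 = best
    avg_second_half_rank.append(rank2[top10].mean())

print(np.mean(avg_second_half_rank))  # ~142, the middle of 283: full regression
```

An average second-half rank right in the middle of the pack is the signature of no persistent skill, which is just what the real funds showed.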
There are many other examples. Danny Kahneman, the Nobel prize winner, gives a famous example from early in his career, when he was studying an officer in the Israeli Air Force. The officer told him: "Punishment is more effective than praise. Whenever I punish a pilot after a really poor flight, I see better performance the next time. Whenever I praise a pilot after an excellent flight, I see worse performance the next time. Therefore, it must be that punishment is more effective than praise." What's a more parsimonious explanation? The more parsimonious explanation is that there is some chance involved in whether a pilot has a good flight or a bad flight. After a good flight, if there's some chance involved, you would expect that the following flight wouldn't be as good; and conversely, after a bad flight you would predict that the next flight would, on average, be better. This is exactly why we have to be so careful about regression to the mean. We have the wrong model of the world if we don't appreciate it. We walk around like the Israeli Air Force officer, believing it was all about praise and punishment, as opposed to mere statistical regression to the mean.

There's another example; we're not going to pick on Israeli Air Force officers alone. This one comes from Tom Peters, from the original business book. Peters and Waterman were McKinsey consultants, no less, and they did a study. It began as an internal study, and they eventually published it as a hugely best-selling book on what determines excellence in companies. They selected 43 high performing firms and tried to learn what they could about business practices from those top firms. But subsequently other folks evaluated the performance of those 43 firms, and what did they find? Five years later there were still some excellent companies, and there were some that were solid but not exactly the top of their industries. And then there were quite a few in weakened positions, and there were even some among the supposed 43 excellent companies that were fully troubled. Now, this is exactly what you'd expect from regression to the mean, and it suggests that the sample Peters and Waterman had grabbed as supposedly excellent firms, perhaps they were on average a little bit better, but they had necessarily been lucky to make it into that sample, to be called the 43 most successful firms in the world, essentially. They were necessarily lucky, and in subsequent periods luck isn't going to keep breaking their direction.

This is something that you'll see any time you sample based on extreme values: if you sample on one attribute, any other attribute that's not perfectly related will tend to be closer to the mean value. We've been talking about performance at points in time. If you sample on extreme performance in one time period, the subsequent time period won't be as extreme, whether you sample the extremely good or the extremely bad. But it can also be attributes within an individual or within an organization. If you sample, say, on a person's running speed, and then look at their language ability, these things are imperfectly related, right? So if you looked at the fastest runners, how would you expect them to perform on some language ability test? They wouldn't be as high. The fastest runners will, almost by definition, not be the people with the best language ability. But that's not because there's some inverse relationship between language and running ability. It's that these two traits are simply imperfectly correlated, and so when you sample on the extreme you have to expect regression to the mean on any other attribute.

We could spend a day on regression to the mean. There aren't many concepts more important for understanding the world, and I would be very happy if you walked away from this course with only two or three ideas, if this is one of them, because it's going to help your reasoning about the world.
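To make the running-speed example concrete, here is a small sketch with two imperfectly correlated traits. The correlation of 0.3 is purely an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 100_000, 0.3
speed = rng.normal(size=n)                                         # running speed
language = rho * speed + np.sqrt(1 - rho**2) * rng.normal(size=n)  # language score

fastest = speed > np.quantile(speed, 0.99)   # sample on extreme speed
print(speed[fastest].mean())     # ~2.7 SDs above average on speed
print(language[fastest].mean())  # only ~0.8 SDs above average on language
```

Notice the fastest runners are above average on language, not below it, but far closer to the mean: the regression is proportional to how imperfect the correlation is.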
Why is this so hard? Why is this such a hard concept to stay with, to live by? Well, there are a few things that get in the way. Among others, we have an outcome bias. I mentioned this with Baron and Hershey; they're the ones who came up with the original study. We tend to believe that good things happen to people who worked well and bad things happen to people who worked badly, and we draw too strong an inference from that. We tend to judge decisions and people by outcomes and not by process. This is a real problem, and it gets in the way of our understanding of this regression-to-the-mean model of the world.

Two others. One is hindsight bias. Once we've seen something occur, we have a hard time appreciating that we didn't anticipate it occurring. We often misbelieve that we anticipated that exactly that would happen. And if that's the way we reason about what happens, then we're not going to appreciate that what happens next could simply be regression to the mean. And finally, narrative seeking. We want to make sense of the world, we want to connect the dots; we come to believe things better when we can tell a causal story between what took place at time one and what took place at time two. And if we can tell a causal story, then we have great confidence in our ability to predict what happens next. We seek these stories, as opposed to the dry, statistical explanation I've been giving you for why things happen. And that again gets in the way of our understanding of the statistical processes that actually drive what's going on.

So in short, we are sense-making animals, and we make sense of the past. There's not a lot of sense to be had in mere regression to the mean, but seeking it anyway is going to get in our way of predicting what happens next. We try to find stories that connect all the dots, and by doing that we give chance too small a role in those stories. There was an internet meme a year or two ago that captures this well. If this is knowledge, dots scattered across your past experience, then with some experience you can start connecting the dots, drawing some lines, creating something from that knowledge. That's good; that's what we want experience to do. But sometimes we're inclined to get a little too creative and overfit those lines, and we turn what should be a pretty straight grid, pretty parsimonious connections, into something that is unlikely to replicate in the future. It might be a very satisfying interpretation of the past, but it's overfit, and an overfit interpretation of the past is going to make very bad predictions about the future.
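That overfitting point can be shown directly. Here is a minimal sketch, assuming for illustration that the past was really just a straight trend plus noise:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.3, 10)   # the past: a straight trend plus noise

straight = np.polyfit(x, y, 1)       # parsimonious story: a line
wiggly = np.polyfit(x, y, 9)         # over-fit story: connects every dot
# (numpy may warn this fit is poorly conditioned; that is part of the point)

# "The future": new data from the same process.
x_new = np.linspace(0, 1, 100)
y_new = 2 * x_new + rng.normal(0, 0.3, 100)
for name, fit in [("straight", straight), ("wiggly", wiggly)]:
    mse = np.mean((np.polyval(fit, x_new) - y_new) ** 2)
    print(name, round(mse, 2))  # the over-fit line predicts the future far worse
```

The degree-9 polynomial explains the past perfectly and predicts the future badly; the straight line explains the past imperfectly and predicts the future well.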