In that extended example, we saw the value of doing good analytics with these performance measures. What first appeared to be significant differences in skill turned out to be pure chance. That really emphasizes the importance of persistence: we need to find ways of testing for it. In that case we just looked at a split sample, how performance varied in one year and then how it varied in the next. Finding ways to do that is one of the most fundamental ways you can parse signal from noise.

We're going to focus on four additional issues for the rest of the module: regression to the mean, sample size, signal independence, and process versus outcome. These are all important concepts to have in mind as you dig into your data, and they're also issues that trip us up if we only reason about data intuitively. They're issues that data can improve, that analytics can improve, but analytics aren't a panacea; you can still make these mistakes even with data. Let's dig into them.

The first is regression to the mean. I want to start with a very simple model of performance, where you can think of performance as Real Tendency + Luck. We've been talking about this a little, and we can formalize it. Don't be put off by the baby math here: in formal terms, you can think of performance y as a function of x, true ability, and e, some error randomly distributed around 0, so y = x + e.

Now, what does that mean for when we sample on extreme performance? What underlies extreme success and failure? If that's the model of the world, and everything we've been saying so far says it is, that there's some noise in these performance measures, what does it mean when we sample on extreme performance? Extreme success suggests that the person might in fact have superior ability or tried very hard, but also that they got lucky, that the error was positive. Conversely, extreme failure perhaps means inferior ability, or that they did not try very hard, but also negative error, that they got unlucky. When we sample at the extremes of a noisy performance measure, and they're all noisy, we can be sure we're picking up extreme error as well.

What are the consequences of that? There's one very important consequence: in subsequent periods, you shouldn't expect that extreme error again. Error is, by definition, zero on average, so you'd expect it to be zero. If you got very positive error in one period, you would expect less error in the following period. This is a notion called regression to the mean, and it's one of the most important notions in performance evaluation.
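To make that concrete, here's a minimal simulation of the y = x + e model. The variances of skill and luck here are illustrative values I'm assuming for the sketch, not numbers from any study. It also runs the split-sample persistence check from the extended example: correlate each performer's first-period score with their second-period score. Under this model, that correlation estimates the share of performance variance that comes from stable skill rather than luck.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simple model of performance: y = x + e.
# x is stable true ability; e is luck, mean-zero noise drawn fresh each period.
skill_sd, luck_sd = 1.0, 2.0            # illustrative values, assumed for the sketch
x = rng.normal(0, skill_sd, n)          # true ability, fixed across periods
y1 = x + rng.normal(0, luck_sd, n)      # period-1 performance
y2 = x + rng.normal(0, luck_sd, n)      # period-2 performance

# Split-sample persistence check: correlate period 1 with period 2.
r = np.corrcoef(y1, y2)[0, 1]
skill_share = skill_sd**2 / (skill_sd**2 + luck_sd**2)
print(f"year-to-year correlation:   {r:.2f}")            # ~0.20
print(f"variance share from skill:  {skill_share:.2f}")  # 0.20
```

A correlation near zero would mean the differences you see in any one period are almost entirely noise, which is the pattern the extended example turned up.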
An example. There was a study a few years ago of mutual fund performance in the 1990s. The study divided the 1990s into two halves, 1990-1994 and 1995-1999, and looked at the top 10 performing funds from the first half of the decade. I'll show them to you, anonymized, as funds A through J, with their performance in the early 1990s. There were 283 funds in this study; these were just the top 10.

Then they did two things. They asked how these funds performed in subsequent years, and they did an interesting thing in between: they asked people to predict what happened in those next few years. What performance did they think would be realized in the second half of the decade? Here are the predictions, the estimates from the people they asked. They didn't think the top-performing fund A would again be the top performer, but they thought maybe tenth, and so on down the list. Fund E, the fifth-best performer, they put at maybe 44th. You can see they didn't expect the funds to be as good; they expected some regression to the mean.

Then they looked at what actually happened. What actually happened? It ranged widely: 129th, 21st, 54th. The interesting thing is the average: the funds' mean rank was 142.5. What is the significance of 142.5? It's essentially the middle of the 283 funds in the study. In other words, the average performance of the top 10 funds in the second period, the second half of the 90s, was perfectly average for this sample. They had regressed entirely. The top 10 mutual funds of the early 90s regressed entirely to the mean in the second half of the 90s.

If that's the case, what does that say about how much skill versus luck was involved in how those funds did in the first half of the 90s? If they regress all the way to the mean in the second period, it suggests that there was no skill. The differences that we saw, and there are huge consequences to those differences, because we know that new money flows to successful funds, were in fact entirely based on luck.
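You can check that this is exactly the pattern a pure-luck world produces. Here's a minimal sketch under that assumption (pure luck is the hypothesis being illustrated, not an established fact about these funds): simulate 283 funds whose performance is nothing but noise, grab the top 10 from the first half, and see where they rank in the second half.

```python
import numpy as np

rng = np.random.default_rng(1)
n_funds, n_trials = 283, 2_000
avg_ranks = []

for _ in range(n_trials):
    # Pure-luck world: performance in each half-decade is independent noise.
    first_half = rng.normal(size=n_funds)
    second_half = rng.normal(size=n_funds)

    # The top 10 funds of the first half (best first).
    top10 = np.argsort(-first_half)[:10]

    # Each fund's rank in the second half (rank 1 = best).
    ranks = np.empty(n_funds)
    ranks[np.argsort(-second_half)] = np.arange(1, n_funds + 1)
    avg_ranks.append(ranks[top10].mean())

# With no skill, the former stars land, on average, mid-pack.
print(f"average second-half rank of the former top 10: {np.mean(avg_ranks):.1f}")
# ~142, the middle of a 283-fund field
```

An average rank in the low 140s, the middle of 283, is just what the study found, which is why the result points to luck rather than skill.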
There are many other examples. Danny Kahneman, the Nobel Prize winner, gives a famous example involving an officer in the Israeli Air Force. Early in Kahneman's career, he was studying the officers there, and one officer told him: punishment is more effective than praise. Whenever I punish a pilot after a really poor flight, I see better performance the next time. Whenever I praise a pilot after an excellent flight, I see worse performance the next time. Therefore, it must be that punishment is more effective than praise.

What's the more parsimonious explanation? That there's a little chance involved in whether a pilot has a good flight or a bad flight. After a good flight, if there's some chance involved, you would expect that the following flight wouldn't be as good, and conversely, after a bad flight, you would predict that the next flight would, on average, be better. This is exactly why we have to be so careful about regression to the mean. We have the wrong model of the world if we don't appreciate it. We walk around like the Israeli Air Force officer, who believed it was all about praise and punishment as opposed to mere statistical regression to the mean.

There's another example; we're not going to pick on Israeli Air Force officers alone. This one comes from Tom Peters, one of the original business book authors. Peters and Waterman were McKinsey consultants, no less. They did a study, which began as an internal study and was eventually published as a hugely best-selling book, on what determines excellence in companies. They selected 43 high-performing firms and tried to learn what they could about business practices from those top 43 firms. But subsequently a few folks evaluated the performance of those 43 firms, and what did they find? Five years later, there were still some excellent companies. There were some that were solid, but not exactly the top of their industries. And then there were quite a few in weakened positions, and there were even some of these supposed 43 excellent companies that were fully troubled.

Now, this is exactly what you'd expect from regression to the mean. And it says something about the sample of supposedly excellent firms that Peters and Waterman had grabbed. Perhaps those firms were, on average, a little bit better. But to make it into that sample, to be called the most excellent of firms, essentially the 43 best in the world, they were necessarily also lucky, and in subsequent periods they're not going to have luck break their direction again.

This is something you'll see any time you sample on extreme values: if you sample on one attribute, any other attribute that's not perfectly related to it will tend to be closer to its mean value. We've been talking about performance at points in time. If you sample on extreme performance in one time period, the subsequent time period won't be as extreme, whether you sampled the extremely good or the extremely bad. But it can also be attributes within an individual or within an organization. Say you sample on a person's running speed and then look at their language ability; these things are imperfectly related. If you looked only at the fastest runners, how would you expect them to perform on some language ability test? The fastest runners will, almost by definition, not be the people with the best language ability. But that's not because there's some inverse relationship between running and language ability; it's that these two traits are simply imperfectly correlated. And so when you sample on the extreme, you have to expect regression to the mean on any other attribute.
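The same arithmetic is easy to see in a minimal sketch. The correlation between the two traits below is an arbitrary illustrative value: draw two imperfectly correlated traits, select the top 5% on one, and look at how far above average that group sits on the other.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 100_000, 0.3   # rho: assumed correlation between the two traits

# Two standardized traits, say running speed and language ability,
# correlated at rho but otherwise driven by independent factors.
speed = rng.normal(size=n)
language = rho * speed + np.sqrt(1 - rho**2) * rng.normal(size=n)

# Sample on the extreme of one attribute: the fastest 5% of runners.
fastest = speed > np.quantile(speed, 0.95)

print(f"mean speed of the selected group (sd units): {speed[fastest].mean():.2f}")
print(f"mean language score of that group:           {language[fastest].mean():.2f}")
# The second number is roughly rho times the first: any attribute you
# didn't select on regresses toward its mean.
```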
We could spend a day on regression to the mean. In fact, there aren't many concepts more important for understanding the world. We could spend hours on this, and I would be very happy if you walked away from this course with only two or three ideas, if this were one of them, because it's going to help your reasoning about the world.

Why is this so hard? Why is this such a hard concept to live by? Well, a few things get in the way. Among others, we have outcome bias. I mentioned it earlier and referenced Hershey and Baron, who came up with the original study. We tend to believe that good things happen to people who work hard and bad things happen to people who don't, and we draw too strong an inference from outcomes. We tend to judge decisions and people by outcomes and not by process. This is a real problem, and it gets in the way of our adopting this regression-to-the-mean framework for the world.

Two others. One is hindsight bias. Once we've seen something occur, we have a hard time appreciating that we didn't anticipate it occurring. We often misbelieve that we anticipated exactly what would happen. And again, if that's the way we reason about what happens, then we're not going to appreciate that what happens next could just be regression to the mean.

And then finally, narrative seeking. We want to make sense of the world; we want to connect the dots. We tend to believe things more readily if we can tell a causal story connecting what took place at time one and what took place at time two. And if we can tell a causal story, then we have greater confidence in our ability to predict what happens next. We seek these stories, as opposed to what I've been telling you, which is the dry statistical reason for why things happen. And that, again, gets in the way of our understanding the statistical processes that actually drive what's going on.

In short, we make sense of the past. We are sense-making animals, and we make sense of the past, and there's not a lot of sense to be had in mere regression to the mean. But that sense-making is going to get in the way of predicting what happens next. We try to find stories that connect all the dots, and by doing that we give chance a smaller role in those stories.

There was an Internet meme a year or two ago that captures this well. If this is knowledge, dots distributed across your past experience, then with experience you can start connecting the dots, drawing simple lines, creating something from that knowledge. That's good. That's what we want experience to do. But then sometimes we're inclined to get a little too creative and overfit those lines. We turn what should be a pretty straight grid, pretty parsimonious connections, into something that is unlikely to replicate in the future. It might be a very satisfying interpretation of the past, but it's overfit, and an overfit interpretation of the past is going to make very bad predictions about the future.
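That last point, overfitting, is easy to demonstrate. A minimal sketch, where the true data-generating line, the noise level, and the polynomial degrees are all arbitrary choices for illustration: fit the same noisy points with a straight line and with a high-degree polynomial, then score both on fresh data from the same process.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n):
    # The true process: a straight line plus noise (chance).
    x = rng.uniform(-1, 1, n)
    return x, 2 * x + rng.normal(0, 0.5, n)

x_past, y_past = sample(15)       # the past we're making sense of
x_future, y_future = sample(15)   # the future we're trying to predict

for degree in (1, 9):
    coefs = np.polyfit(x_past, y_past, degree)
    past_err = np.mean((np.polyval(coefs, x_past) - y_past) ** 2)
    future_err = np.mean((np.polyval(coefs, x_future) - y_future) ** 2)
    print(f"degree {degree}: past MSE {past_err:.2f}, future MSE {future_err:.2f}")

# The degree-9 curve connects the past's dots almost perfectly, but the
# parsimonious straight line predicts the future better.
```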