Let's now talk a little bit in this video about analyzing pitchers. You'd think that shouldn't be any harder than analyzing hitters, but it's a lot harder, as we'll see in a couple of minutes. When I was growing up, I thought the measure of a good pitcher was ERA, or Earned Run Average: how many runs you give up per nine innings as a pitcher, after you take away runs that were caused by errors. Say there are two outs, there's a man on second, and the shortstop makes an error. Then the runner on second scores because of that error, and maybe some home runs are hit after it. You shouldn't penalize the pitcher for those runs, since the inning should have been over. Okay, well, it turns out it's really hard to predict next year's ERA from the past year's ERA. You can basically explain only around 12% of the variation in next year's ERA from last year's ERA. Now, why should that be? Well, there are a couple of things going on. Your ERA depends on the fielders behind you. If you've got great fielders, they'll make great plays and you'll get the credit in the form of a low ERA, because the shortstop dove in the hole, threw the guy out at second, and maybe turned a double play. That might have saved you a run or two that game, and it wasn't because you were a good pitcher; it's because you had good fielders. Great relief pitchers matter too. Maybe you leave the bases loaded with nobody out, Mariano Rivera comes in, and he strikes out the side, so you don't get any earned runs charged against you. Not because you did a good job, but because Rivera did a good job. The park factor also matters. So it's really difficult to predict ERA. But the big breakthrough on this came from a guy named Voros McCracken, and it's described in the book Moneyball by Michael Lewis.
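As a minimal sketch of the ERA definition above (earned runs scaled to nine innings), here is a small Python version. The function names and the sample season are hypothetical, chosen only to keep the arithmetic easy; in real games the official scorer decides which runs are unearned, so the sketch just takes that count as an input.

```python
def earned_runs(runs_allowed: int, unearned_runs: int) -> int:
    """Runs charged to the pitcher after removing runs the scorer ruled unearned (e.g., caused by errors)."""
    return runs_allowed - unearned_runs


def era(earned: float, innings_pitched: float) -> float:
    """Earned Run Average: earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned / innings_pitched


# Hypothetical season: 60 runs allowed, 10 of them unearned, 180 innings pitched.
er = earned_runs(runs_allowed=60, unearned_runs=10)
print(era(er, 180))  # 2.5
```

Notice that nothing in the inputs says who was fielding or who relieved; that is exactly why ERA mixes the pitcher's own work with his defense and bullpen.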
It was in 1999, I think, that Voros McCracken stunned the baseball world, because he looked at something called BABIP for pitchers: Batting Average on Balls In Play, the percentage of balls in play that turn into a base hit. It turns out this is very inconsistent, or highly variable, from year to year, because there's just a lot of luck in where the ball goes. On a ground ball, it's largely luck whether it goes right to a fielder or into the hole between shortstop and third, or between second base and shortstop. You really just don't know. Okay, so here we're looking at Greg Maddux, a great all-time pitcher and one of my favorites, for the Cubs and the Braves. His secret, I guess, was perfect location of the ball. The average batting average on balls in play is about .300. In 1998 Maddux's was .262, which is really good, way below average, and he had a great ERA, 2.22. The next year his batting average on balls in play was .324, about 60 points higher, which is much higher. And it's probably not because he pitched any differently; it's just where the ball goes. His ERA in 1999 was 3.57. Did he pitch much worse in 1999? Probably not. The balls just go where they go. They can go to the fielders or not, and there's a lot of luck involved. That explains, in large part, why you have trouble predicting a pitcher's future ERA from his past ERA. So sabermetrics people have come up with DIPS, Defense Independent Pitching Statistics, such as DICE (Defense-Independent Component ERA), to predict ERA from the things a pitcher can control. What can a pitcher control? Home runs, walks, hit by pitch, and strikeouts. So here's a common formula used to estimate a pitcher's true ERA. There are now much more complicated versions that adjust for parks and things like that, and we just don't have time to go into all those refinements.
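The lecture doesn't spell out how BABIP is computed. A commonly used definition, assumed here, is (H - HR) / (AB - K - HR + SF): hits that weren't home runs, divided by plate appearances that put the ball in the field of play. The sketch below uses a made-up stat line (not Maddux's actual totals) plus a helper to express the .262 to .324 swing in "points".

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play, using the common (H - HR) / (AB - K - HR + SF) definition.

    Home runs and strikeouts are removed because fielders never get a chance at them;
    sacrifice flies are added back because they are balls in play not counted as at-bats.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def points(avg: float) -> int:
    """Express an average in baseball 'points', e.g. 0.262 -> 262."""
    return round(avg * 1000)


# Hypothetical stat line: 165 H, 15 HR, 640 AB, 130 K, 5 SF -> 150 hits on 500 balls in play.
print(babip(165, 15, 640, 130, 5))  # 0.3, a league-average-looking .300

# Maddux's swing from the lecture: .262 in 1998 to .324 in 1999.
print(points(0.324) - points(0.262))  # 62
```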
We just want to hit the basic ideas. If you take 13 times home runs, plus 3 times walks plus hit by pitch, minus 2 times strikeouts, divide by innings pitched, and add 3, you get a good approximation to a pitcher's ERA: DICE = 3 + (13 × HR + 3 × (BB + HBP) - 2 × K) / IP. So let's look at the great Clayton Kershaw in the 2014 regular season, not the postseason. I've got his stats here: he pitched this many innings, gave up only nine home runs, had 33 walks plus hit by pitch, and 239 strikeouts. His actual ERA was 1.77. Let's use the formula to predict it. I take the 3, and then I use SUMPRODUCT, if you remember that function, to multiply these weights times the home runs, walks plus hit by pitch, and strikeouts, and divide by the innings pitched. I get a predicted ERA of 1.68, which is very, very close. And it turns out that if you use DIPS or DICE or other defense-independent pitching statistics to predict a pitcher's ERA next year, you get a much better R-squared, or more predictive value. So this is a really interesting situation: to predict next year's earned run average, you don't use past ERA; you use something else. To my knowledge, it's pretty rare in business that you'd want to use something other than, let's say, past GNP to predict future GNP. I guess that happens once in a while, but I haven't seen it that much. It's very interesting. One final comment: a new frontier, where a lot of research is being done, though I don't know much about it, is predicting the likelihood that a pitcher will get hurt. A lot of pitchers are getting hurt.
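Going back to the Kershaw example, here is a minimal Python sketch of the DICE formula from the lecture, with the weights applied the way the spreadsheet's SUMPRODUCT does. The 198 1/3 innings figure is an assumption on my part for Kershaw's 2014 regular-season total (box scores write it as 198.1, where the digit after the point counts outs, not tenths); the lecture only shows it on screen. With that value, the sketch reproduces the predicted 1.68.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings like '198.1' (198 innings plus 1 out) to 198.333..."""
    whole, _, outs = ip.partition(".")
    outs_count = int(outs) if outs else 0
    if outs_count not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip}")
    return int(whole) + outs_count / 3


# DICE weights, applied like the spreadsheet's SUMPRODUCT.
WEIGHTS = {"hr": 13, "bb_hbp": 3, "k": -2}


def dice(hr: int, bb_hbp: int, k: int, innings: float) -> float:
    """Defense-Independent Component ERA: 3 + (13*HR + 3*(BB+HBP) - 2*K) / IP."""
    stats = {"hr": hr, "bb_hbp": bb_hbp, "k": k}
    weighted_sum = sum(WEIGHTS[name] * stats[name] for name in WEIGHTS)
    return 3 + weighted_sum / innings


# Kershaw, 2014 regular season: 9 HR, 33 BB+HBP, 239 K (from the lecture);
# 198.1 innings is an assumption (see lead-in).
ip = innings_from_notation("198.1")
predicted = dice(hr=9, bb_hbp=33, k=239, innings=ip)
print(round(predicted, 2))  # 1.68 (his actual ERA was 1.77)
```

Keeping the weights in one table mirrors the SUMPRODUCT layout, so trying a different weighting only means editing `WEIGHTS`.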
Injuries matter so much because when teams give pitchers really long-term contracts, as they did with Kershaw or Justin Verlander, you really hope the pitcher isn't going to get hurt, because if he does, that money is basically down the drain. So a better method for predicting the likelihood that a pitcher will get injured is worth millions of dollars to Major League Baseball teams. Right now they carefully monitor pitch count. I guess the best-known tragic example is one of my favorite players, Kerry Wood, who struck out 20 batters in one game, I think almost 20 years ago. He threw a lot of pitches, and he later had serious arm injuries. Now teams monitor pitch count because they think that if you throw a lot of pitches, you're more likely to end up with an injured or sore arm. Again, this is beyond the scope of where we can go, but if you can figure out a way to predict the likelihood a pitcher will get injured that's better than anybody else's approach, that's worth a lot of money to a Major League team. Okay, so we've talked about how you can use defense-independent pitching statistics to evaluate a pitcher's true performance during a season based on what he can control. In the next video we're going to talk about win probability added, which is a nice way to put pitchers and hitters on an equal playing field to see how much they help the team. In other words, is a great reliever like Mariano Rivera more important than a good starting pitcher, or a really solid hitter who's an everyday player? I think win probability added is a really good measure of that, so we'll discuss it in the next video.