In the preceding lectures on data representation, we looked at some fairly standardized data representations that you might see in mass media or that you can even develop yourself using spreadsheet software. In this lecture, we'll look at some more unusual and super cool data representations.
We'll start with John Snow, a London physician in the mid-1800s. He actually discovered the source of an 1854 cholera outbreak in London. Let's take a look at how he did that. This is a map of the streets in London in 1854, and he marked it with these bars that you see here. Each bar plots actual occurrences of cholera. It turns out that this red circled pump, the Broad Street pump, was the source of the cholera outbreak. He discovered that by plotting the points that represented cholera occurrences and figuring out, from where they clustered, that the pump was the cause of the outbreak. So John Snow used a nontraditional data representation technique to discover what the cause of the cholera outbreak was.
This slide is from a domain that's near and dear to my heart. This is a heatmap from a game. This was a posting in the answers.unity3d.com forums, and the question was how to use the Unity 3D game engine to build heatmaps. You might be wondering what a heatmap is. This particular example of a heatmap shows the time that a player spends in a particular area. So if you see the hot areas, the magenta areas, those are places where players spent extra time, either because they found them particularly interesting or particularly challenging or because there were lots of rewards they could reap there and so on. So it's actually a way to plot player activity in your game so that you can try to optimize it, either spreading it out more or bringing people to reward areas more and so on. Of course, the underlying data is information about where the player is at each moment that they're playing, and then you plot it out. Of course, heatmaps aren't just for player interest in games.
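To make the player-time heatmap idea concrete, here is a minimal sketch in Python. It is not the approach from the Unity forum posting. It simply assumes the game logs the player's (x, y) position at a fixed interval, and the names `build_heatmap`, `render`, `cell_size`, and `sample_interval` are made up for this illustration. Each sample is dropped into a grid cell, and the time spent is added up per cell.

```python
from collections import Counter

def build_heatmap(samples, cell_size, sample_interval):
    """Add up the seconds spent in each grid cell from (x, y) position samples."""
    heat = Counter()
    for x, y in samples:
        # Floor division maps a world position to the grid cell that contains it.
        cell = (int(x // cell_size), int(y // cell_size))
        heat[cell] += sample_interval
    return heat

def render(heat, width, height):
    """Draw the grid as text; denser characters mean more time in that cell."""
    shades = " .:*#"
    peak = max(heat.values(), default=0)
    rows = []
    for row in range(height - 1, -1, -1):  # print the largest y at the top
        line = ""
        for col in range(width):
            seconds = heat.get((col, row), 0)
            level = 0 if peak == 0 else round(seconds / peak * (len(shades) - 1))
            line += shades[level]
        rows.append(line)
    return "\n".join(rows)

# Hypothetical position log: one sample every 0.5 seconds (made-up data).
samples = [(0.5, 0.5), (0.7, 0.2), (0.1, 0.9), (2.5, 1.5), (2.6, 1.4), (3.5, 0.5)]
heat = build_heatmap(samples, cell_size=1.0, sample_interval=0.5)
print(render(heat, width=4, height=2))
```

A real game would map the per-cell totals to colors instead of characters, but the binning step is the same. The same counting works for any event that has a position: add 1 per event instead of a time interval.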
We can also use heatmaps to show occurrences of deaths in particular areas. If we have some game where death occurs, and we find that there's a certain area that every player or many, many players are dying in, we can go look at that area and see if perhaps the geography is making it harder for players to succeed there, or we're spawning too many hordes there to go get them, or whatever. So this is another really interesting way to represent data in a useful form for someone who's actually a game developer.
Next, we'll talk about Edward Tufte. Edward Tufte has written a number of fascinating books about how we can represent data so we can gain a better understanding of what the data is telling us. This is his website right here, and we're going to go there in a moment. What we're going to look at is Minard's map of Napoleon's campaign in Russia in 1812. This map was drawn by a guy named Minard. It was originally in French, so an English translation is provided. You can buy it from Edward Tufte's website, which I showed you on this slide. So basically, here's how this works. This tan area right here, the width of this bar represents the number of men that Napoleon started out with on his campaign. As you can see, as he marched toward Moscow, the width of that line got smaller and smaller. So he lost a lot of men on the way to Moscow. Then coming back from Moscow (black means the return trip), this line gets smaller and smaller. This was a part of his army that rejoined, and then it gets smaller and smaller again. So holy smokes, look at this. The width of this is how many he came back with, and the width of this is how many he left with. It was a devastating campaign, where he lost so many men. This is a really interesting graphical representation of that process.
Next, we'll talk about Mike Bostock's work. Mike Bostock has some great stuff on GitHub, and he also has some great stuff at this URL shown here.
We'll go look at Stop-and-Frisk, the Spread of Drought, and Bubble Maps.
This first one shows stop-and-frisk activity in New York. Stop-and-frisk means that police officers stop somebody for some reason and frisk them for weapons or illegal drugs or whatever. This is a map of part of New York. In the first half of 2012, there were 337,000 stops, and each of them is plotted as a point. Then over here on the right, we see the last half of 2013, where there were 33,699 stops. So that's an order of magnitude difference, a division by 10. You will see that the parts that were darkest red here are still darkest red here. So it's not like different neighborhoods suddenly became more or less subject to stopping and frisking, but we can see that the overall counts for those stop-and-frisk actions have dropped by an order of magnitude over the course of this span of time, about a year and a half.
The next one, the spread of drought, is actually a dynamic data representation. You can see in this upper right-hand section that it is moving forward through time from December to April. Within the map itself, the map changes too. As we see more and more red and orange and yellow, that is the spread of drought throughout areas of the US. So this is an interesting representation. It looks something like a heatmap in that the concentration of drought is color-coded. But it's also really interesting that it dynamically changes over time, so we can see what's happening. We can see a trend in time in this particular representation.
The last one we'll look at from Mike Bostock is this bubble map. In this bubble map, the size of each bubble represents the population of the county that the bubble is on. As you can see from this map, if we look over here in Colorado, we can see this is El Paso County. This is where Colorado Springs is. So there's a concentration of Denver and Colorado Springs and Fort Collins up here and Pueblo down here.
So there's a concentration of population in Colorado along what is actually just the eastern side of the Rocky Mountains. We can also see, of course, Southern California, Los Angeles County. You might have heard of Los Angeles. There are a lot of people there. We can also see these bubbles across Florida and the northeastern corridor up there. So we can see these areas across the United States and get a sense for which ones are heavily populated and which ones, like here in Nebraska, are not really that populated at all.
Finally, we'll look at Nathan Yau's work, which is available at flowingdata.com. We'll look at an interactive salary exploration tool, as well as the two visualizations that I've listed here.
This first one is an interesting interactive data exploration widget, if you will. You can pick your annual compensation right now. Let's say you make $50,000. You can see the number is displayed here. So you can actually see, in the past, how many people made $50,000. You'll see that it was pretty rare: five percent in 1950, and then it got up to about 28 percent in 1970, and it stayed around 28 percent for about 20 years before it bumped higher for 2000 and then a little less for 2010. So there's this huge ramp-up from 1950 to 1970 in people who made that amount of money, and then it straightened out for a while and then grew fairly gradually. You can obviously see what happens the more money you make (I do not make $350,000, just so you know). As you can see here, you can crank along, and it's the top one percent all the way across the board. So not many people across time have made more than that much money. This is interesting. You can actually explore the data based on your own personal circumstances, and that can be pretty interesting to people. You get to see stuff based on your life.
Another interesting representation is the power sources in each state. In the upper left-hand corner, it shows the United States as a whole from 2004 to 2014.
So we went from getting 51 percent of our power from coal to 40 percent. Natural gas went up. Nuclear went down a little bit. Then we had some other things here. You can actually see the change in power sources for each state, and you can see what power sources each state uses. So for example, if you look at Alaska, it went from six percent coal to seven percent coal. So Alaska tends not to use much coal for its power. We do see that it dropped from 57 to 51 percent in natural gas, while hydropower went up a little, petroleum went up a little, and renewables went up a lot, from zero to three percent. You can see that a place like Delaware had a massive drop in coal and a massive increase in natural gas. So these are interesting representations, which look like line graphs that plot a bunch of different data. But the way that it's broken up and separated by state and so on leads to some really interesting observations about the underlying data.
The last one we'll look at is subway complaints by subway station. This is in Madrid. You can see that some subway stations have lots of complaints. The number of complaints is the size of the circle. Then there's color-coding based on the kind of complaint, using the key over here. So this is actually an interesting representation of the data about both volume and type. This station right here has very few complaints compared to this station, which has more. So we get information about the volume of complaints, and we also get information about the type of complaints at each of those stations.
To recap, in this lecture, we went beyond the traditional ways to represent data and looked at some super cool representations.