0:00

[MUSIC]

Welcome back.

I hope my colleague Sook helped you understand the difference between exploratory and explanatory analysis. Moving forward, most of what we will discuss is exploratory analysis, since we don't necessarily know in advance which questions we are trying to answer.

In this lesson, we're going to examine a classic case study. It's called Anscombe's Quartet, and it changed the way many people approach data analysis. Francis Anscombe presented it in 1973 to argue that you can't rely on summary statistics alone to understand your data; you have to visualize it. In a world where data sets can run to trillions of records, Anscombe's argument is even more relevant today.

That's not to say summary statistics aren't important. They are absolutely essential, but you must also visualize the data. So we're going to do that with Tableau.

This is how it's going to look when it's all done. Looks exciting, right? Well, let's get started.

1:12

These data are available to you in the resources. But first, let me introduce them. These are the data that Francis Anscombe used. At first glance, it's a very simple data set.

1:28

The x values in x1, x2, and x3 are identical. x4 is different: every value is 8 except for a single 19. The y values change depending on whether they're in y1, y2, y3, or y4.

Now, the crucial thing is that the summary statistics (the mean, the variance, the correlation, and the linear regression line) are nearly identical, to two or three decimal places. The means of x1, x2, x3, and x4 are all 9. The means of y1, y2, y3, and y4 are all about 7.5. Similarly, the variances of the x columns are all identical (11), and the variances of the y columns are nearly identical (about 4.12). The correlations of x1 and y1, x2 and y2, x3 and y3, and x4 and y4 are all about 0.816. The slope of a least-squares line equals the correlation times the ratio of the standard deviations of y and x, and the line passes through the point of means. So when the means, variances, and correlations match, every pair gets essentially the same regression line: y ≈ 3.00 + 0.500x.
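If you want to check these numbers outside of Tableau, here is a minimal Python sketch. The lesson itself uses only Excel and Tableau. The sketch hard-codes the values from Anscombe's 1973 paper and computes each statistic with plain arithmetic. The names `QUARTET` and `summarize` are illustrative choices and do not come from the lesson.

```python
# Anscombe's quartet (values from the 1973 paper): x1 = x2 = x3, while x4 differs.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation, and least-squares line for one x-y pair."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance (n - 1 denominator)
        "var_y": syy / (n - 1),
        "r": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,  # line passes through the point of means
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
              f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f} "
              f"y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

Each printed line should show nearly the same numbers. That is Anscombe's point: the summaries alone can't tell the four sets apart.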

Now, we want to do this in Tableau. This is a Tableau class, so it's very important that we do as much as we can in Tableau. It will also be much easier to build the visualizations if we first clean up the data set.

In this case, we're going to do what's called normalizing the data. This means that each row contains only one observation: a single x-y pair from one of the four data sets. The data set shown here is laid out for computing summary statistics in a statistical software package like Stata, SAS, R, or SPSS. Tableau, however, is not intended to be a statistical software application. It's a visualization package, so we normalize the data to get the most out of what Tableau can do with it.

3:28

So although I'm going to do the visualization in Tableau, I'm going to do the data preparation in Excel. It's actually much more difficult to do in Tableau, and Excel is really made for this kind of data manipulation.

To do this, I modified the spreadsheet, and you can do this however you want. You can do it manually or through cut and paste, but I'm not going to go through this exercise because this is not an Excel course.

I made the changes to the spreadsheet. So now there's a number going down the side, and there's an x column and a y column. So instead of having x1, y1, x2, y2, and so on, it's the number and then x and y, stacked going down. Again, it's normalized.
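Here is a rough sketch of the reshaping described above, written in Python for illustration. It assumes the original sheet has paired columns named x1 through y4, and it includes only the first three observations. The code stacks those columns into one row per data set and observation. The field names `Set` and `Number` are hypothetical, and the lesson's own spreadsheet may label them differently.

```python
# Hypothetical wide layout: one row per observation, with paired x/y columns per data set.
# Only the first three of Anscombe's 11 observations are included to keep this short.
WIDE = [
    {"x1": 10, "y1": 8.04, "x2": 10, "y2": 9.14, "x3": 10, "y3": 7.46, "x4": 8, "y4": 6.58},
    {"x1": 8, "y1": 6.95, "x2": 8, "y2": 8.14, "x3": 8, "y3": 6.77, "x4": 8, "y4": 5.76},
    {"x1": 13, "y1": 7.58, "x2": 13, "y2": 8.74, "x3": 13, "y3": 12.74, "x4": 8, "y4": 7.71},
]


def to_long(wide_rows, n_sets=4):
    """Stack paired x<i>/y<i> columns into one record per (data set, observation)."""
    long_rows = []
    for set_number in range(1, n_sets + 1):
        for obs_number, row in enumerate(wide_rows, start=1):
            long_rows.append({
                "Set": set_number,     # which of the four data sets (hypothetical name)
                "Number": obs_number,  # observation index within the set (hypothetical name)
                "x": row[f"x{set_number}"],
                "y": row[f"y{set_number}"],
            })
    return long_rows


if __name__ == "__main__":
    for record in to_long(WIDE):
        print(record)
```

In Excel, you get the same result by placing each set's block of x and y values under the previous one.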

4:18

Now, this is interesting, because there are two additional columns that really require explanation. One is called Column and the other is called Row. And yes, there's a column that I'm naming Row. Each row gets a value in both of these columns, either first or second, depending on where I want that particular chart to go in the visualization.

The reason I have those two columns is that we want to replicate what Francis Anscombe did back in 1973. The goal is not only to prove a point about outliers and the importance of exploratory analysis through visualization, but also to give you a little taste of the cool tricks you can do in Tableau. These tricks produce visualizations you couldn't get from Tableau's default choices alone. Tableau is powerful enough to allow these innovations, through various calculations, to make your visualizations look really cool.
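Here is a hedged sketch, in Python for illustration, of what the Row and Column fields might contain. The lesson doesn't say which data set goes in which quadrant, so the `GRID_POSITION` mapping below is an assumption. The `Set` and `Number` fields on the input records are also hypothetical. Placing Row on the Rows shelf and Column on the Columns shelf, alongside y and x, is presumably what splits the view into a 2 × 2 grid of scatter plots.

```python
# Assumed placement of Anscombe's four sets in a 2 x 2 grid, as (Row, Column) values.
# The lesson does not specify this mapping; it is illustrative only.
GRID_POSITION = {
    1: ("first", "first"),    # upper left
    2: ("first", "second"),   # upper right
    3: ("second", "first"),   # lower left
    4: ("second", "second"),  # lower right
}


def add_grid_fields(long_rows):
    """Return new records with Row and Column fields added; the input records are not modified."""
    tagged = []
    for record in long_rows:
        row_value, column_value = GRID_POSITION[record["Set"]]
        tagged.append({**record, "Row": row_value, "Column": column_value})
    return tagged


if __name__ == "__main__":
    sample = [
        {"Set": 1, "Number": 1, "x": 10, "y": 8.04},
        {"Set": 4, "Number": 8, "x": 19, "y": 12.50},
    ]
    for record in add_grid_fields(sample):
        print(record)
```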

5:18

Okay, so the data are ready for Tableau. In the last course, you spent some time walking through how to import data from Excel. That's very important, because it's what you're going to be doing most of the time. But this data set isn't that big; it's actually considered tiny. So this is an opportunity to show you another trick.

5:40

You need to copy the data from Excel, like I'm doing here. Make sure you grab every row and column; don't forget anything. Just double-check, and then do the copy in Excel. On a PC, I usually press Ctrl+C, but you can also copy from the drop-down menus. So just copy that information.

6:02

If you haven't already done so, please open Tableau. Click the Data menu, then click Paste Data. It will churn, but not for very long, and then voilà, your data is now in Tableau. It's really handy, because you don't have to do the importing; you just paste the data in. It works for small and medium-size data sets, which is perfect for our purposes here, so definitely take advantage of it. If you just need to get a visualization done, paste the data in. You don't even have to import it from an Excel document. However, if you're going to make changes to the Excel file, it's better to use a data connection, so that Tableau can pick up those changes.

6:53

It's nice to have that crosstab, maybe, but that's not what we want, of course. So drag all the fields away. You can also go up to the drop-down menus here and clear the worksheet.

8:15

Now we'll do a little formatting. Let's change the marks to circles, change the color to orange, and make the circles a bit larger. We're going to add a trend line here, but I'm going to remove the confidence bands that Tableau adds automatically.

8:43

Judged by their summary statistics, these data sets look virtually, not just virtually but actually, identical, but obviously they're not. That becomes evident through visualization, not through summary statistics. The one on the bottom right, for example, fits essentially the same regression line as the one in the upper left, yet the data sets are very different. The one in the upper left shows a fairly traditional linear relationship between x and y, but it has essentially the same correlation as the one on the bottom right.
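To see what the shared regression line hides, here is a minimal Python sketch. It fits the least-squares line to Anscombe's sets III and IV and examines the residuals. In set III, a single point (x = 13, y = 12.74) sits far above the line. In set IV, only one point has an x value other than 8, so that single point determines the fitted slope, and its own residual is zero. The sketch refers to Anscombe's set numbers because the video's quadrant layout isn't specified. The function names `residuals` and `largest_residual` are illustrative choices.

```python
# Anscombe's sets III and IV (values from the 1973 paper).
X3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def residuals(xs, ys):
    """Return y - (intercept + slope * x) for the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def largest_residual(xs, ys):
    """Return (index, residual) for the point farthest from the fitted line."""
    res = residuals(xs, ys)
    i = max(range(len(res)), key=lambda k: abs(res[k]))
    return i, res[i]


if __name__ == "__main__":
    print("Set III largest residual (index, value):", largest_residual(X3, Y3))
    print("Set IV residual at x = 19:", residuals(X4, Y4)[7])
```

The summary statistics can't reveal either pattern, but a scatter plot shows both at a glance.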

10:09

I'm assigning a couple of readings on Anscombe's Quartet and why it's important. But in the meantime, the bottom line is this: make sure you do exploratory work on your data. In the next lesson, we're going to show you more ways of learning about your data through exploratory work. So I'll see you there.
