When using structured data, the basic three-step process that we're going to follow is to create a dataset, build a model, and then operationalize that model. As we said before, creating and cleaning the dataset is often the largest part of this three-step process. The cooking analogy I like to use here is: choose high-quality ingredients, prepare them all together with your expertise, and serve up a delicious meal. So, let's walk through each of these steps in detail.

First up, how can we explore and create that high-quality dataset? What data ingredients are we working with? So, now repeat after me: don't assume that datasets all have high-quality or complete data. Keeping with our kitchen analogy, don't think that all the ingredients in your fridge are fresh, and even if they are fresh, you may not need all of them for tonight's dinner recipe.

Now, let's talk about data completeness. I like to compare the dataset that you just received from your colleagues to a picture you came across in a random photo book. That picture is just like a dataset. Actually, it is a dataset: it's pixel color values, and they're just numbers. How do you think the computer sees an image of a cat? Just as RGB, or red, green, blue, values inside of a matrix. Anyway, back to the picture. If I gave you this and asked what you see in this collection of pixel data, what would you say? Well, you might say, I've got a river and there's a boat, maybe some trees. If you have really good vision, there's a flag in there somewhere, hidden on the right. How about now? Now it's super easy, right? That's the Eiffel Tower in France. The window of the image you saw first is highlighted in red; I literally didn't give you the complete picture. Now, with pictures it's a trivial exercise to see when pieces of the puzzle are missing or don't fit right. Why would it be so different or harder with data? The point here is that you should always remain critical of your data sources.
Where do they come from? Why were these fields chosen and not those? Am I missing any data? You don't want to draw inferences, or build a model that draws inferences, before understanding the complete picture of your dataset. Now that you understand the origin story of your dataset as a whole, let's go through a set of five rules that we can test each of our feature columns against.
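To make the "Am I missing any data?" question a little more concrete, here is a minimal sketch of a per-column completeness check, using only Python's standard library. Everything specific in it is an assumption made for illustration: the sample rows, the column names (`trip_id`, `pickup_time`, `fare`, `tip`), the helper name `missing_share_by_column`, and the choice of which strings count as "missing". It isn't one of the five rules, just a quick first look at how complete each field is.

```python
import csv
import io

# Hypothetical sample data: the columns and values are invented for illustration.
SAMPLE_CSV = """trip_id,pickup_time,fare,tip
1,2019-01-01 08:00,12.5,2.0
2,,7.0,
3,2019-01-01 09:30,,1.5
4,2019-01-01 10:15,20.0,
"""


def missing_share_by_column(csv_text, missing_markers=("", "NA", "null")):
    """Return {column: fraction of rows whose value is missing}.

    Which strings count as "missing" is an assumption; adjust
    missing_markers to match how your own source encodes gaps.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {col: 0 for col in columns}
    total = 0
    for row in reader:
        total += 1
        for col in columns:
            value = row.get(col)
            # A short row yields None; an empty cell yields "".
            if value is None or value.strip() in missing_markers:
                missing[col] += 1
    if total == 0:
        return {col: 0.0 for col in columns}
    return {col: missing[col] / total for col in columns}


report = missing_share_by_column(SAMPLE_CSV)
for column, share in report.items():
    print(f"{column}: {share:.0%} missing")
```

A column with a high share of missing values is a cue to go back and ask where the data came from and why those values are absent. It isn't automatically a reason to drop the column.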