Welcome to Module 2 of Framework for Data Collection and Analysis. In the first module, we introduced you to two different ways to classify data. One we called organic and found data. The other one we called designed data. Now, no matter which of these two you work with, we said we need to understand the data-generating process. Today we're going to talk a little bit more about why it might make sense to really think about designed data, and why it will be hard to draw an inference from any type of data, in particular organic and found data, if you don't have a specific research question and hypothesis in mind, one that closely aligns with what you want to do. So why do we think about design and the research question? Take a look at this little picture. It might be familiar to you, but maybe not. What do you see? Some of you might say, well, I see a duck. Others might say, I see a rabbit. And that's exactly it. People can see different things in the same object. And the same thing can be true when you look at data. Without something that guides your observation, it can be quite difficult to make the right inference. So, let's look at this more formally. Here we have three observations, q1, q2, and q3, and you observe all of them, in the sense that you assume that when this happens, you observe q1, q2, and q3, and now you think, okay, then my theory about this relationship between P and Q must be true. The problem is, it's really hard, if not impossible, to make up explanations from known facts. The principle of doing that is called induction. But it has some issues, and they are now being discussed again as we are swamped with data, this data deluge, namely that there's a tendency to affirm the consequent, you know, to only see what you already expect to see and not necessarily learn what really is in your data.
So there are two problems that you should have in mind when you are tempted to do induction. First, there could be a number of equivalent models that explain the data that you have, and if you only come up with one explanation after the fact, or even if you come up with the idea of alternative models, you won't necessarily have the right data to test the different models. This can happen with both designed and organic data, but with designed data you can plan differently, and I'll show you an example in a minute of how that works. And second, there is an infinite number of observations that we presumably would need to make. So let's look at the first problem here. Maybe theory p explains the occurrence of q1 and q2, but what if there's another theory, p2, that also explains those observations? Take this picture, for example. Here we have, in orange, the dots in the graph. Those are the observed data points that we have. There are two theories. One is indicated by a blue line between x and f of x, a function of x. That's the relationship that we assume to be true, and it would perfectly explain the data, right? It describes that relationship between x and f of x. But the sine curve that you see here, the green line, would also perfectly describe this relationship, in that it conforms with the data points that you observe. So if all you have are these orange data points, how can you know which one of the two is the true one? Right? You don't know which theory it is. Had you thought about these relationships and these theories ahead of time, you could have made sure that you collect data where the blue and the green lines do not overlap, where they are different, and with that observation you could then distinguish between these two theories and see which one actually explains your real-world phenomenon. So that's the problem of equivalent models. The second problem is often demonstrated with Hempel's raven, this famous sentence in the philosophy of science: all ravens are black.
You would observe that the first one is black, the second one is black, the third one is black, and so on all the way down. And it could be that as long as you watch and observe, that's what you see: we've always seen black ravens. But that doesn't mean that all ravens are black. Right? It could be that once you start looking there's another one coming. Right. And you know the raven might be gone by then.
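To make the equivalent-models problem from the blue-line and green-curve picture concrete, here is a minimal sketch in Python. The two functions are assumptions chosen for illustration, not the ones from the slide: a hypothetical line f(x) = x and a hypothetical sine-based curve f(x) = x + sin(pi * x). They agree at every integer x, so observations taken only at integers cannot tell them apart, while one deliberately placed observation at x = 0.5 can.

```python
import math


def linear_theory(x):
    """Hypothetical 'blue line' theory (assumed for illustration): f(x) = x."""
    return x


def sine_theory(x):
    """Hypothetical 'green curve' theory (assumed): f(x) = x + sin(pi * x)."""
    return x + math.sin(math.pi * x)


def fits(theory, xs, ys, tol=1e-9):
    """Return True if the theory reproduces every observed point within tol."""
    return all(abs(theory(x) - y) <= tol for x, y in zip(xs, ys))


# Made-up observations, standing in for the orange dots: they happen to
# fall only where the two theories overlap (integer values of x).
observed_x = [0, 1, 2, 3, 4]
observed_y = [float(x) for x in observed_x]

# Both theories explain the observed data equally well.
both_fit = fits(linear_theory, observed_x, observed_y) and fits(
    sine_theory, observed_x, observed_y
)

# A designed observation: pick an x where the two theories disagree.
probe_x = 0.5
gap = abs(linear_theory(probe_x) - sine_theory(probe_x))

print("Both theories fit the observed points:", both_fit)
print("Difference between the theories at x = 0.5:", gap)
```

The design choice here mirrors the point of the lecture: the probe location is picked before collecting data, at a point where the competing theories make different predictions, so that a single new observation can separate them.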