The goal of sampling is to obtain the best possible estimate of a population value within the limits of our budget and our time. Suppose we've decided on a sampling method for our study, preferably a probability sampling method if this is at all possible. The question remains: how many elements do we need to sample in order to get an accurate estimate of the population value? An easy answer would be: as large a sample as we can afford, because as sample size increases, the margin of error decreases. Accidental over- or under-representation of certain elements will be less extreme and will become less likely. In other words, a bigger sample is always better in terms of accuracy. But this doesn't mean we should all collect samples consisting of tens of thousands of elements, because as the sample size grows, the decrease in the margin of error becomes smaller and smaller. At a certain point, the cost of collecting more elements outweighs the decrease in the margin of error. Say we want to estimate the proportion of votes for candidate A in an upcoming election, and suppose we have a sample of 500 eligible voters. The error won't be cut in half if we double the sample to 1,000 elements; the decrease in error will be much, much smaller. Note that it's the absolute size of the sample that matters, not the relative size. It doesn't matter whether we're estimating election results in Amsterdam, with slightly more than half a million eligible voters, or national elections with more than 13 million voters. As long as both samples are randomly selected, the margin of error will be the same, all other things being equal. This seems very counter-intuitive, but it's true nonetheless. Of course, there are other factors to consider when deciding on sample size. The variability of the population is an important factor: heterogeneity, or strong variation in the population on the property of interest, results in a larger margin of error.
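Both points, that doubling the sample does not halve the error and that population variability widens the error, follow from the standard formula for the margin of error of a proportion under simple random sampling. Here is a minimal sketch; the function name `margin_of_error` and the example proportions are my own illustrative choices, not values from the text:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion,
    assuming simple random sampling (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

# Doubling n shrinks the error by a factor of 1/sqrt(2), not by half:
print(margin_of_error(0.5, 500))    # ~0.044, about 4.4 percentage points
print(margin_of_error(0.5, 1000))   # ~0.031, smaller but not halved

# A more homogeneous population (p far from 0.5, so less variability)
# gives a smaller error at the same sample size:
print(margin_of_error(0.9, 500))    # ~0.026
```

The `p * (1 - p)` term is largest at p = 0.5, which is why maximal heterogeneity on a yes/no property produces the widest margin of error.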
All other things being equal, if values in the population vary widely, then a sample is more likely to accidentally over- or underestimate the true population value. If the population is more homogeneous, meaning it takes on a narrow, limited set of values, then the sample value will automatically lie closer to the population value. If a population is more homogeneous, we can sample more efficiently. This means, all other things being equal, that we can achieve a smaller margin of error with the same sample size, or conversely, that we can obtain the same margin of error with a smaller sample. If a probability sampling method is used, we can determine what margin of error we're willing to accept, given a certain confidence level. We can say that we want our sample estimate of election results to differ by no more than 5% from the final results in 95% of the cases, if we were to sample repeatedly. We, or rather a computer, can now calculate exactly what sample size we need to obtain this margin of error at this confidence level. This does require that we use random sampling, and that we can estimate the variability in the population, for example based on previous studies, old census data, or just a best guess if necessary. I'll mention one other important factor to consider when determining the sample size: it's a good idea to plan ahead and compensate for non-response. Non-response refers to elements in the sample that cannot be contacted, refuse to participate, fail to complete the study, or provide invalid responses. If the response rate can be estimated based on previous or comparable research, then we can take non-response into account and sample extra elements to compensate for the expected loss of elements due to non-response.
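The calculation described here, working backwards from a desired margin of error to a required sample size, and then inflating for expected non-response, can be sketched as follows. This assumes simple random sampling of a proportion; the function names and the 80% response rate are illustrative assumptions, while the 5% margin and 95% confidence level come from the example in the text:

```python
import math

def required_sample_size(p, e, z=1.96):
    """Sample size needed so the margin of error is at most e,
    given an estimated population proportion p (e.g. from previous
    studies, old census data, or a best guess). z = 1.96 for 95%."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

def adjust_for_nonresponse(n, response_rate):
    """Inflate the sample so that, after the expected non-response,
    roughly n usable elements remain."""
    return math.ceil(n / response_rate)

n = required_sample_size(p=0.5, e=0.05)   # worst-case variability: 385
n_total = adjust_for_nonresponse(n, 0.80) # if we expect an 80% response rate
print(n, n_total)
```

Using p = 0.5 is the conservative choice when the population variability is unknown, since it maximizes `p * (1 - p)` and therefore the required sample size.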