0:00

[MUSIC]

Hi. In this module, I'm going to talk about the phenomenon of Type I errors, or false positives.

A Type I error occurs when we incorrectly conclude that a null hypothesis is false. That is, we incorrectly reject the null hypothesis, or conclude that a relationship or a difference exists when there is none.

This occurs when a test suggests that a result is statistically significant when in fact it's the product of chance variation in the composition of our sample. Again, the p-value is an estimate of the probability that the difference between the hypothesized parameter and our observed measurement is the result of random chance. But a p-value is never exactly zero; it's always some small positive number. So there's always a chance that any result we observe could be produced by random chance.

Â 1:05

Now, if we're worried about Type I errors, about accidentally accepting a result as statistically significant when in fact it shouldn't be, there are a few things we can do. We can try to increase the sample size: larger samples tend to make our test statistics larger, so we generally get smaller p-values.

We can make the criterion for the hypothesis test more strict. We can demand a p-value of 0.01 or 0.001 before we accept that a relationship is statistically significant, so there is only a one in a hundred or a one in a thousand chance of committing a Type I error.

And we can try to reduce measurement error. That's a little bit technical, but if we measure our outcome with more precision, with careful protocols and so forth, to remove chance variation, that helps reduce the chances of a Type I error.
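The effect of the stricter criterion is easy to see in a small simulation. Here is a stdlib-Python sketch (my own illustration, not from the lecture; the setup and names like `null_experiment` are assumptions) that runs many studies in which the null hypothesis is true and counts how often each significance threshold produces a false positive:

```python
import math
import random

random.seed(0)

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def null_experiment(n=50):
    """One study where the null is TRUE: n draws from N(0, 1), testing H0: mean = 0."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    z = mean * math.sqrt(n)  # the population sd is 1 by construction
    return two_sided_p(z)

trials = 5000
pvals = [null_experiment() for _ in range(trials)]

for alpha in (0.05, 0.01, 0.001):
    fp_rate = sum(p < alpha for p in pvals) / trials
    print(f"alpha = {alpha:<6} observed Type I error rate = {fp_rate:.4f}")
```

The observed false-positive rate tracks whatever threshold we choose, which is exactly why demanding 0.01 or 0.001 instead of 0.05 protects us.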

Â 2:07

Now, I want to talk about the phenomenon of mass significance, which is closely associated with Type I errors. Because of the prospect of Type I errors, in fact the likelihood of Type I errors, significance tests should not be used to evaluate large numbers of relationships, sifting through them to find ones that seem worth looking at because they appear to be statistically significant. Significance tests should be used to evaluate pre-specified hypotheses, not for screening in exploratory analysis.

So, using p-values to screen for statistically significant relationships among large numbers of variables in exploratory analysis is extremely dangerous. Think about it. If we regress one variable made up entirely of random numbers on 100 other variables that are all, again, made up entirely of random numbers, and we look at the coefficients for these 100 right-hand side variables, then even though they have nothing to do with the outcome, on average five of them will be statistically significant at the 5% level. And on average, one of them will be statistically significant at the 1% level.

So, again, we want to avoid screening. If we run enough regressions, we're guaranteed to find relationships that appear statistically significant, and we're going to have false positives.
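That thought experiment can be run directly. Here is a stdlib-Python sketch (my own, not the lecture's; it tests each noise predictor against the noise outcome with a Pearson correlation and the Fisher transformation, rather than one big multiple regression) that screens 100 pure-noise variables:

```python
import math
import random

random.seed(1)

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

n_obs, n_vars = 500, 100
y = [random.gauss(0, 1) for _ in range(n_obs)]  # outcome: pure noise

pvals = []
for _ in range(n_vars):
    x = [random.gauss(0, 1) for _ in range(n_obs)]  # predictor: pure noise
    r = corr(x, y)
    z = math.atanh(r) * math.sqrt(n_obs - 3)  # Fisher transformation of r
    pvals.append(two_sided_p(z))

print("significant at 5%:", sum(p < 0.05 for p in pvals))
print("significant at 1%:", sum(p < 0.01 for p in pvals))
```

Run it a few times with different seeds and the counts bounce around their expected values of about five and one, even though every predictor is noise.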

Some examples: genetic researchers worry a great deal about the prospect of false positives, because quite often they're regressing measures of some phenotype on thousands or tens of thousands of genetic variants. This would be in Genome-Wide Association Studies. So when they do their analysis, they have to set a very strict criterion for assessing statistical significance, perhaps demanding a p-value of 0.00001 or 0.00000001.

Regressions that include large numbers of interactions between categorical variables will produce dozens or hundreds of estimated coefficients. And again, we are guaranteed that a certain number of them, even if they actually have no relationship to anything, will appear to be statistically significant.
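One standard way of setting the stricter criterion these fields use is the Bonferroni correction: divide the desired significance level by the number of tests. A small stdlib-Python sketch (my own illustration, not from the lecture) shows the difference it makes when every null hypothesis is true:

```python
import math
import random

random.seed(2)

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

m = 1000      # number of hypotheses tested at once, all of them truly null
alpha = 0.05

# Under the null, each test statistic is standard normal
pvals = [two_sided_p(random.gauss(0, 1)) for _ in range(m)]

naive = sum(p < alpha for p in pvals)            # expect about alpha * m = 50
bonferroni = sum(p < alpha / m for p in pvals)   # per-test threshold 0.00005

print("false positives, uncorrected:", naive)
print("false positives, Bonferroni: ", bonferroni)
```

The uncorrected screen hands us dozens of "discoveries" from pure noise; the corrected threshold almost always hands us none, which is the spirit of the very small p-values demanded in genome-wide studies.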

Â 4:24

Another related issue is p-value mining. If you are looking at an association between some y variable and some x variable, and you have a bunch of control variables, you can continually tweak the model to introduce or remove variables, or somehow change the sample that you're conducting the analysis on. It's a bit like drawing different samples; not quite, but close.

But you run the risk that eventually, just by luck of the draw, you'll get a model and a sample in which a relationship appears to be statistically significant when it's actually not; it's just chance variation. After trying enough different models and enough different definitions of the sample, you'll seem to get something that appears statistically significant.
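A simulation makes the danger concrete. In this stdlib-Python sketch (my own, not from the lecture; `persistent_researcher` and the choice of random subsamples as stand-ins for "different sample definitions" are assumptions), each simulated researcher studies pure noise but keeps the smallest p-value found across many sample definitions:

```python
import math
import random

random.seed(3)

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def mean_test(xs):
    """Test H0: mean = 0 for draws from N(0, 1)."""
    n = len(xs)
    return two_sided_p((sum(xs) / n) * math.sqrt(n))

def persistent_researcher(data, tries=20):
    """Re-test on 20 different 'sample definitions' (random half-samples)
    and keep the smallest p-value -- the essence of p-value mining."""
    best = mean_test(data)
    for _ in range(tries):
        subsample = random.sample(data, k=len(data) // 2)
        best = min(best, mean_test(subsample))
    return best

researchers = 500
hits = 0
for _ in range(researchers):
    data = [random.gauss(0.0, 1.0) for _ in range(200)]  # the null is true
    if persistent_researcher(data) < 0.05:
        hits += 1

print(f"share of null studies reaching p < 0.05 after mining: {hits / researchers:.2f}")
```

An honest single test would cross the 5% threshold about 5% of the time; the mined analyses get there far more often, even though there is never anything to find.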

Â 5:10

Now, an issue that we're becoming increasingly aware of is publication bias. Thus far, we've considered examples where mass significance leads to problems for an individual researcher or team: somebody sifting through hundreds, maybe thousands, of relationships or coefficients and digging out the ones that appear statistically significant. But we have to keep in mind, and we're recognizing this increasingly as an issue, that around the world we have hundreds of researchers or hundreds of teams working on related topics, but using independent samples.

So suppose you have 100 teams around the world working on the same topic and doing the same analysis, but each on a sample that they've collected themselves. Even if there's actually no relationship between whatever it is that all these teams are looking at and the right-hand side variable that they're testing, on average five of these hundred teams will come up with results that are significant at the 5% level. And on average, one of them will come up with a result that is significant at the 1% level.
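The hundred-teams story can be simulated in a few lines. Here is a stdlib-Python sketch (my own illustration, not the lecture's; `team_study` and the simple mean-test design are assumptions) in which every team studies a true null on its own independent sample:

```python
import math
import random

random.seed(4)

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def team_study(n=100):
    """One team's study on its own sample; the true effect is zero."""
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    return two_sided_p((sum(sample) / n) * math.sqrt(n))

teams = 100
results = [team_study() for _ in range(teams)]

# Only the 'significant' results get written up and submitted
published = [p for p in results if p < 0.05]

print("teams with a significant (false positive) result:", len(published))
print("teams whose negative results go in the drawer:   ", teams - len(published))
```

Every paper that reaches a journal in this simulation is a false positive, and the honest null results never appear anywhere.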

Now, publication bias refers to the fact that the team or teams that got the false positives will be able to go and publish their papers, because journals are interested in novel findings; that's what drives the field forward. The remaining teams will put their results in a drawer or a filing cabinet, or just throw them away. They may not be able to publish them until they have an opportunity to refute the papers published by the teams that got the false positives.

So the teams that got the false positives publish papers. The other teams don't publish anything; they move on to something else, until perhaps there's been some controversy, and then they can pull out their old negative results and publish them.

So, what can we do about publication bias? Well, especially for medical or drug trials, we're moving toward a situation in which researchers announce their studies before they conduct them.

This is in response to a problem found with pharmaceutical companies that were essentially repeating studies of the effects of drugs on particular diseases, and then burying the results until, by chance variation, they got a false positive that suggested an effect of some medication on a disease. That became the study that they published.

As long as people have to announce their studies, recording them before they conduct them, then if somebody comes out and says they've got an exciting new result, we can check back to see whether they've already run 15 or 20 studies testing the same relationship that didn't find anything.

Â 8:00

We'd also like to create repositories for negative results, so that the teams that tried something that didn't work out, and can't publish it, have somewhere to put it. Journals normally aren't interested in negative results, unless you can refute somebody's controversial positive results. Still, we should have some way of making these results available and searchable online, so that people don't reinvent the wheel.

And we can make some overall assessment, looking at all of the studies of a particular phenomenon, to figure out whether the studies that are published are just false positives and the result of publication bias.

As a consumer of research results, you want to be wary of studies with small numbers of subjects; these are much more at risk of Type I error. You should look for large studies, especially ones with controlled treatment designs, where the chances of a Type I error are somewhat smaller. And we should encourage the replication of published studies using independent samples.

So, again, I hope that I have sensitized you to some of the issues that arise when we think about Type I errors in our research. You may have learned about Type I errors when you took a statistics class, but you probably didn't think about these broader implications: the problems of mass significance and publication bias. I hope that you're now sufficiently aware of these issues to take account of them, both in your own research and as you consume other published research.
