Remember, we simulate the experiment under the assumption of independence, or, in other words, leaving things up to chance. If the results from the simulations look like the data, then the difference between the proportions of correct guesses can be said to be due to chance. If, on the other hand, the results from the simulation do not look like the data, we can conclude that the difference between the proportions of correct guesses in the two groups was not due to chance, but arose because people actually know the backs of their hands better.
So this is what our randomization distribution looks like. The heights of the bars here represent what percent of the time, or how many times within these 10,000 simulations, a particular simulated difference in proportions was obtained.
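As a rough illustration, here is a minimal sketch in Python of how such a randomization distribution could be simulated by shuffling the pooled outcomes into two new groups; the group sizes and counts of correct guesses below are placeholders, not the actual study data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder outcomes (1 = correct guess, 0 = incorrect), not the actual study counts.
back = np.array([1] * 11 + [0] * 1)   # back-of-hand group
palm = np.array([1] * 7 + [0] * 5)    # palm-of-hand group

# Simulate under the null hypothesis: pool all outcomes, shuffle them,
# and re-deal them into two groups of the original sizes, 10,000 times.
pooled = np.concatenate([back, palm])
n_back = len(back)

sim_diffs = np.empty(10_000)
for i in range(sim_diffs.size):
    shuffled = rng.permutation(pooled)
    sim_diffs[i] = shuffled[:n_back].mean() - shuffled[n_back:].mean()
# sim_diffs is the simulated randomization distribution, centered near zero.
```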
Remember, the definition of the p-value is the probability of the observed or a more extreme outcome, given that the null hypothesis is true. When we think about the observed outcome, we want to think about the success rate in the back-of-the-hand group and the success rate in the palm-of-the-hand group, and take the difference between these two, because that difference is the point estimate that corresponds to the parameter in our null hypothesis, but is based on our sample data.
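With the placeholder data from the sketch above, that observed difference in proportions would be computed like this:

```python
p_hat_back = back.mean()   # success rate in the back-of-hand group
p_hat_palm = palm.mean()   # success rate in the palm-of-hand group
observed_diff = p_hat_back - p_hat_palm   # roughly 0.33 with these placeholder counts
```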
The difference between the two proportions comes out to be roughly 33%, so the p-value is calculated as the percentage of simulations that fall at least 33% away from the center of the distribution. And the center of the distribution is always at zero because, remember, we're assuming that the null hypothesis is true and we're leaving things up to chance when we shuffle the cards into the two decks.
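Continuing the same sketch, the two-sided p-value would be the fraction of simulated differences that are at least as far from zero as the observed difference:

```python
p_value = np.mean(np.abs(sim_diffs) >= abs(observed_diff))
# With the actual study data, this came out to roughly 0.16.
```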
With a p-value of 0.16, or 16%, we would fail to reject the null hypothesis and say that, no, there isn't actually convincing evidence based on these data that people are better or worse at recognizing the backs of their hands versus the palms of their hands, or, more generally, that there is any difference between the two.