Science is not just for scientists. The methods of thought that underlie science are useful in all sorts of everyday contexts. Most obviously, everyone needs to be able to think like a scientist in order to interpret scientific results—you know, those newspaper headlines like “PAPER CLIP USE MAY LOWER IQ IN PREGNANT WOMEN!!” In that spirit, I’m going to write about some key concepts for thinking like a scientist. Today: sample size.
Pop quiz! You read this (totally made up) report: “Two groups of ten age- and health-matched men were monitored for heart disease. One group was given pet ferrets, while the other was not. The ferret-owning men were 8% less likely to develop heart disease over a five-year period.” So: is it time to run out and get a ferret for the sake of your heart health?
Before answering, let’s back up to a more familiar scenario. If a friend tells you that she flipped a coin and it came up heads 80% of the time, your response should not be, “Whoa, unfair coin!” but “How many times did you flip it?” If the answer is “Five times,” then clearly it is highly likely that her results were due to chance, not to some coin weirdness. If the answer is “100 times,” okay, you might be convinced.
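You can put numbers on that intuition with a quick binomial calculation. Here’s a back-of-the-envelope sketch in Python (nothing specific to any real study, just the fair-coin math):

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k heads in n flips of a coin
    that lands heads with probability p (binomial tail sum)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 80% heads in 5 flips means at least 4 of 5:
print(prob_at_least(4, 5))    # 0.1875 -- chance alone does this nearly 1 time in 5
# 80% heads in 100 flips means at least 80 of 100:
print(prob_at_least(80, 100)) # vanishingly small for a fair coin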
This might seem intuitive, but it’s a big issue in science. It’s often very hard to get large sample sizes. I banded 85 juncos this summer, and I’ll need many more before I can say anything confidently about them—especially since you need very large sample sizes in natural situations, where things like weather, parasites, and the animals’ past experiences can add variation to the system that you may not be able to account for.
Controlled lab studies can get away with smaller sample sizes because they don’t have those extra sources of variation, but they have other sample size issues: every additional animal has to be housed and fed, and in situations where the animal has to be “sacrificed” (read: killed) at the end, every additional animal is an ethical problem as well. In a study of, say, brain tumors in mice, it’s clearly unethical to involve 200 mice if 50 would do—but it’s also unethical to guess too low and use 20 mice instead of 50, because if your sample size is so small that your results are meaningless, those 20 mice have died for nothing.
Sample size is especially challenging because in practice, you never know how large a sample you’ll need until after you’ve done the study: it all depends on how strong and how variable the effect is. If juncos that are 1g smaller in body mass have 50%-plus-or-minus-10% fewer eggs, okay, I can probably find that out by studying not too many juncos; but if they have 10%-plus-or-minus-5% fewer eggs, I’m going to need to study hundreds of nests to find that out. To decide what sample sizes we need, we do statistical calculations called “power analyses,” guessing at the size and variability of the effect we want to look at.
And that’s what you should do when considering other people’s scientific work. Not power analyses—I mean, hey, if you want to, great—but essentially, guess at how large a sample size you think they might need to be able to overwhelm the effect of chance on their results.
So for our hypothetical ferret study: we have sample sizes of 10 in each of the two groups, and they found an 8% effect. Eight percent is pretty small, and a sample size of 10 is very small. I would not believe this study. Nor am I inclined to believe this study, which had a similar sample size, got some press, and may have scared some people who didn’t know to look for sample sizes.
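To see why ten per group can’t support an 8% claim, here’s a quick simulation (the 30% background rate of heart disease is a made-up assumption, purely for illustration): give two identical groups of ten the same risk, no ferrets involved, and ask how often chance alone produces a gap of eight percentage points or more. Note that with ten people per group, the smallest nonzero gap you can even observe is ten points.

```python
import random

random.seed(42)          # fixed seed so the simulation is reproducible
base_rate = 0.3          # assumed background disease rate (illustrative)
trials = 10_000

diffs = []
for _ in range(trials):
    # two groups of 10 drawn from the SAME underlying risk
    group_a = sum(random.random() < base_rate for _ in range(10))
    group_b = sum(random.random() < base_rate for _ in range(10))
    diffs.append(abs(group_a - group_b) / 10)

# Fraction of trials where pure chance produced a gap of >= 8 points:
print(sum(d >= 0.08 for d in diffs) / trials)
```

With these assumptions, chance alone produces an “8% difference” most of the time—so the ferret result tells us essentially nothing.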
Update: the study I linked to has now been officially retracted by the publisher, “bowing to scientists’ near-universal scorn” (!) over its small sample size. Sample size matters!