Sample size and power

About the “What researchers mean by...” series

This research term explanation first appeared in a regular column called “What researchers mean by…” that ran in the Institute for Work & Health’s newsletter At Work for over 10 years (2005-2017). The column covered over 35 common research terms used in the health and social sciences. The complete collection of defined terms is available online or in a guide that can be downloaded from the website.

Published: August 2008

Few of us read research reports with an eye to critiquing the methodology. The results are the main attraction, the reason for reading in the first place. But researchers spend much of their time planning how their studies will be carried out. Shouldn’t we pay more attention? As any decent researcher will tell you, a study’s results are only as good as its design. Sample size and power are key elements of study design.

What is sample size and why is it important?

Sample size refers to the number of participants or observations included in a study. This number is usually represented by n. The size of a sample influences two statistical properties: 1) the precision of our estimates and 2) the power of the study to draw conclusions.

To use an example, we might choose to compare the performance of marathon runners who eat oatmeal for breakfast to the performance of those who do not. Since it would be impossible to track the dietary habits of every marathon runner in the world, we have little choice but to focus on a segment of that larger population. This might mean randomly selecting only 100 runners for our study. The sample size, or n, in this scenario is 100.

The study’s findings could describe the population of all runners based on the information obtained from the sample of 100 runners. No matter how careful we are about choosing our 100 runners, there will still be some margin of error in the study results. This is because we haven’t talked to everyone in our population of interest. We can’t be absolutely precise about how eating oatmeal affects running performance because it would be impossible to look at every instance in which these two activities coincide. This measure of error is known as sampling error. It influences the precision of our description of the population of all runners.
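To make sampling error concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the population of 50,000 finishing times, its average of 260 minutes and its spread of 30 minutes are invented, and the function name is hypothetical. It draws several random samples of 100 runners and shows that each sample average lands near, but not exactly on, the population average.

```python
import random
import statistics

def sampling_errors(population, n, repeats, seed=42):
    """Draw `repeats` random samples of size n and return how far each
    sample mean falls from the true population mean."""
    rng = random.Random(seed)
    population_mean = statistics.mean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # sampling without replacement
        errors.append(statistics.mean(sample) - population_mean)
    return errors

# Invented population: 50,000 marathon finishing times in minutes
# (assumed mean 260, assumed standard deviation 30).
rng = random.Random(1)
population = [rng.gauss(260, 30) for _ in range(50_000)]

errors = sampling_errors(population, n=100, repeats=5)
for e in errors:
    print(f"sample mean differs from population mean by {e:+.2f} minutes")
```

Each run of the loop mimics one possible study. The differences it prints are the sampling error: none of the samples is wrong, they simply contain different runners.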

Sampling error, though unavoidable, can be reduced by increasing the sample size. Larger samples tend to be associated with a smaller margin of error. This makes sense. To get an accurate picture of the effects of eating oatmeal on running performance, we need plenty of examples to look at and compare. However, each additional participant reduces the sampling error by a little less than the one before, so beyond a certain point a larger sample adds cost without adding much precision. This phenomenon is known as the law of diminishing returns.
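The diminishing returns can be sketched with a standard textbook approximation, in which the margin of error for an average shrinks with the square root of the sample size. The 30-minute standard deviation and the 95 per cent confidence level below are assumptions chosen for illustration, not figures from a real study.

```python
import math
from statistics import NormalDist

def margin_of_error(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)

# Assumed spread of finishing times: standard deviation of 30 minutes.
for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: margin of error is about {margin_of_error(30, n):.2f} minutes")
```

Under these assumptions, quadrupling the sample only halves the margin of error, so each step up in sample size buys less precision than the last.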

What about power?

Clearly, determining the right sample size is crucial for strong experimental design. But what about power?

Power refers to the probability of finding a statistically significant result when a real effect exists (read the column on statistical significance). In our study of marathon runners, power is the probability of finding a difference in running performance that is related to eating oatmeal, assuming such a difference really exists.

We calculate power by specifying two alternative scenarios. The first, called the null hypothesis, is one that says there’s nothing going on in the population of interest. In our study of marathoners, the null hypothesis might say that eating oatmeal has no effect on performance.

The second is the alternative hypothesis. This is often the anticipated outcome of the study. In our example, it might be that eating oatmeal results in consistently better performance.

The power calculation uses these two alternatives to tell us how likely the study is to answer the research question. As researchers, we want to know if our study of marathoners can detect the difference between oatmeal having no impact on running performance (the null hypothesis) and oatmeal having a considerable impact on running performance (the alternative hypothesis).
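As a rough sketch of how such a calculation can work, the function below uses a common normal approximation for comparing the averages of two equal-sized groups. The column does not give a formula, so this is one standard approach rather than the authors' method. The numbers are assumptions: a true difference of 10 minutes, a standard deviation of 30 minutes, 50 runners in each group and the conventional 0.05 significance level.

```python
import math
from statistics import NormalDist

def power_two_groups(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    effect: assumed true difference between the group means
    sd: assumed standard deviation within each group
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # cut-off for significance
    se = sd * math.sqrt(2 / n_per_group)     # standard error of the difference
    shift = abs(effect) / se                 # true difference in standard errors
    # Probability that the test statistic lands beyond either cut-off.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Assumed scenario: 100 runners split evenly, 10-minute true difference.
print(f"power with 50 per group: {power_two_groups(10, 30, 50):.2f}")
# Under the null hypothesis (no difference), "power" falls to alpha.
print(f"chance of a false alarm: {power_two_groups(0, 30, 50):.2f}")
```

The second call illustrates the null hypothesis: when oatmeal truly makes no difference, a significant result still turns up at the rate set by the significance level.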

Often researchers will begin a study by asking what sample size is necessary to achieve a desired level of power. This process is known as a priori power analysis. It shows nicely how sample size and power are inter-related. All else being equal, a larger sample size gives more power.
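An a priori calculation can be sketched by turning the usual normal-approximation power formula around to solve for the sample size. The inputs are assumptions for illustration only: a 10-minute difference worth detecting, a 30-minute standard deviation, and the commonly used targets of 80 per cent power and a 0.05 significance level. Real studies typically rely on dedicated software and more refined methods.

```python
import math
from statistics import NormalDist

def runners_needed_per_group(effect, sd, power=0.80, alpha=0.05):
    """Approximate number of participants per group for a two-sided
    comparison of two means (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # significance cut-off
    z_power = z.inv_cdf(power)           # extra margin for the desired power
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)                  # round up to whole participants

# Assumed inputs: detect a 10-minute difference, SD of 30 minutes.
print(runners_needed_per_group(10, 30))             # 80% power
print(runners_needed_per_group(10, 30, power=0.90)) # asking for more power
print(runners_needed_per_group(5, 30))              # looking for a smaller effect
```

Asking for more power, or looking for a smaller difference, pushes the required sample size up, which is why these choices are settled before recruitment begins.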

While the particulars of calculating sample size and power are best left to the experts, even the most mathematically challenged of us can benefit from understanding a little bit about study design. The next time you read a research report, take a look at the methodology. You never know. It just might change the way you read the results.

Source: At Work, Issue 53, Summer 2008: Institute for Work & Health, Toronto