What researchers mean by... statistical significance

It's easy for non-scientists to misunderstand the term significant when they come across it in an article. In everyday English, the word means "important." But when researchers say the findings of a study were "statistically significant," they do not necessarily mean the findings are important.

Statistical significance refers to whether any differences observed between groups being studied are "real" or whether they are simply due to chance. These can be groups of workers who took part in a workplace health and safety intervention or groups of patients participating in a clinical trial.

Let's consider a study evaluating a new weight loss drug. Group A received the drug and lost an average of four kilograms (kg) in seven weeks. Group B didn't receive the drug but still lost an average of one kg over the same period. Did the drug produce this three-kg difference in weight loss? Or could it be that Group A lost more weight simply by chance?

Statistical testing starts off by assuming something impossible: that the two groups of people were exactly alike from the start. This means the average starting weight in each group was the same, and so were the proportions of lighter and heavier people.

Mathematical procedures are then used to examine differences in outcomes (weight loss) between the groups. The goal is to determine how likely it would be for a difference as large as the one observed (in this case, the three-kg difference in average weight loss) to occur by chance alone.

Now here's where it gets complicated. Scientists use the letter "p" to describe the probability of observing a difference at least this large purely by chance in two groups of exactly-the-same people. In scientific studies, this probability is known as the "p-value."
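One way to make this definition concrete is to simulate it. The short Python sketch below is a hypothetical illustration, not the method any particular study used. It assumes 20 people per group, weight loss that is normally distributed with a standard deviation of 5 kg, and a drug with no effect at all, so both groups are drawn from exactly the same distribution. The function name simulated_p_value and all of these numbers are assumptions chosen for illustration. The code counts how often Group A comes out at least three kg ahead purely by chance; that fraction is an estimate of a one-sided p-value.

```python
import random


def simulated_p_value(observed_diff, group_size, sd, n_sims=10_000, seed=0):
    """Estimate a one-sided p-value by simulation (hypothetical helper).

    Both groups are drawn from the SAME normal distribution, i.e. the drug is
    assumed to have no effect at all. The result is the fraction of simulated
    studies in which Group A's average weight loss beats Group B's by at least
    `observed_diff` kg purely by chance.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_sims):
        # Assumption: weight loss ~ Normal(mean 1 kg, standard deviation `sd`)
        # in both groups, since the drug is presumed to do nothing.
        group_a = [rng.gauss(1.0, sd) for _ in range(group_size)]
        group_b = [rng.gauss(1.0, sd) for _ in range(group_size)]
        diff = sum(group_a) / group_size - sum(group_b) / group_size
        if diff >= observed_diff:
            hits += 1
    return hits / n_sims


# Hypothetical inputs: 20 people per group, standard deviation of 5 kg.
p = simulated_p_value(observed_diff=3.0, group_size=20, sd=5.0)
print(f"Estimated p-value: {p:.3f}")
```

With these made-up inputs the estimate comes out at around 3 per cent. Real studies usually compute p-values with standard formulas such as the t-test rather than by brute-force simulation, but the logic is the same: larger groups or less variable weight loss would make a three-kg gap less likely to arise by chance, and so shrink the p-value.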

If it is unlikely enough that the difference in outcomes occurred by chance alone, the difference is pronounced "statistically significant."

Mathematical probabilities like p-values range from 0 (no chance) to 1 (absolute certainty). So 0.5 means a 50 per cent chance and 0.05 means a 5 per cent chance.

In most sciences, results yielding a p-value of .05 are considered on the borderline of statistical significance. If the p-value is under .01, results are considered statistically significant, and if it's below .001 they are considered highly statistically significant.
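The conventions described above amount to a simple labelling rule. The sketch below is a hypothetical helper, not a standard function. It treats any p-value from .01 up to and including .05 as "borderline," which is one reading of the paragraph, and its defaults are .05, .01 and .001, the cut-offs most commonly used. The cut-offs are parameters because conventions differ between fields.

```python
def describe_p_value(p, borderline=0.05, significant=0.01, highly=0.001):
    """Label a p-value using rough conventional cut-offs (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < highly:
        return "highly statistically significant"
    if p < significant:
        return "statistically significant"
    if p <= borderline:
        return "borderline"
    return "not statistically significant"


for value in (0.2, 0.05, 0.03, 0.004, 0.0002):
    print(value, "->", describe_p_value(value))
```

A label like this says nothing about whether a difference is large enough to matter in practice, which is the distinction the article opens with.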

But how does this help us understand the meaning of statistical significance in a particular study? Let's go back to our weight loss study. If the results yield a p-value of .05, here is what the scientists are saying: "Assuming the two groups of people being compared were exactly the same from the start, there's a very good chance (95 per cent) that a difference in weight loss of three kg or more would NOT be observed if the weight loss drug had no benefit whatsoever." From this finding, scientists would infer that the weight loss drug is probably effective.

If a finding's p-value is .01 and you would rather express it the other way around, just subtract the p-value from 1 (1 minus .01 equals .99). Thus a p-value of .01 means there is an excellent chance (99 per cent) that a difference this large would NOT be observed if the intervention had no benefit whatsoever.
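The subtraction is simple enough to write as a one-line function. This hypothetical helper (the name chance_not_observed is illustrative) restates a p-value as a percentage, following the article's framing, and rejects values outside the 0-to-1 range that all probabilities must fall within.

```python
def chance_not_observed(p):
    """Percentage chance that a difference this large would NOT turn up
    if the intervention had no benefit: 100 * (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return 100 * (1 - p)


for value in (0.05, 0.01):
    # Round to whole per cent for display, as the article does.
    print(f"p = {value}: {chance_not_observed(value):.0f} per cent")
```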

Not all statistical testing is used to determine the effectiveness of interventions. Studies that seek associations — for example, whether new employees are more vulnerable to injury than experienced workers — also rely on mathematical testing to determine if an observation meets the standard for statistical significance.

Source: At Work, Issue 40, Spring 2005: Institute for Work & Health, Toronto