Primary data and secondary data

About the “What researchers mean by...” series

This research term explanation first appeared in a regular column called “What researchers mean by…” that ran in the Institute for Work & Health’s newsletter At Work for over 10 years (2005-2017). The column covered over 35 common research terms used in the health and social sciences. The complete collection of defined terms is available online or in a guide that can be downloaded from the website.

Published: November 2015

What does each and every research project need to get results? Data – or information – to help answer questions, understand a specific issue or test a hypothesis.

Researchers in the health and social sciences can obtain their data by getting it directly from the subjects they’re interested in. Data collected this way is called primary data. Researchers may also draw on data that has already been gathered by someone else. This is called secondary data.

What are the advantages of using these two types of data? Which tends to take longer to process and which is more expensive? This column will help to explain the differences between primary and secondary data.

Primary data

An advantage of using primary data is that researchers are collecting information for the specific purposes of their study. In essence, the questions the researchers ask are tailored to elicit the data that will help them with their study. Researchers collect the data themselves, using surveys, interviews and direct observations.

In the field of workplace health research, for example, direct observations may involve a researcher watching people at work. The researcher could count and code the number of times she sees practices or behaviours relevant to her interest; e.g. instances of improper lifting posture or the number of hostile or disrespectful interactions workers engage in with clients and customers over a period of time.
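The counting-and-coding step described above can be sketched in a few lines of Python. This is only an illustration: the behaviour codes and the observation log are hypothetical, and a real study would define its coding scheme before observation begins.

```python
from collections import Counter

# Hypothetical observation log: one behaviour code per observed event.
observations = [
    "improper_lift",
    "hostile_interaction",
    "improper_lift",
    "improper_lift",
    "hostile_interaction",
]


def tally_codes(events):
    """Count how many times each behaviour code was recorded."""
    return Counter(events)


counts = tally_codes(observations)

# Print the codes from most to least frequent.
for code, n in counts.most_common():
    print(f"{code}: {n}")
```

Recording one code per event keeps the raw log intact, so the same observations can be tallied again later by shift, by observer or by time period.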

To take another example, let’s say a research team wants to find out about workers’ experiences in returning to work after a work-related injury. Part of the research may involve interviewing workers by telephone about how long they were off work and about their experiences with the return-to-work process. The workers’ answers, which are considered primary data, will provide the researchers with specific information about the return-to-work process. For example, the researchers may learn how often work accommodation was offered and why some workers refused such offers.

Secondary data

There are several types of secondary data. They can include information from the national population census and other government information collected by Statistics Canada. One type of secondary data that’s used increasingly is administrative data. This term refers to data that is collected routinely as part of the day-to-day operations of an organization, institution or agency. There are any number of examples: motor vehicle registrations, hospital intake and discharge records, workers’ compensation claims records, and more.

Compared to primary data, secondary data tends to be readily available and inexpensive to obtain. In addition, administrative data tends to have large samples, because the data collection is comprehensive and routine. What’s more, administrative data (like many types of secondary data) is collected over a long period. That allows researchers to detect change over time.

Going back to the return-to-work study mentioned above, the researchers could supplement their primary data (i.e. the telephone interview results) with secondary data. They could look at workers’ compensation lost-time claims data to determine how long workers received wage replacement benefits. With a combination of these two data sources, the researchers may be able to determine which factors predict a shorter work absence among injured workers. This information could then help improve return to work for other injured workers.
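As a rough sketch of how the two sources might be combined, the Python example below links hypothetical interview answers (primary data) to hypothetical claims records (secondary data) by a worker ID. The IDs, field names and values are all invented for illustration. The comparison of averages at the end is a simple description, not the kind of predictive analysis a real study would carry out.

```python
# Hypothetical primary data: telephone interview answers keyed by worker ID.
interviews = {
    "W01": {"accommodation_offered": True},
    "W02": {"accommodation_offered": False},
    "W03": {"accommodation_offered": True},
}

# Hypothetical secondary data: days on wage replacement benefits,
# taken from workers' compensation claims records.
claims = {"W01": 12, "W02": 45, "W04": 30}


def link_records(interviews, claims):
    """Keep only workers found in both sources and merge their fields."""
    linked = {}
    for worker_id, answers in interviews.items():
        if worker_id in claims:
            linked[worker_id] = {**answers, "benefit_days": claims[worker_id]}
    return linked


def mean_days_by_offer(linked):
    """Average benefit days, grouped by whether accommodation was offered."""
    groups = {}
    for record in linked.values():
        key = record["accommodation_offered"]
        groups.setdefault(key, []).append(record["benefit_days"])
    return {key: sum(days) / len(days) for key, days in groups.items()}


linked = link_records(interviews, claims)
summary = mean_days_by_offer(linked)
print(summary)
```

Workers who appear in only one source (W03 and W04 here) are dropped by the link. In practice, researchers would want to check how many records fail to match, since a poor match rate can bias the results.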

The type of data researchers choose can depend on many things, including the research question, their budget, their skills and available resources. Based on these and other factors, they may choose to use primary data, secondary data, or both.

Source: At Work, Issue 82, Fall 2015: Institute for Work & Health, Toronto [This column updates a previous column describing the same term, originally published in 2008.]