In our March 2021 post, Katie Maki and Bryn Harris discussed how scholars whose projects were disrupted by the pandemic can leverage meta-analytic methods to jump-start their research programs. In many settings, scholars may continue to face challenges in returning to their past work when school partnerships and processes remain hampered by the effects of the pandemic (e.g., remote schooling, P20 staffing or enrollment challenges, budget challenges, administrator reluctance). With many scholars looking forward to the winding down of a uniquely challenging (and long) academic year, we wanted to take a bird’s-eye view, discuss the broader utility of secondary research, and provide some resources to help you get started (unless you plan to spend the summer on much-needed rest and recuperation, which you absolutely should, without guilt).
Secondary research includes a broad range of approaches that use existing qualitative or quantitative data in a systematic study. These approaches may be particularly attractive to scholars whose applied projects were derailed by pandemic-related challenges to recruitment, data collection, or community engagement, because they rely on data that are already available for use. Potential secondary data sources include public or private documents (e.g., state or federal policy, case law, education or health records, and other research, as was the focus of the previous post), media (as in bibliographic methods or textual analysis), and extant quantitative datasets (e.g., re-analysis of prior study data or large-scale quantitative data), to name a few. Notably, there is no single way to do secondary research, nor a fixed set of problems that secondary methods can address. And secondary research needn’t be a solitary activity; it’s highly amenable to team science and community-engaged scholarship. Our goal here is to give you some resources to learn more and potentially get started if secondary methods would be useful to your program of research.
Although misconceptions about secondary research are commonplace, secondary researchers follow the typical empirical process: generating and refining research questions or hypotheses, identifying data appropriate for addressing them, designing a study, gathering and preparing data, and analyzing the data to offer interpretations and recommendations. Data collection often involves exploring potential data sources and isolating relevant data from the universe of available options, frequently before research questions and hypotheses are finalized. These processes generally replace the effort typically devoted to developing study procedures, recruiting participants, and collecting data, but they may be no less time-intensive or important, and they can certainly yield scholarship as rigorous and consequential as any other approach. The following publications provide a helpful introduction to secondary research approaches:
- Jones, C. (2010). Archival data: Advantages and disadvantages for research in psychology. Social and Personality Psychology Compass, 4, 1008–1017. doi:10.1111/j.1751-9004.2010.00317.x
- Heaton, J. (2004). What is secondary analysis? In Reworking qualitative data. Sage.
- Smith, E. (2008). Pitfalls and promises: The use of secondary data analysis in educational research. British Journal of Educational Studies, 56, 323–339. doi:10.1111/j.1467-8527.2008.00405.x
- Chatfield, S. L. (2020). Recommendations for secondary analysis of qualitative data. The Qualitative Report, 25(3), 833–842.
You may not have a specific question in mind yet, and rushing hypothesis development can lead to false starts, false positives (especially in large datasets), and wasted resources. It’s okay not to jump straight to hypothesis testing (Scheel et al., 2020). Exploring secondary data may provide a wealth of information, including better problem definitions, more informed questions, and a more complete picture of the relationships between variables. This may be especially useful at the beginning of a line of research, but it can also bear fruit for more established scholars.
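To make the false-positive risk concrete, here is a small simulation sketch in Python. It assumes we run many correlation tests on purely random, unrelated variables (all data are simulated; the sample size, number of tests, and seed are arbitrary illustration values) and approximates each p-value with the Fisher z transformation. Because no true relationships exist, every “significant” result is a false positive. With m independent tests at α = .05, we should expect roughly 0.05 × m of them, and the chance of at least one is 1 - (1 - α)^m.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(n_tests=200, n=500, alpha=0.05, seed=2021):
    """Count 'significant' correlations among pairs of independent variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # x and y are generated independently, so any significant result is a false positive.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if approx_p_value(pearson_r(x, y), n) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    m, alpha = 200, 0.05
    hits = count_false_positives(n_tests=m, alpha=alpha)
    print(f"{hits} of {m} null tests were 'significant'")
    print(f"expected by chance: {alpha * m:.0f}")
    print(f"P(at least one false positive): {1 - (1 - alpha) ** m:.5f}")
```

With 200 tests, about 10 chance findings are expected and the probability of at least one exceeds 0.99, which is one reason to treat exploratory results from large secondary datasets as hypothesis-generating rather than confirmatory.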
In addition, large-scale quantitative data can be useful in estimating causal effects (given an appropriate design) and can allow exploration of research problems and questions that would not otherwise be feasible without massive financial and human resources. Given that early career scholars don’t generally have millions of research dollars and a team of experts, data collectors, coders, and other staff at the ready, secondary quantitative analysis can be a cost-efficient way to investigate consequential questions by taking advantage of others’ prior investment in study design, data collection, and database preparation. The following resources are especially useful to aspiring secondary quant researchers:
- Andersen, J. P., Prause, J., & Silver, R. C. (2011). A step-by-step guide to using secondary data for psychological research. Social and Personality Psychology Compass, 5(1), 56–75. doi:10.1111/j.1751-9004.2010.00329.x
- Trzesniewski, K. H., Donnellan, M. B., & Lucas, R. E. (Eds.) (2011). Secondary data analysis: An introduction for psychologists. American Psychological Association.
- Else-Quest, N. M., & Hyde, J. S. (2016). Intersectionality in quantitative psychological research: I. Theoretical and epistemological issues. Psychology of Women Quarterly, 40(2), 155–170.
- Sullivan, A. L., Weeks, M., Kulkarni, T., & Nguyen, T. (2020). Large-scale secondary data analysis—Part 1: For researchers. NASP Communiqué, 48(5), 16–19.
Many popular publicly available datasets are already formatted appropriately for cross-sectional and longitudinal analyses. In addition to learning the quantitative approach (or collaborating with individuals well versed in the analyses you intend to conduct), it can be wise to familiarize yourself with methods to format or otherwise “clean” quantitative data. This point is especially salient if you are partnering with schools or other community agencies that may not be mindful of the formatting requirements of quantitative analyses. Although different software will have different requirements for data formatting, in general you can distinguish between “long” format and “wide” format. In “wide” format, each row represents all outcomes from one unit of interest (e.g., school, student), whereas in “long” format each unit appears in repeated rows (e.g., one row per measurement occasion). Some analytic approaches (e.g., structural equation modeling [SEM]) often use “wide” data, whereas others (e.g., linear mixed-effects regression) use “long” data. You will very likely need to convert raw data from one format to the other, wide to long or vice versa (e.g., Reshaping Data in R, Reshaping Data in SPSS, and Reshaping Data Long to Wide in SAS).
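As a rough illustration of converting between wide and long formats, the following Python sketch reshapes a tiny, made-up dataset using only the standard library. The column names (`student_id`, `school`, `score_w1` through `score_w3`) and the helper functions `wide_to_long` and `long_to_wide` are hypothetical; in practice you would more likely follow the software-specific tutorials mentioned above or, in Python, use pandas (`pandas.melt` to go from wide to long, `DataFrame.pivot` to go back).

```python
# Hypothetical wide-format data: one row per student, one score column per wave.
wide = [
    {"student_id": "S01", "school": "A", "score_w1": 410, "score_w2": 432, "score_w3": 455},
    {"student_id": "S02", "school": "B", "score_w1": 388, "score_w2": None, "score_w3": 401},
]


def wide_to_long(rows, id_cols, prefix, time_name="wave", value_name="score"):
    """Stack columns named <prefix><wave number> into one row per unit per wave."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col.startswith(prefix):
                record = {c: row[c] for c in id_cols}        # repeat the ID columns
                record[time_name] = int(col[len(prefix):])  # "score_w2" -> 2
                record[value_name] = value                   # missing waves stay None
                long_rows.append(record)
    return long_rows


def long_to_wide(rows, id_cols, time_name, value_name, prefix):
    """Spread one row per unit per wave back into one row per unit."""
    by_unit = {}
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        out = by_unit.setdefault(key, {c: row[c] for c in id_cols})
        out[f"{prefix}{row[time_name]}"] = row[value_name]
    return list(by_unit.values())


long_rows = wide_to_long(wide, ["student_id", "school"], prefix="score_w")
for r in long_rows:
    print(r)
back = long_to_wide(long_rows, ["student_id", "school"], "wave", "score", prefix="score_w")
print(back == wide)  # True: the round trip recovers the original rows
```

Note that the sketch keeps a missing wave (`None`) as its own long-format row; whether to keep or drop such rows depends on the analysis and on how your software handles missing data.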
To learn more, explore the variety of archived webinars that provide general information and training, such as this one by the American School Health Association or this handout and archive (scroll past the COVID webinars) from the Maternal and Child Health Bureau of the U.S. Department of Health and Human Services, which covers research on child health, autism, and other salient topics.
For quantitative researchers in particular, there are a variety of ways to find and access potential data sources, including:
- Pamela Davis-Kean’s List of Secondary Datasets
- Inter-university Consortium for Political and Social Research (ICPSR) Find Data Tool
- National Survey of Children’s Health
- Virginia Tech’s Data Resources For Social Science
- Google Dataset Search Engine
If you are interested in datasets available from the National Center for Education Statistics, the Distance Learning Dataset Training is a great way to familiarize yourself with specific longitudinal and cross-sectional datasets. Scholars using these and other datasets may also be eligible for targeted funding opportunities like these from AERA or the HRSA Autism Secondary Data Analysis Research Program, as well as for general funding opportunities from various agencies and foundations.
As you get started, keep these tips in mind:
- Take time to learn about secondary data analysis.
- As you home in on a potential data source, be sure to read any technical materials, methods reports, and previous research using the data. Take time to really get to know the data so that you don’t misuse it.
- Leave time in your research plans for completing necessary procedures to access the data (e.g., application processes, security requirements, university approval) and secure IRB approval of your project (if applicable; when in doubt, ask your IRB).
- Document everything you do in the process (e.g., any data manipulation) and keep backups of everything. (Many an analyst has wept over failing to document something key or to save earlier code or data files after realizing they needed to back up a few steps to change course.)
- Don’t be afraid to consult with the original researchers or methodologists if you have questions about the data source.
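One lightweight way to act on the documentation tip is to have your cleaning script append a line to a log every time it changes the data. The Python sketch below is a hypothetical example (the function names and the JSON Lines log format are assumptions, not a standard): each entry records a timestamp, a description of the step, the row count, and a SHA-256 fingerprint of the resulting file, so you can later confirm which version of the data a result came from and whether a backup matches it.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks to handle large datasets."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_step(log_path, description, data_path, n_rows):
    """Append one data-manipulation step to a JSON Lines log and return the entry.

    The entry fields are a hypothetical convention for this example.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": description,
        "file": str(data_path),
        "n_rows": n_rows,
        "sha256": sha256_of(data_path),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Demo in a throwaway folder with a made-up two-row file.
    with tempfile.TemporaryDirectory() as folder:
        data = Path(folder) / "cleaned.csv"
        data.write_bytes(b"student_id,score\nS01,410\nS02,388\n")
        log_file = Path(folder) / "cleaning_log.jsonl"
        log_step(log_file, "dropped rows with missing IDs", data, n_rows=2)
        print(log_file.read_text(encoding="utf-8"))
```

Because the log is plain text with one entry per line, it can be versioned and backed up alongside your analysis code.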
What questions or concerns do you have for getting started with these approaches? If you’re already doing this work, what professional learning materials and opportunities have you found helpful?