Peer reviewed
ERIC Number: ED657184
Record Type: Non-Journal
Publication Date: 2021-Sep-28
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Toward a Science of Failure Analysis
Claire Allen-Platt; Clara-Christina Gerstner; Robert Boruch; Alan Ruby
Society for Research on Educational Effectiveness
Background/Context: When a researcher tests an educational program, product, or policy in a randomized controlled trial (RCT) and detects a significant effect on an outcome, the intervention is usually classified as something that "works." When the expected effects are not found, however, there is seldom an orderly and transparent analysis of plausible reasons why. "Null findings" are often treated as an evidential dead end; accumulating and learning from possible failure mechanisms is not standard practice in education research, and it is not common to design interventions with causes of failure in mind. The prevalence of null findings in education research is unknown: estimates based on different samples range from one-third to approximately one-half of educational experiments, and nine out of 10 follow-up effectiveness trials specifically (Coalition for Evidence-Based Policy, 2013; H.C. Hill & Erickson, 2019; Jacob et al., 2019; Kim, 2019). Null results in the social sciences may also be undercounted for many reasons, including reporting and publication biases that lead researchers and editors to publish only successful (i.e., statistically significant) outcomes (e.g., Bakker et al., 2012; Dawson & Dawson, 2016; Franco et al., 2014; Pigott et al., 2013; Rosenthal, 1979). Accounts of null findings and nonsuccesses in educational experiments are thus an under-exploited source of evidence for practitioners and researchers.
Purpose/Objective/Research Question: Despite the commonness of RCTs with one or more null results, researchers seldom explore how to design RCTs "a priori" so as to learn from the inevitable failures to meet expectations, or how to investigate plausible reasons for failure "post facto" in a scientific and orderly way. Boruch and Ruby (2015) and a special issue of "Educational Researcher" edited by Herrington and Maynard (2019) are distinct in the literature for interrogating the nature and consequences of null findings. We build on their work by analyzing a sample of recent large-scale RCTs with at least one null or negative major outcome. Researchers' accounts are synthesized to identify recurrent failure mechanisms in experiments, and implications for future cause-and-effect research are shared. Our purpose is to introduce a broad framework for thinking about interventions that do not produce expected effects in RCTs and to seed a cumulative knowledge base on when, how, and why interventions fall short of expectations.
Research Design and Data Collection: Our study relied on two important sources of evaluation research, evaluation firms and peer-reviewed journals, which cover a restricted population of RCTs. In 2019 we searched eight firm websites, the What Works Clearinghouse website, and three academic peer-reviewed journals to locate published reports of large-scale RCTs conducted in a K-12 school context during the preceding 10 years (2009-2019). The search yielded 68 studies with at least one non-significant or negative major outcome, an assembly we consider illustrative because there is no population listing from which to draw a probability sample.
Analysis: We coded researchers' reported reasons for null findings and synthesized the results to identify failure mechanisms that might be addressed during planning, analysis, or other stages of research. We identified four broad types of events that led to null findings in school-based interventions. These categorizations emerged entirely from researchers' stated reasons for, or speculations about, why the intervention did not yield statistically significant results.
Findings/Results: Most studies in this assembly were cluster-randomized trials that randomly allocated schools to either an intervention or a control condition. Interventions comprised instructional materials, classroom technologies, in-school and after-school programs, and school policies. Approximately four in five studies (83%) used an academic measure as the main outcome of interest, with an average of five outcomes per study. The four broad types of events that led to null findings were: issues with planning and theory of change (59% of studies), implementation constraints and errors (86%), instability or attrition in the population of participants (41%), and measurement issues (43%). The full list of reasons for reported null findings within each event type appears in Table 1. The typology spans two dimensions: issues related to the design of the "trial" (e.g., threats to random assignment; attrition and other forms of missing data) and issues related to the design of the "intervention" (e.g., no single theory of change; goals that conflict between the intervention and the context; teacher or student dissatisfaction with the program or policy). Several implications for researchers, evaluators, and practitioners emerged from the review. Existing literature suggests ways researchers can guard against null results, including clearly articulating an intervention's logic model; employing a small number of outcome measures; acknowledging that realistic effect sizes in school settings are small; and collecting more process or implementation data than convention stipulates (e.g., C.J. Hill, 2019; H.C. Hill & Erickson, 2019; Jacob et al., 2019; Kim, 2019; Valentine, 2019). Our review gave rise to additional insights. For example, during a trial's design phase researchers are advised to attend to sampling design, power analysis, construct validity, and related measurement issues, yet researcher guidance seldom mentions the importance of a comprehensive theory of change or suggests engaging teachers in developing a theory of change they understand and support for their student population. Other opportunities for evaluators to work as collaborators are shared. We also find that issues addressed during analysis could be preempted through design. For example, participant attrition is commonly classified as a limitation of an experiment post hoc; recent work on empirical benchmarks for student attrition across grade transitions drawn from nationally representative longitudinal surveys (Rickles et al., 2018) and on probability estimates of teacher attrition (Taylor & West, 2020) suggests that researchers can treat attrition as a known factor, incorporate attrition estimates in power analyses, and define success more realistically at the outset of a trial (see the sketch following this abstract).
Conclusions: Teachers in schools know that learners can benefit greatly from making mistakes ("productive failure," in classroom parlance). Education researchers are encouraged to learn actively from experimental studies that do not yield expected effects by conducting post-mortems and attending to common failure mechanisms in the design, implementation, and analysis stages of their work.
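To make the attrition point concrete, the following is a minimal sketch, not a calculation from the paper itself: it applies the standard minimum-detectable-effect-size (MDES) approximation for a two-arm, two-level cluster-randomized trial with equal allocation, and the school counts, cluster sizes, intraclass correlation (ICC), attrition rates, and Bonferroni adjustment are all illustrative assumptions.

```python
# Minimal sketch: folding anticipated attrition (and a multiple-outcomes
# correction) into a design-stage power analysis for a two-arm, two-level
# cluster-randomized trial. Normal-approximation MDES with equal allocation:
#   MDES = (z_{1-alpha/2} + z_{power}) * sqrt(4 * (icc + (1-icc)/n) / J)
# All design numbers below are hypothetical, not values from the study.
from scipy.stats import norm

def mdes_crt(n_schools, students_per_school, icc, alpha=0.05, power=0.80):
    """Minimum detectable effect size (in SD units) for a 2-arm CRT."""
    multiplier = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # ~2.80 at .05/.80
    variance = 4 * (icc + (1 - icc) / students_per_school) / n_schools
    return multiplier * variance ** 0.5

# Hypothetical design: 60 schools, 50 students per school, ICC = 0.15.
planned = mdes_crt(60, 50, icc=0.15)

# Anticipate attrition up front (in the spirit of benchmarks such as
# Rickles et al., 2018): assume 10% of schools and 20% of students are lost.
after_attrition = mdes_crt(round(60 * 0.9), round(50 * 0.8), icc=0.15)

# With five outcomes per study (the sample average in this review), a
# Bonferroni-style correction shrinks alpha and raises the MDES further.
fully_adjusted = mdes_crt(round(60 * 0.9), round(50 * 0.8), icc=0.15,
                          alpha=0.05 / 5)

print(f"MDES as planned:                  {planned:.3f}")
print(f"MDES after anticipated attrition: {after_attrition:.3f}")
print(f"MDES with 5-outcome correction:   {fully_adjusted:.3f}")
```

Under these assumed numbers the detectable effect grows from roughly 0.30 to about 0.39 standard deviations once attrition and the outcome-multiplicity correction are priced in, which illustrates why the review's advice to anticipate attrition, limit outcome measures, and expect small effects belongs in the design phase rather than the limitations section.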
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A