Validity as Process: A Construct Driven Measure of Fidelity of Implementation.

Jones, Ryan Seth

Estimates of fidelity of implementation are essential for interpreting the effects of educational interventions in randomized controlled trials (RCTs). While random assignment protects against many threats to validity, and therefore provides the best approximation to a true counterfactual condition, it does not ensure that the treatment condition faithfully reflects researchers' intentions (Lipsey, 1993). If program success relies on changes in the practices, routines, or behaviors of participants, as is the case in most K-12 education interventions, then causal inferences about success or failure rely on a valid account of the various states of the program in treatment conditions. Although fidelity measures are widely considered a necessary component of field experiments, there is no consensus on how to construct them (e.g., Cordray & Pion, 2006; O'Donnell, 2008). Many agree that measures should be grounded in an a priori program theory (O'Donnell, 2008), but there is very little direction on what constitutes a sufficient theory (Donaldson & Lipsey, 2006). As a result, researchers rarely articulate the theories that guide the construction of their fidelity measures, and the few who do provide very general descriptions that are open to multiple interpretations (e.g., Nelson et al., 2012). Fidelity measures without clearly articulated program theories are at risk for the two biggest threats to validity: construct underrepresentation and construct-irrelevant variance (Messick, 1994). It is no surprise, then, that validity evidence for fidelity measures is rarely reported (Hagermoser-Sanetti & Kratochwill, 2009), and when it is reported, it is typically reported only at the instrument level. The author argues that fidelity measures should be construct-driven to address these threats to validity (Messick, 1994). Construct-driven measures use visible performance to make inferences about invisible theoretical constructs.
In contrast, many fidelity measures are designed as task-driven assessments. Although task-driven measures are sometimes appropriate in other contexts, such as judging an athlete's performance in the Olympic Games, the goal of measuring fidelity is not simply to describe behavior at a specific moment in time. Instead, fidelity measures use a sample of observational moments to make inferences about overall implementation. To establish a measure's validity, researchers should therefore clearly articulate three things: (1) the latent program theory; (2) the observable evidence of that program theory; and (3) the nature of the correspondence between them. The author uses Wilson's (2005) four building blocks of measurement to guide the development of a fidelity measure in a randomized controlled trial of the middle school statistics curriculum "Data Modeling". He reports finding this framework useful because it emphasizes all three components of construct-driven measurement. Guided by the four building blocks (Figure 1), this presentation illustrates the framework by addressing the following questions: (1) What are the latent constructs that make up the Data Modeling program theory? (2) What would constitute an "item" in the context of Data Modeling instruction? (3) What are the likely outcomes from our observations, and how do they relate to our construct? (4) How do we model our measurements to provide inference about the construct? The author concludes that Wilson's (2005) framework provided much-needed guidance in developing measures with ongoing and iterative attention to construct validity. Deliberately thinking about construct articulation, item design, outcome space, and measurement model produced meaningful validity evidence and allowed him to make principled refinements that increased the validity of the measure. A table and figure are appended.