Peer reviewed
ERIC Number: ED677716
Record Type: Non-Journal
Publication Date: 2025-Oct-10
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Scale Invariance in Boundary Average Treatment Effects from Multidimensional Regression Discontinuity Designs
Lily An; Luke Miratrix; Zach Branson
Society for Research on Educational Effectiveness
Background: Educational programs often use student test scores to determine access to some treatment, such as remedial support or graduation (Jacob & Lefgren, 2004; Martorell, 2004; Matsudaira, 2008; Papay et al., 2011, 2014). In these cases, treatment assignment is based on the student's scores from one or more subjects. For example, students may be assigned to summer school if their math and English scores fall below predetermined thresholds. Multidimensional regression discontinuity designs (RDDs), a quasi-experimental method for causal inference, help estimate the causal effect of assignment to treatment based on multiple thresholds (Wong et al., 2013). With a single running variable, e.g., a math score, local linear regression is used to estimate a local average treatment effect (LATE) at the single cutoff. With multiple running variables, the cutoffs form boundaries, with assignment to treatment occurring on one side. We call this causal estimand the boundary average treatment effect (BATE), illustrated with this study's empirical data in Figure 1.

Figure 1: An example of the two-dimensional RDD we consider in this paper. The cutoffs are c1 = 0 along Running Variable 1 and c2 = 0 along Running Variable 2. These running variables relate to ACCESS test scores from English Language Learner students. The treated region of students who are eligible for reclassification is the top right quadrant; the control region lies to the left of and below the boundary defined by the black lines. The causal estimand is the BATE, which is estimated along the boundary.

To estimate the BATE in such RDDs, one can either project the two-dimensional problem to a one-dimensional one (via so-called "binding scores," or via a pooled frontier approach, i.e., a weighted average of two one-dimensional RDDs) or use response surface modeling, which estimates impacts along the entire boundary.
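As a concrete sketch of the binding-score reduction, the helper below is illustrative Python only (not the authors' code); the `binding_score` name and the Figure 1 convention that treatment requires both scores to clear their cutoffs are our assumptions.

```python
def binding_score(x1, x2, c1=0.0, c2=0.0):
    """Collapse two running variables into one.

    Under the Figure 1 rule that treatment requires BOTH scores
    to be at or above their cutoffs, the binding (most restrictive)
    score is the minimum of the two centered scores: a unit is
    treated iff binding_score >= 0. The 2D problem thus reduces to
    a one-dimensional RDD with a single cutoff at 0.
    """
    return min(x1 - c1, x2 - c2)

# A student above cutoff 1 but below cutoff 2 is untreated:
assert binding_score(1.5, -0.5) == -0.5
```

Note that this reduction yields one pooled effect and discards where along the boundary each unit sits, which is precisely the information that response surface modeling retains.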
One approach to response surface modeling is to use a flexible nonparametric estimation approach such as Gaussian process regression; in related work, we find this can yield lower bias, higher precision, and lower RMSEs than the dimension-reducing approaches of binding scores or the pooled frontier. However, there are concerns about the ability of a single BATE estimate to fully capture the treatment effect of interest in these quasi-experimental designs. For example, when the running variables represent different constructs, it is unclear whether and how aggregating these effects rests on assumptions about their comparability. Wong et al. (2013) raised concerns about the scale (standard deviation) and metric (unit of measurement) of the running variables, claiming that when these differ, the BATE no longer represents a causal estimand of interest; the authors recommend focusing on frontier-specific treatment effects instead. We investigate scale invariance in BATEs; understanding the role of scale for the BATE can support both methodological advances and a deeper connection between education policy and evaluation research. To the extent that BATEs are sensitive to different running variable scales, this study will provide insight into when multidimensional RDDs are appropriate. If BATEs are not as sensitive as previously thought, understanding and disseminating this information can expand the opportunities for proper estimation of multiarmed treatments, which are common in education research.

Purpose and Research Questions: Using Gaussian process regression (GPR) to estimate the treatment and control nonparametric response surfaces, we first demonstrate gains over existing methods in terms of lower bias, higher precision, and lower RMSEs.
However, a prevailing question is whether these gains are due to averaging across frontiers (boundaries determined by the running variables) that are sensitive to rescaling of the running variables, particularly when treatment effects are heterogeneous (Wong et al., 2013). This is important to understand in order to contextualize the applicability of possible gains in treatment effect estimation from using GPR. In particular, the benefits of GPR would be of less interest if they were limited to multidimensional regression discontinuities where the running variables are on the same scale and metric. We therefore ask the following questions: (1) For the set of standard methods for two-dimensional RDDs, does rescaling, i.e., using two running variables of different scales, introduce bias or additional uncertainty when estimating BATEs?; and (2) Does using GPR with a separate lengthscale for each running variable mitigate, exacerbate, or leave unchanged this possible problem?

Research Design: First, we demonstrate how GPR yields gains in bias, precision, and RMSE across various types of data generating processes in simulation. To calculate the treatment effect using GPR, we follow the procedure in Rischard et al. (2021): (1) Weight sections of the boundary (sentinels) by the number of datapoints within a certain radius; (2) Fit GPR models to both sides of the boundary using the laGP function in R, with a squared exponential covariance function; (3) Extrapolate the resulting surfaces to the boundary; and (4) Calculate the BATE as the weighted average of the difference between the extrapolations at each sentinel along the boundary. We use both density weights and precision weights. Second, and more importantly for this proposal, we explore the robustness of the GPR simulation results to rescaling. We allow for a separate lengthscale per dimension by using anisotropic kernels.
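The paper fits these surfaces with laGP in R; as an illustrative analogue only (not the authors' implementation), the same four steps can be sketched with scikit-learn's Gaussian process in Python. The `bate_gpr` helper, the sentinel construction, and the kernel settings below are our assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def bate_gpr(X, y, treated, sentinels, radius=0.5):
    """Sketch of the four-step procedure: density weights at boundary
    sentinels, one GP response surface per side of the boundary,
    extrapolation to the boundary, and a weighted average of the
    sentinel-level differences."""
    # (1) Density weights: count data points within `radius` of each sentinel.
    dist = np.linalg.norm(X[:, None, :] - sentinels[None, :, :], axis=2)
    w = (dist <= radius).sum(axis=0).astype(float)
    w = w / w.sum()

    # (2) Fit a GP with an anisotropic squared-exponential (RBF) kernel
    #     -- one lengthscale per running variable -- to each side.
    kernel = RBF(length_scale=[1.0, 1.0]) + WhiteKernel()
    gp_t = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
        X[treated], y[treated])
    gp_c = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
        X[~treated], y[~treated])

    # (3) Extrapolate both response surfaces to the boundary sentinels.
    mu_t = gp_t.predict(sentinels)
    mu_c = gp_c.predict(sentinels)

    # (4) BATE = density-weighted average of the sentinel differences.
    return float(np.sum(w * (mu_t - mu_c)))
```

Because the RBF kernel here carries a separate lengthscale per dimension, rescaling one running variable can in principle be absorbed by that dimension's lengthscale during hyperparameter optimization.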
Data and Analysis: First, we simulate an experimental design in which students are assigned to one intervention based on two distinct sets of test scores. We generate two normally distributed scores in 500 simulated datasets, each with n = 1,000 units (students). Treatment is assigned to each student if both scores are below a prespecified cutoff. We match the simulation cutoff and correlation combinations of Porter et al. (2017), and we compare simulation results from GPR to two of the multidimensional RDD methods discussed there, examining each method's ability to capture the true BATE. Second, to study rescaling, we generate data where one frontier of the boundary truly has no treatment effect and the other has an increasing positive treatment effect. We rescale these datasets by multiplying one running variable by 100, and then compare BATE estimates from GPR on the original and rescaled data. We also examine whether rescaling affects BATE estimates in empirical data.

Results: Figure 2 presents our first simulation findings, which show that density-weighted GPR results outperform binding score, pooled frontier, and loess approaches across data generating models of varying complexity, different running variable cutoffs, and different correlations between running variables. When we rescale one running variable in a two-dimensional RDD, we do not see significant differences in BATEs over simulation runs (t = 0.44, p = 0.67 for density weights; t = 1.63, p = 0.12 for precision weights). Figure 3 plots the density-weighted BATEs and rescaled BATEs against each other for a pilot of 10 simulation trials. We also use the original ACCESS data (shown in Figure A) and rescale one running variable. With two lengthscales for GPR, we do not see significantly different results for the ELL reclassification policy in question between the original and rescaled data across multiple test score outcomes. This is true even with precision-weighted estimates.
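The rescaling manipulation can be sketched as follows; this is illustrative Python with data-generating parameters of our own choosing, not the exact Porter et al. (2017) configurations. One point worth making explicit: with cutoffs at zero, multiplying a running variable by 100 leaves treatment assignment itself unchanged, so any change in the estimated BATE must come from the estimator's sensitivity to distances in the rescaled space.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=1000, rho=0.5, cutoff=0.0):
    """One simulated dataset: two correlated standard-normal scores,
    with treatment assigned when BOTH fall below the cutoff.
    (Illustrative DGP; rho and cutoff are placeholder values.)"""
    cov = [[1.0, rho], [rho, 1.0]]
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    treated = (X[:, 0] < cutoff) & (X[:, 1] < cutoff)
    return X, treated

X, treated = simulate()

# Rescale one running variable by 100, as in the text.
X_rescaled = X.copy()
X_rescaled[:, 0] *= 100.0

# With a cutoff of 0, assignment is scale-invariant ...
treated_rescaled = (X_rescaled[:, 0] < 0.0) & (X_rescaled[:, 1] < 0.0)
assert (treated == treated_rescaled).all()
# ... so a BATE shift under rescaling would reflect the estimator
# (e.g., a single shared kernel lengthscale), not the design.
```

An anisotropic kernel with a separate lengthscale per running variable can simply inflate the lengthscale of the rescaled dimension by the same factor, which is consistent with the scale-invariance results reported above.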
Conclusion: The use of multiple score cutoffs is common in the educational evaluation literature, and discussion of the full (boundary) ATE merits further attention. As users of GPR for multidimensional RDDs, we are concerned that the BATE could be sensitive to the scale of the running variables and therefore arbitrary. We present evidence here that rescaling does not significantly change BATE estimates, in both simulated and empirical data; this may relate to the flexibility of GPR's hyperparameters. Even so, this analysis may encourage researchers to examine treatment effect heterogeneity regardless. Future work includes an expanded comparison between the pooled frontier and GPR approaches.
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A