ERIC Number: ED670830
Record Type: Non-Journal
Publication Date: 2024-Jul
Pages: 7
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Power Calculations for Randomized Controlled Trials with Auxiliary Observational Data
Jaylin Lowe; Charlotte Z. Mann; Jiaying Wang; Adam Sales; Johann A. Gagnon-Bartsch
Grantee Submission, Paper presented at the International Conference on Educational Data Mining (17th, Atlanta, GA, Jul 2024)
Recent methods have sought to improve precision in randomized controlled trials (RCTs) by using data from large observational datasets for covariate adjustment. For example, consider an RCT aimed at evaluating a new algebra curriculum, in which a few dozen schools are randomly assigned to treatment (new curriculum) or control (standard curriculum) and are evaluated according to subsequent scores on a state standardized test. Suppose that, in addition to the RCT data, standardized test scores are also publicly available for all other schools in the state. Although not part of the RCT, these observational test scores can be used to increase precision in the RCT. Specifically, an outcome prediction model is trained on the auxiliary data, and the resulting predictions are used as an additional covariate. With these methods, the desired power is often achieved with a smaller RCT. The necessary sample size depends on how well a model trained on the observational data generalizes to the RCT, which is typically unknown. We discuss methods for obtaining a range of reasonable sample sizes when designing such an RCT, using an efficacy trial for the Cognitive Tutor Algebra I curriculum as an example. The range is created by dividing the observational data into subgroups and calculating the sample size that would be necessary if the RCT sample resembled each subgroup. These subgroups can be defined by covariate values or by how much the observational data are expected to help, yielding a range of plausible sample sizes. Computing the auxiliary predictions can be computationally demanding, and we show how this step can be made more efficient without significantly affecting the results. [This paper was published in: "Proceedings of the 17th International Conference on Educational Data Mining," edited by B. Paaßen and C. D. Epp, International Educational Data Mining Society, 2024, pp. 469-475.]
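To make the general idea in the abstract concrete, here is a minimal, self-contained sketch (not taken from the paper): train a prediction model on auxiliary observational data, adjust the outcome by the model's predictions, and compute the sample size that would be needed if the RCT sample resembled each observational subgroup. All data, variable names, subgroup definitions, and the simple difference-in-means power formula are hypothetical stand-ins for the paper's actual design-based procedure.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated observational data: school-level covariates X_obs and test scores y_obs.
n_obs = 2000
X_obs = rng.normal(size=(n_obs, 5))
y_obs = X_obs @ np.array([1.0, 0.5, 0.0, -0.5, 0.2]) + rng.normal(scale=1.0, size=n_obs)

# Train an outcome prediction model on the observational data.
# Out-of-bag predictions approximate how the model would predict for held-out schools.
model = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
model.fit(X_obs, y_obs)
oob_pred = model.oob_prediction_

def total_sample_size(resid_var, effect, alpha=0.05, power=0.80):
    """Total two-arm sample size from the normal approximation:
    n_per_arm = 2 * (z_{1-alpha/2} + z_{power})**2 * resid_var / effect**2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_arm = 2 * z**2 * resid_var / effect**2
    return 2 * int(np.ceil(n_per_arm))

# Split the observational data into subgroups (here simply by the sign of the first
# covariate) and compute the sample size needed if the RCT sample resembled each
# subgroup, giving a range of plausible sizes.
effect = 0.3  # hypothetical minimum detectable effect on the outcome scale
for label, mask in [("x0 < 0", X_obs[:, 0] < 0), ("x0 >= 0", X_obs[:, 0] >= 0)]:
    resid_var = np.var(y_obs[mask] - oob_pred[mask])  # variance left after adjustment
    print(label, "total n:", total_sample_size(resid_var, effect))
```

Subgroups in which the auxiliary model predicts well have smaller residual variance and therefore smaller required sample sizes, which is what produces the range described in the abstract.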
Descriptors: Randomized Controlled Trials, Middle School Mathematics, Middle School Students, Middle Schools, High School Students, High Schools, Algebra, Data Collection, Sample Size, Statistical Analysis, Robustness (Statistics), Data Interpretation, Mathematics Curriculum, Standardized Tests, Predictive Measurement, Predictor Variables, Predictive Validity, Generalization, Intelligent Tutoring Systems, Causal Models
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: Junior High Schools; Middle Schools; Secondary Education; High Schools
Audience: N/A
Language: English
Sponsor: Institute of Education Sciences (ED)
Authoring Institution: N/A
Identifiers - Location: Texas
IES Funded: Yes
Grant or Contract Numbers: R305D210031
Department of Education Funded: Yes