ERIC Number: ED670830
Record Type: Non-Journal
Publication Date: 2024-Jul
Pages: 7
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Power Calculations for Randomized Controlled Trials with Auxiliary Observational Data
Jaylin Lowe; Charlotte Z. Mann; Jiaying Wang; Adam Sales; Johann A. Gagnon-Bartsch
Grantee Submission, Paper presented at the International Conference on Educational Data Mining (17th, Atlanta, GA, Jul 2024)
Recent methods have sought to improve precision in randomized controlled trials (RCTs) by using data from large observational datasets for covariate adjustment. For example, consider an RCT aimed at evaluating a new algebra curriculum, in which a few dozen schools are randomly assigned to treatment (new curriculum) or control (standard curriculum) and are evaluated according to subsequent scores on a state standardized test. Suppose that, in addition to the RCT data, standardized test scores are also publicly available for all other schools in the state. Although not part of the RCT, these observational test scores can be used to increase precision in the RCT. Specifically, an outcome prediction model is trained on the auxiliary data, and the resulting predictions are used as an additional covariate. With these methods, the desired power is often achieved with a smaller RCT. The necessary sample size depends on how well a model trained on the observational data generalizes to the RCT, which is typically unknown. We discuss methods for obtaining a range of reasonable sample sizes when designing such an RCT, using an efficacy trial for the Cognitive Tutor Algebra I curriculum as an example. The range is created by dividing the observational data into subgroups and calculating the sample size that would be necessary if the RCT sample resembled each subgroup. These subgroups can be defined by covariate values or by how much the observational data are expected to help, yielding a range of plausible sample sizes. Computing the auxiliary predictions can be computationally demanding, and we show how this step can be made more efficient without significantly affecting the results. [This paper was published in: "Proceedings of the 17th International Conference on Educational Data Mining," edited by B. Paaßen and C. D. Epp, International Educational Data Mining Society, 2024, pp. 469-475.]
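To make the general idea in the abstract concrete, here is a minimal, self-contained sketch (not taken from the paper): train a prediction model on auxiliary observational data, adjust the outcome by the model's predictions, and compute the sample size that would be needed if the RCT sample resembled each observational subgroup. All data, variable names, subgroup definitions, and the simple difference-in-means power formula are hypothetical stand-ins for the paper's actual design-based procedure.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated observational data: school-level covariates X_obs and test scores y_obs.
n_obs = 2000
X_obs = rng.normal(size=(n_obs, 5))
y_obs = X_obs @ np.array([1.0, 0.5, 0.0, -0.5, 0.2]) + rng.normal(scale=1.0, size=n_obs)

# Train an outcome prediction model on the observational data.
# Out-of-bag predictions approximate how the model would predict for held-out schools.
model = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
model.fit(X_obs, y_obs)
oob_pred = model.oob_prediction_

def total_sample_size(resid_var, effect, alpha=0.05, power=0.80):
    """Total two-arm sample size from the normal approximation:
    n_per_arm = 2 * (z_{1-alpha/2} + z_{power})**2 * resid_var / effect**2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_arm = 2 * z**2 * resid_var / effect**2
    return 2 * int(np.ceil(n_per_arm))

# Split the observational data into subgroups (here simply by the sign of the first
# covariate) and compute the sample size needed if the RCT sample resembled each
# subgroup, giving a range of plausible sizes.
effect = 0.3  # hypothetical minimum detectable effect on the outcome scale
for label, mask in [("x0 < 0", X_obs[:, 0] < 0), ("x0 >= 0", X_obs[:, 0] >= 0)]:
    resid_var = np.var(y_obs[mask] - oob_pred[mask])  # variance left after adjustment
    print(label, "total n:", total_sample_size(resid_var, effect))
```

Subgroups in which the auxiliary model predicts well have smaller residual variance and therefore smaller required sample sizes, which is what produces the range described in the abstract.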
Descriptors: Randomized Controlled Trials, Middle School Mathematics, Middle School Students, Middle Schools, High School Students, High Schools, Algebra, Data Collection, Sample Size, Statistical Analysis, Robustness (Statistics), Data Interpretation, Mathematics Curriculum, Standardized Tests, Predictive Measurement, Predictor Variables, Predictive Validity, Generalization, Intelligent Tutoring Systems, Causal Models
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: Junior High Schools; Middle Schools; Secondary Education; High Schools
Audience: N/A
Language: English
Sponsor: Institute of Education Sciences (ED)
Authoring Institution: N/A
Identifiers - Location: Texas
IES Funded: Yes
Grant or Contract Numbers: R305D210031
Department of Education Funded: Yes