ERIC Number: ED677653
Record Type: Non-Journal
Publication Date: 2025-Oct-10
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: 0000-00-00
Ninth Grade Predictors of High School Graduation: A Comparative Analysis of Modeling Approaches
Meril Antony; Krishna Swaroop Pamidimukkala; Mengyuan Ling; Vandeen A. Campbell
Society for Research on Educational Effectiveness
Background: As urban school districts have increasingly adopted "early warning indicators" such as the ninth-grade on-track measure to identify students who may need additional support, it is crucial to assess the effectiveness of this measure in predicting high school graduation. While there is considerable research on these indicators' ability to predict high school dropout (Davis et al., 2019; Mac Iver & Mac Iver, 2009), less attention has been given to their role in predicting graduation. In 2022, 2.1 million students nationwide did not graduate high school (NCES, 2024), with a widening achievement gap between affluent and low-income urban districts (Matheny et al., 2023). Much of the existing research has focused on Chicago Public Schools (CPS) and New York City (Elwick, 2017). Allensworth and Easton's (2007) pivotal study noted that ninth-grade on-track students are four times more likely to graduate within four years than their off-track peers. Similarly, Roderick et al. (2014) found that freshmen who fail a course during their first semester are at a significantly higher risk of dropping out. Additionally, research suggests that focusing on ninth-grade on-track measures could also improve long-term outcomes, such as college enrollment and retention (Easton et al., 2017). Given this, early warning systems that track ninth-grade on-track measures are essential for preventing dropouts and improving graduation rates. This study assesses the effectiveness of the ninth-grade on-track measure as a predictor of high school graduation in an urban school district located in a northeastern state of the United States. The study explores the relationship between ninth-grade academic and behavioral metrics and high school graduation to determine key predictors in the pre- and post-COVID-19 context.
By evaluating the predictive value of ninth-grade on-track status using multiple predictive modeling approaches and cohort-level administrative data from 2012 to 2021, this research contributes to the literature on early warning indicators and enhances understanding of their reliability in diverse urban settings. Furthermore, this research contributes to the existing body of knowledge on graduation predictors by focusing on a northeastern urban school district with a predominantly Black and Latinx student population (over 90% of ninth graders in 2021-2022). While much of the prior research has concentrated on districts in Chicago, Philadelphia, and New York, this study provides insights from a district actively engaged in initiatives aimed at improving students' on-track status by the end of freshman year. Research Questions: A) Is ninth-grade on-track status a significant predictor of high school graduation, controlling for other factors? B) What other academic or contextual indicators are most predictive of high school graduation, and how do they vary with one another, compared to the on-track measure? and C) Is there a difference in the predictive power of the ninth-grade on-track measure pre- and post-COVID-19? Data Sources: The study analyzed administrative data from an urban school district located in a northeastern state of the United States, including students' enrollment records (school type and grade), background information (race/ethnicity, gender, neighborhood ward as a proxy for poverty level, special education status, free or reduced-price lunch, English language status), course performance indicators for the eighth and ninth grade years, the ninth-grade on-track measure, and high school graduation status (the outcome of interest). In this study, we analyzed 29,000 students who entered grade 9 from 2012 through 2021. Table 1 presents the student characteristics of ninth grade cohorts.
Analysis: In this study, we combine different methodological approaches to identify key predictors of high school graduation. More specifically, we include not just traditional explanatory research methods (stepwise regression analysis) but also different predictive modeling techniques (e.g., ensemble methods; splitting and resampling methods) that have evolved not only to handle large, non-linear datasets but are also known to improve predictive accuracy with each newly identified pattern (Ghorbani et al., 2020). Predictive modeling forecasts future outcomes or probabilities for individuals by analyzing data from similar cases with known results (Porter & Balu, 2006; Bird et al., 2021). This technique, widely used in business and marketing, is increasingly being adopted in education research (Musso et al., 2020), especially around building policy from specific interventions. As such, our analysis plan includes three models for predicting on-time graduation. Model 1 uses a traditional stepwise regression approach to identify key predictors from various categories--academic, on-track, student-level, and school-level indicators. It aims to assess the predictive power and yield of these variables, with a focus on those that show high significance in predicting on-time graduation, controlling for other factors (Marshall, 2022; Balfanz et al., 2007). Model 2 applies a splitting and resampling method to assess model performance using logistic regression with a penalty to minimize irrelevant predictors. It builds two models, one including ninth-grade on-track measures and one without, and evaluates their performance by splitting the data into training and testing sets. Model 3 integrates an ensemble method used in the Non-Linear Models (NLM) approach, combining multiple model outputs to improve prediction accuracy and robustness.
Ensemble methods reduce bias by combining multiple models, which allows for more complex decision-making that better captures the underlying patterns in the data. This is especially true in methods like boosting, where models are added sequentially to correct previous errors. They also reduce variance by averaging the predictions of multiple models, which helps smooth out individual model overfitting. Preliminary analysis (RQA) involved data splitting and resampling (Model 2) with complete cases. A generalized linear regression model was used to estimate logistic regression slope parameters, applying a penalty to reduce less relevant predictors. Two models were built: one with and one without the ninth-grade on-track measure. The data were split into training and testing sets to evaluate model performance. Each model was trained on the training data and evaluated on the test data using various metrics. The results showed a slightly better fit with the ninth-grade measure, though the linear nature of the equation may limit effectiveness. In our later analyses, we are also interested in running the models for different student groups and comparing the results.
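The Model 2 comparison described above (a penalized logistic regression fit with and without the on-track indicator, then scored on held-out data) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the penalty, so an L1 (lasso-style) penalty is assumed here, the data are synthetic, and every function name, column, and coefficient below is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal step for an L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.01, lr=0.1, epochs=300):
    # Proximal gradient descent on the mean log-loss (assumed L1 penalty).
    # The intercept b is left unpenalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def accuracy(X, y, w, b):
    # Share of cases whose predicted class (threshold 0.5) matches the label.
    hits = sum(
        (sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)


def make_synthetic(n=600, seed=0):
    # Hypothetical data; columns are [on_track, gpa_z, attendance_z, noise].
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        gpa, att, noise = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        on_track = 1.0 if gpa + 0.5 * att + rng.gauss(0, 1) > 0 else 0.0
        p_grad = sigmoid(-0.5 + 1.5 * on_track + 0.8 * gpa + 0.4 * att)
        X.append([on_track, gpa, att, noise])
        y.append(1 if rng.random() < p_grad else 0)
    return X, y


def compare_models(seed=0, test_frac=0.25):
    # Split once, then fit one model with and one without the on-track column.
    X, y = make_synthetic(seed=seed)
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train, test = idx[:cut], idx[cut:]
    results = {}
    for name, cols in (("with_on_track", [0, 1, 2, 3]),
                       ("without_on_track", [1, 2, 3])):
        X_train = [[X[i][c] for c in cols] for i in train]
        X_test = [[X[i][c] for c in cols] for i in test]
        w, b = fit_l1_logistic(X_train, [y[i] for i in train])
        results[name] = accuracy(X_test, [y[i] for i in test], w, b)
    return results


if __name__ == "__main__":
    print(compare_models())
```

Calling `compare_models()` returns the held-out accuracy of each specification. A real analysis would likely report additional metrics and repeat the split (resampling) rather than rely on a single partition.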
Descriptors: Grade 9, Predictor Variables, High School Students, Graduation, Urban Schools, School Districts, Educational Trends, African American Students, Hispanic American Students, Graduation Rate, Educational Indicators, COVID-19, Pandemics
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: Grade 9; High Schools; Junior High Schools; Middle Schools; Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A

Peer reviewed
