Leveraging Modern Machine Learning to Reduce Chronic Absenteeism in Early Childhood.

Tiffany Wu; Christina Weiland

Background/Context: Chronic absenteeism is a serious problem that has been linked to lower academic achievement, diminished socioemotional skills, and an increased likelihood of high school dropout (Allensworth et al., 2021; Gottfried, 2014). As a result, many schools have begun to embrace early warning systems (EWSs) as a tool to identify and flag students at risk of chronic absenteeism (Balfanz & Byrnes, 2019). However, despite their promise, current EWSs have several weaknesses. First, many EWSs are reactive: they start only once the school year begins, so a student must first miss 18 school days before an intervention is triggered. Second, a majority of EWSs begin in high school (Sansone, 2019; Lee & Chung, 2019). Lastly, methodologically, EWSs rely on risk levels and flags calculated from individual indicators being triggered or on outputs from traditional logistic regression models, which tend to have poor predictive accuracy (Deussen et al., 2017; Sansone, 2019). Machine learning has the potential to address weaknesses of EWSs that were previously beyond reach, including more accurately identifying students' individual risk levels for chronic absenteeism and providing a proactive risk level even for students in their earliest years of education.

Purpose/Objective/Research Question: The present study demonstrates the efficacy of modern machine learning techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and Extreme Gradient Boosting (XGBoost), in refining EWSs to proactively identify students who, without intervention, have a heightened risk of chronic absenteeism during their early schooling years. In particular, we answer the following research questions: How does the prediction accuracy of modern machine learning algorithms (such as SMOTE and XGBoost) compare to that of more traditional parametric methods (such as logistic regression) for use as a proactive early warning system?
How accurately and early can we identify students who will be chronically absent in 3rd grade? How can machine learning models be personalized to inform school and district policies regarding chronic absenteeism intervention while considering financial and resource constraints? We hypothesized that the use of SMOTE and XGBoost would enhance predictive accuracy starting from PreK and that varying the probability thresholds could allow personalization of our findings for different stakeholders.

Setting: This research took place in the Boston Public Schools (BPS) PreK program. The BPS PreK program is a large-scale early childhood education program that was based entirely in the public schools during our study years. It is open to any child in the city and has garnered a high profile in the past 15 years due to its attention to evidence-based practices (Kabay et al., 2020).

Population/Participants/Subjects: Our sample for this paper is the population of students who enrolled in the BPS PreK program for four-year-olds between the 2007-2008 and 2010-2011 school years. Our final analytic sample includes 6,698 students. Sample descriptives are in Table 1 (Appendix). Our final sample was 51% male and racially diverse (28% Black, 42% Hispanic/Latino, 9% Asian, 3% multiracial or other). Almost half of the sample (44%) were identified as dual language learners, 71% were eligible for free or reduced-price lunch, and 17% were eligible for special education services. Of our sample, 632 students (9.4%) were chronically absent in 3rd grade and 6,066 were not.

Intervention/Program/Practice: As discussed, the setting is the Boston Public Schools. This is methods-focused work; there is no intervention.

Research Design: The Synthetic Minority Over-sampling Technique (SMOTE; Chawla et al., 2002) is a resampling technique widely used in machine learning to address class imbalance.
SMOTE works by creating synthetic instances of the minority class (here, students who are chronically absent), which enhances a statistical model's ability to capture patterns separating the minority from the majority class. Figure 1 illustrates a hypothetical example of a 2-D feature space before and after SMOTE. XGBoost (Extreme Gradient Boosting; Chen & Guestrin, 2016) is a relatively new and popular machine learning algorithm for classification problems. XGBoost builds a sequence of simple classifier models (usually decision trees), each of which corrects the mistakes of the models before it (Zhou, 2012). Figure 2 provides the loss function and an illustration of how XGBoost works. Using SMOTE and XGBoost together can improve the prediction accuracy of EWSs. Education interventions often involve resource allocation based on identified needs, and these methods can provide valuable insights into how to allocate resources more efficiently, getting supports to more students who are at risk of chronic absenteeism and wasting fewer resources on those incorrectly identified by current EWSs.

Data Collection and Analysis: We obtained administrative data from the Massachusetts Department of Elementary and Secondary Education. We trained a total of 18 models across four types of classifiers: multilevel logistic regression, multilevel logistic regression with SMOTE, XGBoost, and XGBoost with SMOTE. Models included data on time-invariant student characteristics (e.g., race, sex, dual language learner status) as well as time-varying characteristics (e.g., attendance rate, school, number of suspensions).

Findings/Results: As seen in Table 2, the top-performing XGBoost model (with SMOTE) outperformed the best multilevel logistic regression model in accurately forecasting chronic absenteeism among third-grade students. This XGBoost model had a recall rate of 0.627, a balanced error rate (BER) of 0.222, and an AUC of 0.891.
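The recall and BER figures above can be read through their usual definitions. The sketch below assumes that BER denotes the balanced error rate (the mean of the false negative rate and the false positive rate); the function names are hypothetical and do not come from the study. It uses the sample counts reported in this abstract (632 chronically absent, 6,066 not) only to show a trivial baseline that flags no one, which is not one of the study's models.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all students classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp, fn):
    """Share of truly chronically absent students that are flagged."""
    return tp / (tp + fn)


def balanced_error_rate(tp, fn, fp, tn):
    """Mean of the false negative rate and the false positive rate.

    Assumption: this is the standard definition of BER; the abstract
    does not spell the metric out.
    """
    false_negative_rate = fn / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return 0.5 * (false_negative_rate + false_positive_rate)


def implied_false_positive_rate(recall_value, ber):
    """Solve BER = 0.5 * ((1 - recall) + FPR) for FPR."""
    return 2 * ber - (1 - recall_value)


# Illustrative baseline (not a model from the study): flag nobody.
# Every chronically absent student is a false negative.
tp, fn, fp, tn = 0, 632, 0, 6066

print(round(accuracy(tp, fn, fp, tn), 3))             # 0.906
print(recall(tp, fn))                                 # 0.0
print(balanced_error_rate(tp, fn, fp, tn))            # 0.5
print(round(implied_false_positive_rate(0.627, 0.222), 3))  # 0.071
```

The baseline shows why accuracy alone is uninformative under class imbalance: flagging no one is right about 91% of the time yet identifies no at-risk students, whereas recall and BER expose the failure. Under the same assumption about BER, a recall of 0.627 and a BER of 0.222 together imply a false positive rate of roughly 0.071.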
Importantly, the XGBoost model correctly identified 62.7% of the students who would be chronically absent in third grade, whereas the best multilevel logistic regression model correctly identified only 31.7% of them. Table 3 shows how varying probability thresholds can help personalize models. For example, a district facing severe budget constraints may want to implement an intensive intervention and use a higher probability threshold of 0.8 or 0.9 to target students who are most likely to be chronically absent next year.

Conclusions: Integrating modern machine learning algorithms, namely XGBoost and SMOTE, could lead to a substantial increase in schools' ability to detect and mitigate chronic absenteeism during elementary school years. This study's findings have implications for helping schools and districts more accurately pinpoint the most appropriate students for targeted supports, along with implications for future education research grappling with the consequences of class imbalance and leveraging predictive modeling for outcomes or interventions beyond chronic absenteeism.
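The threshold-based personalization described in the findings can be sketched as follows. This is a minimal illustration, not the study's code: the predicted probabilities and outcomes are made-up values for ten hypothetical students, and the function name `summarize` is an assumption. It shows how raising the probability threshold shrinks the group flagged for intervention while trading recall for precision.

```python
def summarize(probabilities, labels, threshold):
    """Flag students whose predicted risk meets the threshold and
    report how many are flagged, plus recall and precision."""
    flags = [p >= threshold for p in probabilities]
    tp = sum(1 for f, y in zip(flags, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flags, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flags, labels) if not f and y == 1)
    flagged = tp + fp
    return {
        "threshold": threshold,
        "flagged": flagged,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / flagged if flagged else 0.0,
    }


# Hypothetical predicted probabilities of chronic absenteeism and the
# true outcomes (1 = chronically absent) for ten students.
probs = [0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.15, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.5, 0.8, 0.9):
    print(summarize(probs, labels, threshold))
# {'threshold': 0.5, 'flagged': 4, 'recall': 0.75, 'precision': 0.75}
# {'threshold': 0.8, 'flagged': 2, 'recall': 0.5, 'precision': 1.0}
# {'threshold': 0.9, 'flagged': 1, 'recall': 0.25, 'precision': 1.0}
```

In this toy data, a district with a tight budget that chooses a threshold of 0.8 or 0.9 serves fewer students, nearly all of whom are truly at risk, while a district with more capacity could lower the threshold to reach more of the students who would otherwise be missed.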