ERIC Number: ED675588
Record Type: Non-Journal
Publication Date: 2025
Pages: 13
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: 0000-00-00
Improving the Generalizability of Models of Collaborative Discourse
Chelsea Chandler; Rohit Raju; Jason G. Reitman; William R. Penuel; Monica Ko; Jeffrey B. Bush; Quentin Biddy; Sidney K. D’Mello
International Educational Data Mining Society, Paper presented at the International Conference on Educational Data Mining (EDM) (18th, Palermo, Italy, Jul 20-23, 2025)
We investigated methods to enhance the generalizability of large language models (LLMs) designed to classify dimensions of collaborative discourse during small group work. Our research utilized five diverse datasets that spanned various grade levels, demographic groups, collaboration settings, and curriculum units. We explored different model training techniques with RoBERTa and Mistral LLMs, including traditional fine-tuning, data augmentation paired with fine-tuning, and prompting. Our findings revealed that traditionally fine-tuning RoBERTa on a single dataset (serving as our baseline) led to overfitting, with the model failing to generalize beyond the training data's specific curriculum and language patterns. In contrast, fine-tuning RoBERTa with embedding-augmented data led to significant improvements in generalization, as did pairing Mistral embeddings with a support vector machine classifier. However, fine-tuning and few-shot prompting Mistral did not yield similar improvements. Our findings highlight scalable alternatives to the resource-intensive process of curating labeled datasets for each new application, offering practical strategies to enhance model adaptability in diverse educational settings. [For the complete proceedings, see ED675583.]
Descriptors: Artificial Intelligence, Models, Natural Language Processing, Discourse Analysis, Classification, Generalization, Middle School Students, College Students, Cooperative Learning
International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: https://educationaldatamining.org/conferences/
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: Junior High Schools; Middle Schools; Secondary Education; Higher Education; Postsecondary Education
Audience: N/A
Language: English
Authoring Institution: N/A
Grant or Contract Numbers: 2019805
Author Affiliations: N/A

Peer reviewed
