ERIC Number: EJ1465703
Record Type: Journal
Publication Date: 2025
Pages: 15
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: EISSN-1929-7750
Available Date: N/A
Learning to Love LLMs for Answer Interpretation: Chain-of-Thought Prompting and the AMMORE Dataset
Owen Henkel; Hannah Horne-Robinson; Maria Dyshel; Greg Thompson; Ralph Abboud; Nabil Al Nahin Ch; Baptiste Moreau-Pernet; Kirk Vanacore
Journal of Learning Analytics, v12 n1 p50-64 2025
This paper introduces AMMORE, a new dataset of 53,000 math open-response question-answer pairs from Rori, a mathematics learning platform used by middle and high school students in several African countries. Using this dataset, we conducted two experiments to evaluate the use of large language models (LLMs) for grading particularly challenging student answers. In experiment 1, we used a variety of LLM-driven approaches, including zero-shot, few-shot, and chain-of-thought prompting, to grade the 1% of student answers that a rule-based classifier fails to grade accurately. We found that the best-performing approach, chain-of-thought prompting, accurately scored 97% of these edge cases, raising the overall grading accuracy from 98.7% to 99.9%. In experiment 2, we examined the consequential validity of this improved grading accuracy by passing the grades generated by the best-performing LLM-based approach to a Bayesian Knowledge Tracing (BKT) model, which estimates student mastery of specific lessons. We found that modest improvements in grading accuracy at the individual question level can lead to substantial changes in mastery estimation: whereas the rule-based classifier misclassified the mastery status of 6.9% of students across their completed lessons, the LLM chain-of-thought approach reduced this misclassification rate to 2.6%. These findings suggest that LLMs could be valuable for grading fill-in questions in mathematics education, potentially enabling wider adoption of open-response questions in learning systems.
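The abstract does not report the BKT parameters or the mastery threshold used in experiment 2. The following is a minimal sketch, assuming the standard four-parameter BKT update with hypothetical values (p_init=0.3, p_transit=0.1, p_slip=0.1, p_guess=0.2) and a hypothetical mastery threshold of 0.95. It illustrates how a single misgraded answer can flip a student's estimated mastery status. The names BKTParams, update, mastery, and is_mastered are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BKTParams:
    # Hypothetical parameter values, chosen for illustration only.
    p_init: float = 0.3     # prior probability the skill is already known
    p_transit: float = 0.1  # probability of learning after each attempt
    p_slip: float = 0.1     # probability a knowing student answers wrongly
    p_guess: float = 0.2    # probability a non-knowing student answers correctly


def update(p_know: float, correct: bool, prm: BKTParams) -> float:
    """One BKT step: Bayesian posterior on the observed grade, then learning transition."""
    if correct:
        num = p_know * (1 - prm.p_slip)
        den = num + (1 - p_know) * prm.p_guess
    else:
        num = p_know * prm.p_slip
        den = num + (1 - p_know) * (1 - prm.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * prm.p_transit


def mastery(grades: Iterable[bool], prm: BKTParams = BKTParams()) -> float:
    """Estimated probability of mastery after a sequence of graded answers."""
    p = prm.p_init
    for g in grades:
        p = update(p, g, prm)
    return p


def is_mastered(grades: Iterable[bool], threshold: float = 0.95,
                prm: BKTParams = BKTParams()) -> bool:
    """Hypothetical mastery rule: estimated mastery at or above the threshold."""
    return mastery(grades, prm) >= threshold


if __name__ == "__main__":
    accurate = [True, True, True, True]    # every answer graded correctly
    misgraded = [True, True, False, True]  # one correct answer misgraded as wrong
    print(is_mastered(accurate))   # True  (estimate ~0.997)
    print(is_mastered(misgraded))  # False (estimate ~0.896)
```

Under these assumed parameters, one grading error on the third answer lowers the estimate from about 0.997 to about 0.896. That drop crosses the assumed 0.95 threshold, which is the kind of mastery misclassification that experiment 2 measures.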
Descriptors: Learning Analytics, Learning Management Systems, Mathematics Instruction, Middle School Students, High School Students, Computational Linguistics, Grading, Test Items, Mathematics Tests, Artificial Intelligence, Computer Software, Bayesian Statistics, Classification, Foreign Countries, Cues
Society for Learning Analytics Research. 121 Pointe Marsan, Beaumont, AB T4X 0A2, Canada. Tel: +61-429-920-838; e-mail: info@solaresearch.org; Web site: https://learning-analytics.info/index.php/JLA/index
Publication Type: Journal Articles; Reports - Research
Education Level: Junior High Schools; Middle Schools; Secondary Education; High Schools
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Nigeria; South Africa; Ghana; Africa
Grant or Contract Numbers: N/A
Author Affiliations: N/A