ERIC Number: EJ1492556
Record Type: Journal
Publication Date: 2025
Pages: 16
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: EISSN-2469-9896
Available Date: N/A
Can Large Language Models Correctly Interpret Equations with Errors?
Physical Review Physics Education Research, v21 n2 Article 020155 2025
This paper explores the potential of large language models to accurately extract and translate equations from typed student responses into a standard format. This is a useful task, as standardized equations can be graded reliably using a computer algebra system or a satisfiability modulo theories solver. Physics instructors interested in automated grading would therefore not need to rely on the mathematical reasoning capabilities of language models. We used two novel frameworks to improve the translations. The first is consensus, where a pair of models verify the correctness of the translations. The second is a neurosymbolic LLM-modulo approach, where models receive feedback from an automated reasoning tool. We performed experiments using responses to the Australian Physics Olympiad exam. We found that no open-source model was able to translate the student responses at the desired level of accuracy. Future work could involve breaking the task into smaller components before parsing to improve performance, or generalizing the experiments to translate handwritten responses.
Descriptors: Artificial Intelligence, Computer Uses in Education, Physics, Grading, Automation, Science Tests, Foreign Countries, Equations (Mathematics)
American Physical Society. One Physics Ellipse 4th Floor, College Park, MD 20740-3844. Tel: 301-209-3200; Fax: 301-209-0865; e-mail: assocpub@aps.org; Web site: https://journals.aps.org/prper/
Publication Type: Journal Articles; Reports - Research; Tests/Questionnaires
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Australia
Grant or Contract Numbers: N/A
Author Affiliations: N/A

Peer reviewed
