ERIC Number: EJ1464701
Record Type: Journal
Publication Date: 2025-Apr
Pages: 22
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-0926-7220
EISSN: EISSN-1573-1901
Available Date: 2024-01-29
Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?
Xiaoming Zhai1,2; Matthew Nyaaba1,3; Wenchao Ma4
Science & Education, v34 n2 p649-670 2025
This study examined the assumption that generative artificial intelligence (GAI) tools can overcome the cognitive intensity that humans experience when solving problems. We evaluated the performance of ChatGPT and GPT-4 on NAEP science assessments and compared it with student performance across levels of item cognitive demand. Fifty-four tasks from the 2019 NAEP science assessment were coded by content experts using a two-dimensional cognitive load framework comprising task cognitive complexity and dimensionality. ChatGPT and GPT-4 answered the questions independently and were scored with the scoring keys provided by NAEP. The analysis drew on the average ability scores of students who answered each item correctly and the percentage of students who responded to each item. The results showed that both ChatGPT and GPT-4 consistently outperformed most students on each item in the NAEP science assessments. As the cognitive demand of the items increased, statistically significantly higher average student ability scores were required to answer them correctly; this pattern held for students in Grades 4, 8, and 12. In contrast, ChatGPT and GPT-4 were not statistically sensitive to increases in task cognitive demand, except at Grade 4. As the first study to compare cutting-edge GAI with K-12 students on problem solving in science, this finding implies that educational objectives should be revised to prepare students to work competently with GAI tools such as ChatGPT and GPT-4. Education ought to emphasize the cultivation of advanced cognitive skills rather than relying solely on tasks that demand cognitive intensity, thereby fostering students' critical thinking, analytical skills, and application of knowledge in novel contexts. Furthermore, the findings suggest that researchers should innovate assessment practices by shifting from cognitively intensive tasks toward tasks that require creativity and analytical skills, in order to better mitigate the negative effects of GAI on testing.
Descriptors: Artificial Intelligence, National Competency Tests, Elementary Secondary Education, Problem Solving, Science Achievement, Cognitive Processes, Difficulty Level, Comparative Analysis, Influence of Technology
Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Publication Type: Journal Articles; Reports - Research
Education Level: Elementary Secondary Education
Audience: N/A
Language: English
Sponsor: National Science Foundation (NSF)
Authoring Institution: N/A
Identifiers - Assessments and Surveys: National Assessment of Educational Progress
Grant or Contract Numbers: 2101104; 2138854
Author Affiliations: 1University of Georgia, AI4STEM Education Center, Athens, USA; 2University of Georgia, Department of Mathematics, Science, and Social Studies Education, Athens, USA; 3University of Georgia, Department of Educational Theory and Practice, Athens, USA; 4University of Alabama, College of Education, Tuscaloosa, USA