ERIC - Search Results

Publication Date

In 2026	0
Since 2025	2
Since 2022 (last 5 years)	18
Since 2017 (last 10 years)	31
Since 2007 (last 20 years)	53

Descriptor

Comparative Analysis	60
Computer Software	60
Reliability	39
Foreign Countries	26
Evaluation Methods	19
Interrater Reliability	18
Second Language Learning	16
Evaluators	15
Validity	15
Correlation	14
English (Second Language)	13
Scores	13
Artificial Intelligence	12
Computational Linguistics	12
Scoring	12
Essays	11
Computer Assisted Testing	10
Second Language Instruction	10
Statistical Analysis	10
Teaching Methods	10
Writing Evaluation	10
Accuracy	9
Instructional Effectiveness	9
Educational Technology	8
Models	8

Publication Type

Journal Articles	49
Reports - Research	44
Reports - Evaluative	6
Reports - Descriptive	5
Speeches/Meeting Papers	5
Tests/Questionnaires	4
Dissertations/Theses -…	3
Book/Product Reviews	2
Collected Works - Proceedings	1
Information Analyses	1

Education Level

Higher Education	20
Postsecondary Education	19
Secondary Education	12
Elementary Education	6
Elementary Secondary Education	3
High Schools	3
Middle Schools	3
Grade 4	2
Intermediate Grades	2
Junior High Schools	2
Grade 5	1
Grade 7	1
Preschool Education	1

Location

China	3
Germany	3
Singapore	3
Egypt	2
Indonesia	2
Iran	2
Pakistan	2
Philippines	2
South Korea	2
United Kingdom	2
Arizona	1
Asia	1
Australia	1
Brazil	1
Connecticut	1
Denmark	1
Estonia	1
Europe	1
Florida	1
Greece	1
Hawaii	1
Hong Kong	1
India	1
Ireland	1
Israel	1

Assessments and Surveys

Dale Chall Readability Formula	1
Expressive One Word Picture…	1
Flesch Kincaid Grade Level…	1
Flesch Reading Ease Formula	1
Fry Readability Formula	1
Graduate Record Examinations	1
International English…	1
Mean Length of Utterance	1
Peabody Picture Vocabulary…	1

Showing 1 to 15 of 60 results

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans

Peer reviewed

Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024

RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…

Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics

Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4's Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages

Peer reviewed

Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024

Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…

Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software

Frequentist and Bayesian Factorial Invariance Using R

Peer reviewed
PDF on ERIC

Teck Kiang Tan – Practical Assessment, Research & Evaluation, 2024

Procedures for testing factorial invariance to validate a construct are well developed, ensuring that the construct can be used reliably for comparison and analysis across groups, yet they are mainly restricted to the frequentist approach. This motivates an update to incorporate the growing Bayesian approach for carrying out the Bayesian…

Descriptors: Bayesian Statistics, Factor Analysis, Programming Languages, Reliability

Coherence-Based Automatic Short Answer Scoring Using Sentence Embedding

Peer reviewed

Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024

Automatic essay scoring (AES) is an essential educational application in natural language processing. This automated process will alleviate the burden by increasing the reliability and consistency of the assessment. With the advances in text embedding libraries and neural network models, AES systems achieved good results in terms of accuracy.…

Descriptors: Scoring, Essays, Writing Evaluation, Memory

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed
PDF on ERIC

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed
PDF on ERIC

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Evaluating Large Language Models in Analysing Classroom Dialogue

Peer reviewed

Yun Long; Haifeng Luo; Yu Zhang – npj Science of Learning, 2024

This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue--a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using…

Descriptors: Classroom Communication, Computational Linguistics, Chinese, Mathematics Instruction

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

How Do Judges in Comparative Judgement Exercises Make Their Judgements?

Leech, Tony; Chambers, Lucy – Research Matters, 2022

Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…

Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed
PDF on ERIC

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Crowdsourced Adaptive Comparative Judgment: A Community-Based Solution for Proficiency Rating

Peer reviewed

Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022

The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…

Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability

Curating Cyberbullying Datasets: A Human-AI Collaborative Approach

Peer reviewed

Christopher E. Gomez; Marcelo O. Sztainberg; Rachel E. Trana – International Journal of Bullying Prevention, 2022

Cyberbullying is the use of digital communication tools and spaces to inflict physical, mental, or emotional distress. This serious form of aggression is frequently targeted at, but not limited to, vulnerable populations. A common problem when creating machine learning models to identify cyberbullying is the availability of accurately annotated,…

Descriptors: Video Technology, Computer Software, Computer Mediated Communication, Bullying

Impacts of ChatGPT-Assisted Writing for EFL English Majors: Feasibility and Challenges

Peer reviewed

Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024

This study aimed to determine the impacts of using ChatGPT to help English as a foreign language (EFL) college English majors revise essays, and whether doing so could lead to higher scores and potentially cause unfairness. A prospective, double-blinded, paired-comparison study was conducted in February 2023. A total of 44 students provided 44 original essays…

Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)


Source

ProQuest LLC	3
Research Synthesis Methods	3
ETS Research Report Series	2
International Educational…	2
Language Testing	2
Malaysian Online Journal of…	2
Online Submission	2
Research-publishing.net	2
ALT-J: Research in Learning…	1
Advances in Language and…	1
Advances in Physiology…	1
American Educational Research…	1
Assessment in Education:…	1
Australasian Journal of…	1
Behavior Modification	1
British Journal of…	1
Database	1
Education and Information…	1
Educational and Psychological…	1
English Language Teaching	1
English Teaching	1
European Journal of Education	1
Grantee Submission	1
International Association for…	1
International Education…	1

Author

Lenhard, Wolfgang	2
Mott, Michael S.	2
Seifried, Eva	2
Spinath, Birgit	2
Ahmed, Tamim	1
Akbari, Alireza	1
Alsree, Zubaida	1
Alt, Mary	1
Amanda Huee-Ping Wong	1
Armijo-Olivo, Susan	1
Attali, Yigal	1
Aziz, Anealka	1
Azza Warraitch	1
Bae, Jiyoung	1
Baier, Herbert	1
Balogun, Sherifat Adepeju	1
Baratchian, Taher	1
Barribal, Jemie	1
Beaubien, Denise M.	1
Berry, Kenneth J.	1
Botarleanu, Robert-Mihai	1
Breyer, F. Jay	1
Brossart, Daniel F.	1
Burk, John	1
Bush, Sarah B.	1