ERIC - Search Results

Showing 1 to 15 of 65 results

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Perceptual and Acoustic Assessment of Strain Using Synthetically Modified Voice Samples

Peer reviewed

Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020

Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…

Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

The Whole Is More than the Sum of Its Parts -- Assessing Writing Using the Consensual Assessment Technique

Peer reviewed

Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021

Evaluating the impact of Academic Literacies teaching (Lea and Street [1998. "Student Writing in Higher Education: An Academic Literacies Approach." "Studies in Higher Education" 23 (2): 157-72. doi:10.1080/03075079812331380364]) is difficult, as it involves gauging whether writers: (1) gain better understanding of what…

Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

The Concurrent Validity of Comparative Judgement Outcomes Compared with Marks

Gill, Tim – Research Matters, 2022

In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…

Descriptors: Comparative Analysis, Decision Making, Scripts, Standards

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Validation of an Automated Procedure for Calculating Core Lexicon from Transcripts

Peer reviewed

Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022

Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…

Descriptors: Validity, Discourse Analysis, Databases, Scoring

Inter-Rater Agreement for the Milestones and Barriers Assessments of the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP)

Peer reviewed

Montallana, Khrystle L.; Gard, Brendan M.; Lotfizadeh, Amin D.; Poling, Alan – Journal of Autism and Developmental Disorders, 2019

We determined inter-rater agreement for the VB-MAPP, an instrument sometimes used in planning educational goals and evaluating intervention effects for young people with autism. A pair of raters independently rated each of 32 children diagnosed with autism. Intraclass correlation coefficients for the total Milestones and Barrier scores were 0.876…

Descriptors: Barriers, Interrater Reliability, Autism, Educational Objectives

Reliability and Construct Validity of the TBI-QOL Communication Short Form as a Parent-Proxy Report Instrument for Children with Traumatic Brain Injury

Peer reviewed

Cohen, Matthew L.; Tulsky, David S.; Boulton, Aaron J.; Kisala, Pamela A.; Bertisch, Hilary; Yeates, Keith Owen; Zonfrillo, Mark R.; Durbin, Dennis R.; Jaffe, Kenneth M.; Temkin, Nancy; Wang, Jin; Rivara, Frederick P. – Journal of Speech, Language, and Hearing Research, 2019

Purpose: The purpose of this study was to evaluate the internal consistency and construct validity of the Traumatic Brain Injury Quality of Life Communication Item Bank (TBI-QOL COM) short form as a parent-proxy report measure. The TBI-QOL COM is a patient-reported outcome measure of functional communication originally developed as a self-report…

Descriptors: Brain, Head Injuries, Quality of Life, Pediatrics

Linking the International English Language Competency Assessment Suite of Examinations to the Common European Framework of Reference

Peer reviewed

Hidri, Sahbi – Language Testing in Asia, 2021

The study investigated the alignment of the four levels (B1, B2, C1 and C2) of the International English Language Competency Assessment (IELCA) suite of examinations with the Common European Framework of Reference (CEFR) by explaining and discussing the five linking stages (Council of Europe (CoE) 2009). Unlike previous studies, this study used the…

Descriptors: Literacy, Second Language Learning, Second Language Instruction, English (Second Language)
