NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 6 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Erik Voss – Language Testing, 2025
An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring systems (AES) to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…
Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)
Peer reviewed Peer reviewed
Direct linkDirect link
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…
Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory
Peer reviewed Peer reviewed
Direct linkDirect link
Yasuyo Sawaki; Yutaka Ishii; Hiroaki Yamada; Takenobu Tokunaga – Language Testing, 2025
This study examined the consistency between instructor ratings of learner-generated summaries and those estimated by a large language model (LLM) on summary content checklist items designed for undergraduate second language (L2) writing instruction in Japan. The effects of the LLM prompt design on the consistency between the two were also explored…
Descriptors: Interrater Reliability, Writing Teachers, College Faculty, Artificial Intelligence
Peer reviewed Peer reviewed
Direct linkDirect link
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…
Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience
Peer reviewed Peer reviewed
Direct linkDirect link
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Peer reviewed Peer reviewed
Direct linkDirect link
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification