Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 5 |
| Since 2022 (last 5 years) | 24 |
| Since 2017 (last 10 years) | 70 |
| Since 2007 (last 20 years) | 118 |
Descriptor
| Descriptor | Count |
| --- | --- |
| English (Second Language) | 139 |
| Interrater Reliability | 139 |
| Second Language Learning | 139 |
| Foreign Countries | 87 |
| Second Language Instruction | 63 |
| Language Tests | 60 |
| Evaluators | 42 |
| Scores | 41 |
| Language Proficiency | 40 |
| Writing Evaluation | 36 |
| Oral Language | 35 |
Author
| Author | Count |
| --- | --- |
| Coniam, David | 3 |
| Ahmadi, Alireza | 2 |
| Aydin, Selami | 2 |
| Davis, Larry | 2 |
| Gersten, Russell | 2 |
| McNamara, T. F. | 2 |
| Adams, R. J. | 1 |
| Afzali, Katayoon | 1 |
| Ahmadi Shirazi, Masoumeh | 1 |
| Ahmed Alkhateeb | 1 |
| Ahour, Touran | 1 |
Audience
| Audience | Count |
| --- | --- |
| Practitioners | 2 |
| Researchers | 1 |
| Teachers | 1 |
Location
| Location | Count |
| --- | --- |
| Iran | 12 |
| China | 11 |
| Japan | 10 |
| Turkey | 10 |
| Hong Kong | 5 |
| Saudi Arabia | 4 |
| South Korea | 4 |
| Canada | 3 |
| Europe | 3 |
| Germany | 3 |
| India | 3 |
What Works Clearinghouse Rating
| Rating | Count |
| --- | --- |
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 1 |
Erik Voss – Language Testing, 2025
An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring (AES) systems to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…
Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)
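The contrast the abstract draws is between scoring from handcrafted features and end-to-end neural scoring. As a point of reference only, the sketch below shows what a handcrafted-feature pipeline can look like; the features, toy data, and linear model are illustrative assumptions, not the system described in the article.

```python
# Illustrative toy version of a handcrafted-feature AES pipeline (not from the study).
import re
import numpy as np
from sklearn.linear_model import LinearRegression

def handcrafted_features(essay):
    """Return a few classic surface features for one essay."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_words = max(len(words), 1)
    return [
        len(words),                                   # essay length
        len({w.lower() for w in words}) / n_words,    # type-token ratio
        sum(len(w) for w in words) / n_words,         # mean word length
        len(words) / max(len(sentences), 1),          # mean sentence length
    ]

# Toy training data: essays paired with hypothetical human scores.
essays = [
    "The cat sat on the mat. It was happy.",
    "Education matters because it helps people build better and fairer societies.",
]
human_scores = [2.0, 4.0]

X = np.array([handcrafted_features(e) for e in essays])
model = LinearRegression().fit(X, human_scores)
print(model.predict(X))  # machine scores predicted from the handcrafted features
```

A deep-learning AES system replaces the hand-written feature function with representations learned directly from the essay text, which is the source of the interpretability concern the abstract raises.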
Somayeh Fathali; Fatemeh Mohajeri – Technology in Language Teaching & Learning, 2025
The International English Language Testing System (IELTS) is a high-stakes exam where Writing Task 2 significantly influences the overall scores, requiring reliable evaluation. While trained human raters perform this task, concerns about subjectivity and inconsistency have led to growing interest in artificial intelligence (AI)-based assessment…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Artificial Intelligence
Junfei Li; Jinyan Huang; Thomas Sheeran – SAGE Open, 2025
This study investigated the role of ChatGPT4o as an AI peer assessor in English-as-a-foreign-language (EFL) speaking classrooms, with a focus on its scoring reliability and the effectiveness of its feedback. The research involved 40 first-year English major students from two parallel classes at a Chinese university. Twenty from one class served as…
Descriptors: Artificial Intelligence, Technology Uses in Education, Peer Evaluation, English (Second Language)
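For readers curious what an AI "peer assessor" setup involves in practice, here is a minimal, hypothetical sketch of rubric-based scoring through the OpenAI Python client. The rubric wording, prompt, and model parameter are assumptions for illustration, not the instruments used in the study.

```python
# Hypothetical sketch only; requires the openai package and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "You are rating an EFL speaking transcript. Score fluency, grammar, "
    "vocabulary, and coherence from 1 to 5, one 'criterion: score' line each, "
    "then give two sentences of feedback."
)

def ai_peer_assess(transcript: str) -> str:
    """Ask the model to score one transcript against the assumed rubric."""
    response = client.chat.completions.create(
        model="gpt-4o",   # assumed model choice
        temperature=0,    # keep output stable when checking scoring reliability
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

print(ai_peer_assess("Yesterday I go to library for study my exam, it was very crowd."))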
On-Soon Lee – Journal of Pan-Pacific Association of Applied Linguistics, 2024
Despite increasing interest in using AI tools as assistant agents in instructional settings, the effectiveness of ChatGPT, a generative pre-trained AI model, for evaluating the accuracy of second language (L2) writing remains largely unexplored in formative assessment. Therefore, the current study aims to examine how ChatGPT, as an evaluator,…
Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning
Seedhouse, Paul; Satar, Müge – Classroom Discourse, 2023
The same L2 speaking performance may be analysed and evaluated in very different ways by different teachers or raters. We present a new, technology-assisted research design which opens up to investigation the trajectories of convergence and divergence between raters. We tracked and recorded what different raters noticed when, whilst grading a…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Oral Language
Sasithorn Limgomolvilas; Patsawut Sukserm – LEARN Journal: Language Education and Acquisition Research Network, 2025
The assessment of English speaking in EFL environments can be inherently subjective and influenced by various factors beyond linguistic ability, including the choice of assessment criteria and even the rubric type. In classroom assessment, the rubric type recommended for English speaking tasks is the analytical rubric. Driven by three aims, this…
Descriptors: Oral Language, Speech Communication, English (Second Language), Second Language Learning
Tanaka, Mitsuko; Ross, Steven J. – Assessment in Education: Principles, Policy & Practice, 2023
Raters vary from each other in their severity and leniency in rating performance. This study examined the factors affecting rater severity in peer assessments of oral presentations in English as a Foreign Language (EFL), focusing on peer raters' self-construal and presentation abilities. Japanese university students enrolled in EFL classes…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Peer Evaluation
Yesilçinar, Sabahattin; Sata, Mehmet – International Journal of Psychology and Educational Studies, 2021
The current study employed many-facet Rasch measurement (MFRM) to explain the rater bias patterns of EFL student teachers (hereafter students) when they rate the teaching performance of their peers in three assessment environments: online, face-to-face, and anonymous. Twenty-four students and two instructors rated 72 micro-teachings performed by…
Descriptors: Peer Evaluation, Preservice Teachers, English (Second Language), Second Language Learning
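For context, many-facet Rasch measurement (MFRM) extends the Rasch model with additional facets such as rater severity. A common way to write the model, following Linacre's formulation (the notation here is generic and not taken from this particular study), is:

```latex
\log\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k
% P_{nijk}: probability that examinee n receives category k (rather than k-1)
%           on task i from rater j
% B_n: ability of examinee n        D_i: difficulty of task i
% C_j: severity of rater j          F_k: step difficulty of category k
```

Rater bias patterns such as those examined here are then read off the estimated rater severity terms and the bias/interaction analyses built on them.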
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessment have evaluated the relationship between human and machine scores to establish the reliability of automated essay scoring systems. This study investigated the magnitudes of indices of inter-rater agreement and discrepancy, especially between human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
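As a reminder of what such agreement and discrepancy indices look like in practice, here is a minimal sketch with made-up human and machine scores; the index choices (exact and adjacent agreement, quadratic-weighted kappa, Pearson r) are typical of this literature but are not necessarily the full set analyzed in the article.

```python
# Minimal sketch of common human-machine agreement indices (scores are made up).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human   = np.array([3, 4, 2, 5, 3, 4, 1, 3])
machine = np.array([3, 4, 3, 4, 3, 4, 2, 3])

exact    = np.mean(human == machine)              # identical scores
adjacent = np.mean(np.abs(human - machine) <= 1)  # within one score point
qwk      = cohen_kappa_score(human, machine, weights="quadratic")
r, _     = pearsonr(human, machine)

print(f"exact={exact:.2f}  adjacent={adjacent:.2f}  QWK={qwk:.2f}  r={r:.2f}")
```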
Shabani, Enayat A.; Panahi, Jaleh – Language Testing in Asia, 2020
The literature on using scoring rubrics in writing assessment notes the significance of rubrics as a practical and useful means of assessing the quality of writing tasks. This study investigates the agreement among the rubrics endorsed and used for assessing essay writing tasks by internationally recognized tests of English language…
Descriptors: Writing Evaluation, Scoring Rubrics, Scores, Interrater Reliability
Han, Chao; Zhao, Xiao – Assessment & Evaluation in Higher Education, 2021
The accuracy of peer ratings of students' performance has attracted much attention from higher education researchers. In this study, we explored the accuracy of peer ratings of the quality of spoken-language interpreting in the context of tertiary-level interpreter training. We sought to understand how different types of peer raters…
Descriptors: Accuracy, Peer Evaluation, Oral Language, Interpretive Skills
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes using depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative medium to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter-/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Hassan Saleh Mahdi; Ahmed Alkhateeb – International Journal of Computer-Assisted Language Learning and Teaching, 2025
This study aims to develop a robust rubric for evaluating artificial intelligence (AI)--assisted essay writing in English as a Foreign Language (EFL) contexts. Employing a modified Delphi technique, we conducted a comprehensive literature review and administered Likert scale questionnaires. This process yielded nine key evaluation criteria,…
Descriptors: Scoring Rubrics, Essays, Writing Evaluation, Artificial Intelligence
Polat, Murat – International Online Journal of Education and Teaching, 2020
The assessment of speaking skills in foreign language testing has always had some pros (testing learners' speaking skills doubles the validity of any language test) and cons (many test-relevant/irrelevant variables interfere) since it is a multi-dimensional process. Meanwhile, exploring grader behaviours while scoring learners' speaking…
Descriptors: Item Response Theory, Interrater Reliability, Speech Skills, Second Language Learning
