Showing 1 to 15 of 42 results
Peer reviewed
On-Soon Lee – Journal of Pan-Pacific Association of Applied Linguistics, 2024
Despite increasing interest in using AI tools as assistant agents in instructional settings, the effectiveness of ChatGPT, a generative pre-trained AI, for evaluating the accuracy of second language (L2) writing has been largely unexplored in formative assessment. Therefore, the current study aims to examine how ChatGPT, as an evaluator,…
Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning
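The study above does not publish its prompts, but the general pattern of using ChatGPT as an L2 writing evaluator is easy to sketch. A minimal illustration, assuming the official openai Python client; the rubric, essay, and model name are invented placeholders, not the study's materials:

    # Hypothetical ChatGPT-as-evaluator call; requires the openai package
    # and an OPENAI_API_KEY in the environment. All content is illustrative.
    from openai import OpenAI

    client = OpenAI()

    rubric = ("Score the essay from 0 to 5 for grammatical accuracy. "
              "Reply with the score and one sentence of formative feedback.")
    essay = "Yesterday I go to library for study my final exam."

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works here
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": essay},
        ],
    )
    print(resp.choices[0].message.content)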
Peer reviewed
Matthews, Joshua – RELC Journal: A Journal of Language Teaching and Research, 2023
This article explores how the analysis of inter-rater discourse can be used to support collective reflective practice in second language (L2) assessment. To demonstrate, a focused case of the discourse between two experienced language teachers as they negotiate assessment decisions on L2 written texts is presented. Of particular interest was the…
Descriptors: Interrater Reliability, Discourse Analysis, Student Evaluation, Second Language Learning
Peer reviewed
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Jiyeo Yun – English Teaching, 2023
Studies of automatic scoring systems in writing assessment have evaluated the relationship between human and machine scores to establish the reliability of automated essay scoring systems. This study investigated the magnitudes of inter-rater agreement and discrepancy indices, particularly between human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
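For readers unfamiliar with the agreement indices such meta-analyses compare, quadratic weighted kappa (QWK) is the index most commonly reported for human-machine score agreement in automated essay scoring. A minimal sketch using scikit-learn's cohen_kappa_score; the score vectors are invented for illustration:

    # Human vs. machine scores on a 0-5 essay scale (invented data)
    from sklearn.metrics import cohen_kappa_score

    human   = [3, 4, 2, 5, 3, 4, 1, 3, 4, 2]
    machine = [3, 4, 3, 4, 3, 4, 2, 3, 5, 2]

    print(cohen_kappa_score(human, machine))                       # exact agreement
    print(cohen_kappa_score(human, machine, weights="quadratic"))  # QWK credits near-misses

Unweighted kappa counts only exact matches, while the quadratic weighting penalizes disagreements by the square of their distance, which is why QWK is the conventional choice when scores sit on an ordinal scale.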
Peer reviewed
Shabani, Enayat A.; Panahi, Jaleh – Language Testing in Asia, 2020
The literature on using scoring rubrics in writing assessment denotes the significance of rubrics as practical and useful means of assessing the quality of writing tasks. This study investigates the agreement among rubrics endorsed and used for assessing essay writing tasks by internationally recognized tests of English language…
Descriptors: Writing Evaluation, Scoring Rubrics, Scores, Interrater Reliability
Peer reviewed
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes using depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter-/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Peer reviewed
Hassan Saleh Mahdi; Ahmed Alkhateeb – International Journal of Computer-Assisted Language Learning and Teaching, 2025
This study aims to develop a robust rubric for evaluating artificial intelligence (AI)-assisted essay writing in English as a Foreign Language (EFL) contexts. Employing a modified Delphi technique, we conducted a comprehensive literature review and administered Likert scale questionnaires. This process yielded nine key evaluation criteria,…
Descriptors: Scoring Rubrics, Essays, Writing Evaluation, Artificial Intelligence
Peer reviewed
Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022
Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…
Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods
Peer reviewed
Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024
This study set out to determine the impact of using ChatGPT to assist English as a foreign language (EFL) college English majors in revising essays, and whether doing so leads to higher scores and potentially causes unfairness. A prospective, double-blinded, paired-comparison study was conducted in February 2023. A total of 44 students provided 44 original essays…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)
Peer reviewed
Sahin, Alper – Shanlax International Journal of Education, 2021
Many student performances are assessed in Intensive English Programs (IEPs) worldwide each academic year. These performances are mostly graded by human raters, with a certain degree of error. The accuracy of these performance assessments is nevertheless of utmost importance because they feed data into high-stakes decisions…
Descriptors: Intensive Language Courses, Second Language Instruction, Second Language Learning, English (Second Language)
Peer reviewed
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Peer reviewed
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
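Comparative Judgement studies like the one above typically convert pairwise "which script is better?" decisions into a measurement scale with the Bradley-Terry model. A minimal sketch of the standard MM fitting algorithm (Hunter, 2004); the win counts are invented for illustration, and the study's own tooling is not specified here:

    def bradley_terry(wins, n_items, iters=200):
        """Fit Bradley-Terry strengths from pairwise comparative judgements.
        wins[(i, j)] is how often script i was judged better than script j."""
        p = [1.0] * n_items
        for _ in range(iters):
            new_p = []
            for i in range(n_items):
                w_i = sum(wins.get((i, j), 0) for j in range(n_items))  # total wins of i
                denom = sum(
                    (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                    for j in range(n_items) if j != i
                )
                new_p.append(w_i / denom if denom > 0 else p[i])
            s = sum(new_p)
            p = [x * n_items / s for x in new_p]  # fix the scale each iteration
        return p

    # Four scripts, each pair judged a few times (invented data)
    wins = {(0, 1): 3, (1, 0): 1, (0, 2): 4, (1, 2): 2, (2, 1): 2,
            (0, 3): 2, (3, 0): 2, (1, 3): 3, (3, 1): 1, (2, 3): 1, (3, 2): 3}
    print(bradley_terry(wins, 4))  # higher strength = judged better overall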
Peer reviewed
Önen, Emine; Yayvak, Melike Kübra Tasdelen – Journal of Education and Training Studies, 2019
This study aimed to examine the interrater reliability of the scoring of paragraph-writing skills in foreign languages using measurement invariance tests. The study group consists of 267 students studying English at the Preparatory School of Gazi University. In the study, in which students wrote a paragraph on the same topic, the…
Descriptors: Second Language Learning, Second Language Instruction, Factor Analysis, English (Second Language)
Peer reviewed
Wang, Qiao – Education and Information Technologies, 2022
This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring
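The study above evaluates open-source semantic similarity tools without naming an implementation in this abstract. As an illustration of the underlying idea, a learner response can be scored against a gold-standard answer by cosine similarity of sentence embeddings. A sketch assuming the sentence-transformers library; the encoder choice and both texts are placeholders, not the study's materials:

    # Content scoring by embedding cosine similarity (illustrative only)
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any sentence encoder

    gold = "The water cycle moves water between the oceans, atmosphere, and land."
    essay = "Water travels from seas into the air and back to the ground in a cycle."

    emb = model.encode([gold, essay])
    print(util.cos_sim(emb[0], emb[1]).item())  # closer to 1 = more similar content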
Peer reviewed
He, Tung-hsien – SAGE Open, 2019
This study employed a mixed-design approach and the Many-Facet Rasch Measurement (MFRM) framework to investigate whether rater bias occurred between the onscreen scoring (OSS) mode and the paper-based scoring (PBS) mode. Nine human raters analytically marked scanned scripts and paper scripts using a six-category (i.e., six-criterion) rating…
Descriptors: Computer Assisted Testing, Scoring, Item Response Theory, Essays
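The Many-Facet Rasch Measurement framework used above extends the Rasch rating-scale model with a rater facet: the log-odds of adjacent score categories are modeled as examinee ability minus task difficulty, rater severity, and a category threshold. A minimal sketch of the category-probability computation; all parameter values are invented for illustration:

    import math

    def mfrm_probs(theta, delta, alpha, taus):
        """Category probabilities under a many-facet rating-scale Rasch model.
        theta: examinee ability; delta: task difficulty; alpha: rater severity;
        taus: thresholds between adjacent categories 1..K (all in logits)."""
        psi = [0.0]                                   # log-numerator for category 0
        for tau in taus:                              # categories 1..K
            psi.append(psi[-1] + (theta - delta - alpha - tau))
        z = sum(math.exp(p) for p in psi)
        return [math.exp(p) / z for p in psi]

    # An able writer (theta=1.0) scored by a slightly severe rater (alpha=0.5)
    print(mfrm_probs(theta=1.0, delta=0.0, alpha=0.5, taus=[-1.0, 0.0, 1.0]))

Raising alpha shifts probability mass toward lower categories, which is exactly the rater-severity effect such studies test for between scoring modes.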