Showing 1 to 15 of 18 results
Peer reviewed
PDF on ERIC (full text available)
Somayeh Fathali; Fatemeh Mohajeri – Technology in Language Teaching & Learning, 2025
The International English Language Testing System (IELTS) is a high-stakes exam in which Writing Task 2 significantly influences the overall score, making reliable evaluation essential. While trained human raters perform this task, concerns about subjectivity and inconsistency have led to growing interest in artificial intelligence (AI)-based assessment…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Artificial Intelligence
Peer reviewed
Direct link
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Peer reviewed
Direct link
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Peer reviewed
Direct link
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
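The severity and consistency measures used by Lamprianou et al. are not spelled out in the abstract snippet. For orientation, a simple illustrative pair of indices (not necessarily the study's own) can be computed directly from a score matrix: severity as a rater's mean signed deviation from the per-essay consensus, and consistency as the spread of those deviations.

    import numpy as np

    # Illustrative only: the study's actual severity/consistency measures are
    # not given in the abstract. Here, severity = mean signed deviation of a
    # rater's scores from the per-essay consensus mean; consistency = the SD
    # of those deviations (larger = more erratic rating behaviour).
    def rater_indices(scores):
        """scores: 2-D array, rows = essays, cols = raters (NaN = not rated)."""
        consensus = np.nanmean(scores, axis=1, keepdims=True)  # per-essay mean
        dev = scores - consensus                               # signed deviations
        severity = np.nanmean(dev, axis=0)     # < 0 => harsher than peers
        consistency = np.nanstd(dev, axis=0)   # larger => less consistent
        return severity, consistency

    scores = np.array([[4.0, 3, 4],
                       [5, 4, 5],
                       [3, 2, 4],
                       [4, 3, 3]])
    sev, cons = rater_indices(scores)
    print(sev, cons)

In operational settings these roles are usually played by the rater facet estimate and the infit/outfit statistics of a many-facet Rasch analysis rather than by raw deviations.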
Peer reviewed
PDF on ERIC (full text available)
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
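Comparative Judgement scales the pairwise "which script is better?" decisions with a Bradley-Terry-type model, in which each script i receives a quality parameter theta_i and P(i beats j) = exp(theta_i) / (exp(theta_i) + exp(theta_j)). The sketch below fits these parameters by simple gradient ascent; it illustrates the model family CJ typically relies on, not the specific tool used in Sumner's study.

    import numpy as np

    # Minimal Bradley-Terry fit for Comparative Judgement. Each judgement is
    # a (winner_index, loser_index) pair; theta[i] is script quality.
    # Illustrative sketch only -- CJ platforms use more robust estimation.
    def fit_bradley_terry(judgements, n_scripts, lr=0.1, epochs=500):
        theta = np.zeros(n_scripts)
        for _ in range(epochs):
            grad = np.zeros(n_scripts)
            for w, l in judgements:
                p_w = 1.0 / (1.0 + np.exp(theta[l] - theta[w]))  # P(w beats l)
                grad[w] += 1.0 - p_w
                grad[l] -= 1.0 - p_w
            theta += lr * grad
            theta -= theta.mean()  # anchor the scale (thetas sum to zero)
        return theta

    # Hypothetical judgements over scripts 0..3; script 0 wins most often.
    judgements = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1)]
    print(fit_bradley_terry(judgements, 4))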
Peer reviewed
PDF on ERIC (full text available)
Heidari, Jamshid; Khodabandeh, Farzaneh; Soleimani, Hassan – JALT CALL Journal, 2018
The emergence of computer technology in English language teaching has paved the way for teachers' application of Mobile-Assisted Language Learning (MALL) and its advantages in teaching. This study aimed to compare the effectiveness of face-to-face instruction with Telegram-based mobile instruction. Based on a TOEFL test, 60 English foreign language…
Descriptors: Comparative Analysis, Conventional Instruction, Teaching Methods, Computer Assisted Instruction
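Group comparisons of this kind (face-to-face vs. Telegram-based instruction) typically come down to comparing post-test score distributions. A minimal sketch of such a comparison with Welch's t-test is below; the scores are simulated placeholders, not the study's data, and the study's actual analysis may differ.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Hypothetical post-test scores for two groups of 30 -- purely
    # illustrative, not data from Heidari et al.
    face_to_face = rng.normal(70, 8, 30)
    telegram = rng.normal(74, 8, 30)

    # Welch's t-test: does not assume equal group variances.
    t, p = stats.ttest_ind(telegram, face_to_face, equal_var=False)
    print(f"t = {t:.2f}, p = {p:.3f}")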
Peer reviewed
PDF on ERIC (full text available)
Prieto, Gerardo; Nieto, Eloísa – Psicologica: International Journal of Methodology and Experimental Psychology, 2014
This paper describes how a Many-Facet Rasch Measurement (MFRM) approach can be applied to performance assessment, focusing on rater analysis. The article provides an introduction to MFRM, a description of MFRM analysis procedures, and an example to illustrate how to examine the effects of various sources of variability on test takers' performance…
Descriptors: Item Response Theory, Interrater Reliability, Rating Scales, Error of Measurement
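For orientation, the rating-scale form of the MFRM model that Prieto and Nieto introduce is usually written as

    \log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where P_nijk is the probability that test taker n receives category k from rater j on item i, B_n is the test taker's ability, D_i the item difficulty, C_j the rater's severity, and F_k the difficulty of the step from category k-1 to k. Rater analysis then inspects the estimated C_j values and their fit statistics.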
Peer reviewed
PDF on ERIC (full text available)
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
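Agreement between automated and human scores in studies like this one is commonly summarized with quadratic-weighted kappa. The sketch below shows the standard computation of that metric; it is an illustration, not the report's exact evaluation procedure, and the score vectors are hypothetical.

    import numpy as np

    # Quadratic-weighted kappa between human and machine scores on a 1..6
    # scale -- a common agreement metric in automated-essay-scoring work.
    def quadratic_weighted_kappa(a, b, min_s=1, max_s=6):
        n = max_s - min_s + 1
        obs = np.zeros((n, n))
        for x, y in zip(a, b):
            obs[x - min_s, y - min_s] += 1
        obs /= obs.sum()                                   # observed proportions
        exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # chance agreement
        w = np.array([[(i - j) ** 2 for j in range(n)] for i in range(n)])
        w = w / (n - 1) ** 2                               # quadratic weights
        return 1 - (w * obs).sum() / (w * exp).sum()

    human   = [3, 4, 4, 5, 2, 3, 4, 5, 6, 3]   # hypothetical scores
    machine = [3, 4, 5, 5, 2, 3, 3, 5, 5, 3]
    print(round(quadratic_weighted_kappa(human, machine), 3))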
Peer reviewed
Direct link
Gebril, Atta – Assessing Writing, 2010
Integrated tasks are currently employed in a number of L2 exams since they are perceived as an addition to the writing-only task type. Given this trend, the current study investigates composite score generalizability of both reading-to-write and writing-only tasks. For this purpose, a multivariate generalizability analysis is used to investigate…
Descriptors: Scoring, Scores, Second Language Instruction, Writing Evaluation
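As context for the generalizability analysis: in a fully crossed persons x tasks x raters (p x t x r) design, the generalizability coefficient for relative decisions takes the familiar form

    E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt}/n_t + \sigma^2_{pr}/n_r + \sigma^2_{ptr,e}/(n_t n_r)}

where sigma^2_p is the person (true-score) variance and the denominator adds the person-by-task, person-by-rater, and residual variances, each divided by the number of tasks or raters averaged over. Gebril's study extends this logic to a multivariate analysis of composite scores; the formula above is the simpler univariate case, given here only for orientation.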
Peer reviewed
Direct link
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers to rate written scripts has been criticised in some quarters for lacking transparency and for fitting poorly with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
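BETSY (the Bayesian Essay Test Scoring sYstem) treats essay scoring as Bayesian text classification: essays with known scores train a classifier that assigns score categories to new scripts. A minimal sketch of that idea with a multinomial naive Bayes model is below; the training texts and labels are hypothetical, and BETSY's actual feature set and implementation differ.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy training data -- hypothetical, purely to illustrate the approach.
    train_essays = ["clear thesis with well developed supporting arguments",
                    "some ideas but weak organisation and frequent errors",
                    "coherent argument good vocabulary minor slips",
                    "fragmentary sentences little relevant content"]
    train_scores = ["high", "low", "high", "low"]

    # Bag-of-words features feeding a naive Bayes score-band classifier.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_essays, train_scores)
    print(model.predict(["well organised essay with strong arguments"]))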
Peer reviewed
Direct link
Knoch, Ute; Elder, Catherine – System: An International Journal of Educational Technology and Applied Linguistics, 2010
A number of scholars have questioned the practice of assessing academic writing in the context of a one-off language test, claiming that the time restrictions imposed in the test environment, when compared to the writing conditions typical at university, may prevent learners from displaying the kinds of writing skills required in academic…
Descriptors: Writing Tests, Language Tests, Test Validity, Interrater Reliability
Peer reviewed
Direct link
Schaefer, Edward – Language Testing, 2008
The present study employed multi-faceted Rasch measurement (MFRM) to explore the rater bias patterns of native English-speaker (NES) raters when they rate EFL essays. Forty NES raters rated 40 essays written by female Japanese university students on a single topic adapted from the TOEFL Test of Written English (TWE). The essays were assessed using…
Descriptors: Writing Evaluation, Writing Tests, Program Effectiveness, Essays
Lim, Gad S. – ProQuest LLC, 2009
Performance assessments have become the norm for evaluating language learners' writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue…
Descriptors: Performance Based Assessment, Language Tests, Performance Tests, Test Validity
Peer reviewed
Direct link
Barkaoui, Khaled – Assessing Writing, 2007
Educators often have to choose among different types of rating scales to assess second-language (L2) writing performance. There is little research, however, on how different rating scales affect rater performance. This study employed a mixed-method approach to investigate the effects of two different rating scales on EFL essay scores, rating…
Descriptors: Writing Evaluation, Writing Tests, Rating Scales, Essays
Peer reviewed
Direct link
Elder, Catherine; Barkhuizen, Gary; Knoch, Ute; von Randow, Janet – Language Testing, 2007
The use of online rater self-training is growing in popularity and has obvious practical benefits, facilitating access to training materials and rating samples and allowing raters to reorient themselves to the rating scale and self-monitor their behaviour at their own convenience. However, there has thus far been little research into rater…
Descriptors: Writing Evaluation, Writing Tests, Scoring Rubrics, Rating Scales