Showing all 8 results
Peer reviewed
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing on the IELTS exam. The objective is to assess how closely ChatGPT's grading aligns with that of official human raters. The analysis takes a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
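The alignment check the abstract describes can be sketched in a few lines. The following is an illustration under assumed data, not the paper's code: the band scores are invented, and the paired t-test and Pearson correlation are plausible choices rather than the study's confirmed methods.

```python
# A minimal sketch (not the paper's actual analysis) of comparing
# score means and consistency between ChatGPT and human IELTS raters.
# All scores below are invented placeholder data.
import numpy as np
from scipy import stats

human = np.array([6.0, 6.5, 7.0, 5.5, 8.0, 6.5, 7.5, 6.0])  # hypothetical human band scores
model = np.array([6.5, 6.5, 7.0, 6.0, 7.5, 7.0, 7.5, 6.5])  # hypothetical ChatGPT band scores

# Mean comparison: does the model grade systematically higher or lower?
t, p = stats.ttest_rel(model, human)
print(f"mean(model) - mean(human) = {model.mean() - human.mean():+.2f} (t = {t:.2f}, p = {p:.3f})")

# Correlation as a simple consistency index: do the two sets of
# scores rank candidates the same way?
r, _ = stats.pearsonr(model, human)
print(f"Pearson r = {r:.2f}")
```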
Peer reviewed
Qu, Yanxuan; Huo, Yan; Chan, Eric; Shotts, Matthew – ETS Research Report Series, 2017
For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This practice helps to ensure that interpretations about test takers' abilities are comparable from one administration (or one form) to another. This study examines the consistency of reported scores…
Descriptors: Scores, English (Second Language), Language Tests, Second Language Learning
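As a rough illustration of monitoring score-mean consistency across administrations: the sketch below tracks mean reported scores over time and flags administrations that drift from the grand mean. The administration labels, scores, and flagging threshold are all invented, not ETS's actual procedure.

```python
# Illustrative only: flag administrations whose mean reported score
# drifts beyond a tolerance. Data and threshold are hypothetical.
import numpy as np

means_by_admin = {  # hypothetical mean reported scores per administration
    "2015-03": 78.2, "2015-06": 78.5, "2015-09": 77.9,
    "2016-03": 78.4, "2016-06": 79.6,
}
values = np.array(list(means_by_admin.values()))
grand_mean, sd = values.mean(), values.std(ddof=1)

for admin, m in means_by_admin.items():
    flag = "  <-- check form/sample" if abs(m - grand_mean) > sd else ""
    print(f"{admin}: mean = {m:.1f}{flag}")
```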
Peer reviewed
Kural, Faruk – Journal of Language and Linguistic Studies, 2018
The present paper, a study based on the midterm exam results of 53 university English prep-school students, examines the correlation between a direct writing test, measured holistically by multiple-trait scoring, and two indirect writing tests used in a competence exam, one of which is a multiple-choice cloze test and the other a rewrite test…
Descriptors: Writing Evaluation, Cloze Procedure, Comparative Analysis, Essays
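The correlational design can be sketched as follows. The scores are invented placeholders, and Pearson correlation is assumed as the index; the abstract does not specify which correlation coefficient the study used.

```python
# Illustrative only: correlating each indirect measure (cloze, rewrite)
# with the holistically scored direct writing test. Data are invented.
import numpy as np
from scipy import stats

direct  = np.array([72, 65, 80, 58, 90, 75, 68, 84])  # direct writing-test scores
cloze   = np.array([60, 55, 70, 50, 85, 66, 58, 78])  # multiple-choice cloze scores
rewrite = np.array([68, 60, 76, 55, 88, 70, 63, 80])  # rewrite-test scores

for name, indirect in [("cloze", cloze), ("rewrite", rewrite)]:
    r, p = stats.pearsonr(direct, indirect)
    print(f"direct vs {name}: r = {r:.2f}, p = {p:.3f}")
```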
Peer reviewed
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – Applied Linguistics, 2010
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multi-trait) rating dimensions and their relationships to holistic scores and e-rater® essay feature variables in the context of the TOEFL® computer-based test (TOEFL CBT) writing assessment. Data analyzed in the study were holistic…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
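One way to probe the "distinctness" question is to inspect how strongly the analytic dimensions correlate with each other and with the holistic score: near-collinear dimensions add little separate information. The dimension names and ratings below are hypothetical, not the TOEFL CBT data.

```python
# Sketch of an inter-correlation check among analytic rating
# dimensions and a holistic score. All ratings are invented.
import numpy as np

dims = ["development", "organization", "language_use"]
# rows = essays, columns = analytic dimensions (hypothetical ratings)
analytic = np.array([
    [4, 4, 3],
    [5, 5, 5],
    [3, 2, 3],
    [4, 3, 4],
    [2, 2, 2],
], dtype=float)
holistic = np.array([4, 5, 3, 4, 2], dtype=float)

# Variables as rows for np.corrcoef, hence the transpose.
corr = np.corrcoef(np.column_stack([analytic, holistic]).T)
labels = dims + ["holistic"]
for i, a in enumerate(labels):
    for j, b in enumerate(labels):
        if i < j:
            print(f"r({a}, {b}) = {corr[i, j]:.2f}")
```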
Lam, Ling Chi Tenny – ProQuest LLC, 2010
In writing assessment, quite a number of factors influence marking stability and the reliability of the assessment, such as markers' attitudes towards marking and their consistency, the physical environment, the design of the items, and the marking rubrics. Even the methods used to train markers have effects on the reliability of the…
Descriptors: Foreign Countries, Grading, Scoring Rubrics, Educational Assessment
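A minimal sketch of one common marker-consistency index the abstract alludes to: exact and adjacent agreement between two markers scoring the same scripts. The ratings are invented, and this index is an assumption, not the dissertation's stated method.

```python
# Illustrative only: exact and adjacent agreement between two markers.
import numpy as np

rater_a = np.array([4, 3, 5, 2, 4, 3, 5, 4])  # hypothetical ratings
rater_b = np.array([4, 4, 5, 2, 3, 3, 5, 4])

exact = np.mean(rater_a == rater_b)
adjacent = np.mean(np.abs(rater_a - rater_b) <= 1)
print(f"exact agreement:    {exact:.0%}")
print(f"adjacent agreement: {adjacent:.0%}")
```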
Peer reviewed
Lee, Yong-Won; Kantor, Robert – International Journal of Testing, 2007
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™). This study examines the impact of various rating designs and of the number of tasks and raters on the reliability of writing scores based on integrated and independent tasks from the…
Descriptors: Generalizability Theory, Writing Tests, English (Second Language), Second Language Learning
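The generalizability-theory question here, how score reliability changes with the number of tasks and raters, can be sketched as a small decision study. The variance components below are invented placeholders; in the study they would be estimated from the rated essays.

```python
# Decision-study sketch for a person x task x rater random design.
# Variance components are hypothetical, not the study's estimates.
def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """Generalizability coefficient: person variance over person
    variance plus relative error variance."""
    rel_error = (var_pt / n_tasks
                 + var_pr / n_raters
                 + var_ptr_e / (n_tasks * n_raters))
    return var_p / (var_p + rel_error)

# Hypothetical components: person, person x task, person x rater, residual.
vp, vpt, vpr, vres = 0.50, 0.20, 0.05, 0.25

for n_t in (1, 2, 3):
    for n_r in (1, 2):
        g = g_coefficient(vp, vpt, vpr, vres, n_t, n_r)
        print(f"tasks = {n_t}, raters = {n_r}: G = {g:.2f}")
```

Adding tasks pays off more than adding raters in this toy example because the invented person-by-task variance dominates the person-by-rater variance; the study's actual estimates would determine the real trade-off.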
Peer reviewed
Lee, Yong-Won; Kantor, Robert – ETS Research Report Series, 2005
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™) examination. This study examines the impact of various rating designs as well as the impact of the number of tasks and raters on the reliability of writing scores based on integrated and…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Writing Tests
Peer reviewed
Attali, Yigal; Burstein, Jill – ETS Research Report Series, 2005
The e-rater® system has been used by ETS for automated essay scoring since 1999. This paper describes a new version of e-rater (v.2.0) that differs from the previous one (v.1.3) with regard to the feature set and model building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and…
Descriptors: Essay Tests, Automation, Scoring, Comparative Analysis
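In broad strokes, feature-based automated scoring of the kind e-rater represents regresses human essay scores on automatically extracted feature values. The schematic below uses invented features and ordinary least squares; it is not the actual e-rater model or feature set.

```python
# Schematic of feature-based essay scoring: fit weights that map
# extracted essay features to human holistic scores. All numbers
# and feature choices are invented.
import numpy as np

# rows = essays, columns = features (e.g. an error rate, a vocabulary
# measure, essay length); values are hypothetical
X = np.array([
    [0.02, 0.60, 310],
    [0.05, 0.45, 180],
    [0.01, 0.70, 420],
    [0.04, 0.50, 250],
    [0.03, 0.55, 300],
], dtype=float)
y = np.array([5.0, 3.0, 6.0, 4.0, 4.5])  # human holistic scores

# Standardize features, then fit ordinary least squares weights.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
Xz = np.column_stack([np.ones(len(y)), Xz])  # intercept term
w, *_ = np.linalg.lstsq(Xz, y, rcond=None)

predicted = Xz @ w
print("feature weights:", np.round(w[1:], 2))
print("predicted scores:", np.round(predicted, 2))
```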