Publication Date
| Period | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 1 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 10 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Reliability | 14 |
| Second Language Learning | 14 |
| Writing Tests | 14 |
| English (Second Language) | 12 |
| Scores | 8 |
| Language Tests | 7 |
| Foreign Countries | 6 |
| Scoring | 6 |
| Generalizability Theory | 5 |
| Validity | 5 |
| Writing Evaluation | 5 |
Author
| Author | Count |
| --- | --- |
| Kantor, Robert | 5 |
| Lee, Yong-Won | 5 |
| Mollaun, Pam | 2 |
| Attali, Yigal | 1 |
| Burstein, Jill | 1 |
| Campbell, Heather | 1 |
| Chan, Eric | 1 |
| Espin, Christine | 1 |
| Gentile, Claudia | 1 |
| Han, Turgay | 1 |
| Huang, Jinyan | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 12 |
| Reports - Research | 12 |
| Numerical/Quantitative Data | 2 |
| Speeches/Meeting Papers | 2 |
| Tests/Questionnaires | 2 |
| Information Analyses | 1 |
| Reports - Evaluative | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Secondary Education | 3 |
| High Schools | 2 |
| Higher Education | 2 |
| Elementary Education | 1 |
| Elementary Secondary Education | 1 |
| Grade 10 | 1 |
| Grade 11 | 1 |
| Grade 12 | 1 |
| Grade 6 | 1 |
| Grade 7 | 1 |
| Grade 8 | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Test of English as a Foreign… | 5 |
| Graduate Management Admission… | 1 |
| International English… | 1 |
| Test of English for… | 1 |
Koraishi, Osama – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Sari, Elif; Han, Turgay – Reading Matrix: An International Online Journal, 2021
Providing effective feedback and ensuring reliable assessment practices are two central issues in ESL/EFL writing instruction. Giving individual feedback is very difficult in crowded classes, as it requires a great amount of time and effort from instructors. Moreover, instructors are likely to employ inconsistent assessment procedures,…
Descriptors: Automation, Writing Evaluation, Artificial Intelligence, Natural Language Processing
Qu, Yanxuan; Huo, Yan; Chan, Eric; Shotts, Matthew – ETS Research Report Series, 2017
For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This practice helps to ensure that interpretations about test takers' abilities are comparable from one administration (or one form) to another. This study examines the consistency of reported scores…
Descriptors: Scores, English (Second Language), Language Tests, Second Language Learning
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Kural, Faruk – Journal of Language and Linguistic Studies, 2018
The present paper, a study based on the midterm exam results of 53 university English prep-school students, examines the correlation between a direct writing test, measured holistically and by multiple-trait scoring, and two indirect writing tests used in a competence exam, one of which is a multiple-choice cloze test and the other a rewrite test…
Descriptors: Writing Evaluation, Cloze Procedure, Comparative Analysis, Essays
Jahin, Jamal Hamed – Australian Journal of Teacher Education, 2012
This study aimed to ascertain the current level of writing apprehension experienced by Saudi prospective EFL teachers and their current level of essay writing ability. It also aimed to assess the impact of peer reviewing on their writing apprehension level and essay writing ability. Data collection was carried out via two instruments: Second…
Descriptors: Writing Tests, English (Second Language), Feedback (Response), Control Groups
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – Applied Linguistics, 2010
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multi-trait) rating dimensions and their relationships to holistic scores and e-rater® essay feature variables in the context of the TOEFL® computer-based test (TOEFL CBT) writing assessment. Data analyzed in the study were holistic…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
Espin, Christine; Wallace, Teri; Campbell, Heather; Lembke, Erica S.; Long, Jeffrey D.; Ticha, Renata – Exceptional Children, 2008
We examined the technical adequacy of writing progress measures as indicators of success on state standards tests. Tenth-grade students wrote for 10 min, marking their samples at 3, 5, and 7 min. Samples were scored for words written, words spelled correctly, and correct and correct minus incorrect word sequences. The number of correct minus…
Descriptors: Curriculum Based Assessment, State Standards, Second Language Learning, Writing (Composition)
Huang, Jinyan – Assessing Writing, 2008
Using generalizability theory, this study examined both the rating variability and reliability of ESL students' writing in the provincial English examinations in Canada. Three years' data were used in order to complete the analyses and examine the stability of the results. The major research question that guided this study was: Are there any…
Descriptors: Generalizability Theory, Foreign Countries, English (Second Language), Writing Tests
Lee, Yong-Won; Kantor, Robert – International Journal of Testing, 2007
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™). This study examines the impact of various rating designs and of the number of tasks and raters on the reliability of writing scores based on integrated and independent tasks from the…
Descriptors: Generalizability Theory, Writing Tests, English (Second Language), Second Language Learning
Lee, Yong-Won; Kantor, Robert; Mollaun, Pam – 2002
This paper reports the results of generalizability theory (G) analyses done for new writing and speaking tasks for the Test of English as a Foreign Language (TOEFL). For writing, a special focus was placed on evaluating the impact on the reliability of the number of raters (or ratings) per essay (one or two) and the number of tasks (one, two, or…
Descriptors: English (Second Language), Generalizability Theory, Reliability, Scores
Lee, Yong-Won; Kantor, Robert – ETS Research Report Series, 2005
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of TOEFL® (Test of English as a Foreign Language™) examination. This study examines the impact of various rating designs as well as the impact of the number of tasks and raters on the reliability of writing scores based on integrated and…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Writing Tests
Lee, Yong-Won; Kantor, Robert; Mollaun, Pam – 2002
This study examines the score dependability of writing and speaking assessments from the Test of English as a Foreign Language (TOEFL) from the perspectives of univariate and multivariate generalizability theory (G-theory) and presents the findings of three separate G-theory studies. For writing, the focus was on evaluating the impact on…
Descriptors: Ability, English (Second Language), Generalizability Theory, Item Bias
Attali, Yigal; Burstein, Jill – ETS Research Report Series, 2005
The e-rater® system has been used by ETS for automated essay scoring since 1999. This paper describes a new version of e-rater (v.2.0) that differs from the previous one (v.1.3) with regard to the feature set and model building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and…
Descriptors: Essay Tests, Automation, Scoring, Comparative Analysis