Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Qu, Yanxuan; Huo, Yan; Chan, Eric; Shotts, Matthew – ETS Research Report Series, 2017
For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This practice helps to ensure that interpretations about test takers' abilities are comparable from one administration (or one form) to another. This study examines the consistency of reported scores…
Descriptors: Scores, English (Second Language), Language Tests, Second Language Learning
Kural, Faruk – Journal of Language and Linguistic Studies, 2018
The present paper, which is a study based on midterm exam results of 53 University English prep-school students, examines correlation between a direct writing test, measured holistically by multiple-trait scoring, and two indirect writing tests used in a competence exam, one of which is a multiple-choice cloze test and the other a rewrite test…
Descriptors: Writing Evaluation, Cloze Procedure, Comparative Analysis, Essays
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – Applied Linguistics, 2010
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multi-trait) rating dimensions and their relationships to holistic scores and e-rater® essay feature variables in the context of the TOEFL® computer-based test (TOEFL CBT) writing assessment. Data analyzed in the study were holistic…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
Lam, Ling Chi Tenny – ProQuest LLC, 2010
In writing assessment, there are quite a number of factors influencing the marking stability and the reliability of the assessment such as the attitude towards marking and consistency of markers, the physical environment, the design of the items, and marking rubrics. Even the methods to train markers have effects on the reliability of the…
Descriptors: Foreign Countries, Grading, Scoring Rubrics, Educational Assessment
Lee, Yong-Won; Kantor, Robert – International Journal of Testing, 2007
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™). This study examines the impact of various rating designs and of the number of tasks and raters on the reliability of writing scores based on integrated and independent tasks from the…
Descriptors: Generalizability Theory, Writing Tests, English (Second Language), Second Language Learning
Lee, Yong-Won; Kantor, Robert – ETS Research Report Series, 2005
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of TOEFL® (Test of English as a Foreign Language™) examination. This study examines the impact of various rating designs as well as the impact of the number of tasks and raters on the reliability of writing scores based on integrated and…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Writing Tests
Attali, Yigal; Burstein, Jill – ETS Research Report Series, 2005
The e-rater® system has been used by ETS for automated essay scoring since 1999. This paper describes a new version of e-rater (v.2.0) that differs from the previous one (v.1.3) with regard to the feature set and model building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and…
Descriptors: Essay Tests, Automation, Scoring, Comparative Analysis

