Publication Date
| Period | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 1 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 10 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Reliability | 14 |
| Second Language Learning | 14 |
| Writing Tests | 14 |
| English (Second Language) | 12 |
| Scores | 8 |
| Language Tests | 7 |
| Foreign Countries | 6 |
| Scoring | 6 |
| Generalizability Theory | 5 |
| Validity | 5 |
| Writing Evaluation | 5 |
Author
| Author | Count |
| --- | --- |
| Kantor, Robert | 5 |
| Lee, Yong-Won | 5 |
| Mollaun, Pam | 2 |
| Attali, Yigal | 1 |
| Burstein, Jill | 1 |
| Campbell, Heather | 1 |
| Chan, Eric | 1 |
| Espin, Christine | 1 |
| Gentile, Claudia | 1 |
| Han, Turgay | 1 |
| Huang, Jinyan | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 12 |
| Reports - Research | 12 |
| Numerical/Quantitative Data | 2 |
| Speeches/Meeting Papers | 2 |
| Tests/Questionnaires | 2 |
| Information Analyses | 1 |
| Reports - Evaluative | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Secondary Education | 3 |
| High Schools | 2 |
| Higher Education | 2 |
| Elementary Education | 1 |
| Elementary Secondary Education | 1 |
| Grade 10 | 1 |
| Grade 11 | 1 |
| Grade 12 | 1 |
| Grade 6 | 1 |
| Grade 7 | 1 |
| Grade 8 | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Test of English as a Foreign… | 5 |
| Graduate Management Admission… | 1 |
| International English… | 1 |
| Test of English for… | 1 |
Koraishi, Osama – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Sari, Elif; Han, Turgay – Reading Matrix: An International Online Journal, 2021
Providing effective feedback and ensuring reliable assessment practices are two central issues in ESL/EFL writing instruction. Giving individual feedback is very difficult in crowded classes, as it requires a great amount of time and effort from instructors. Moreover, instructors are likely to employ inconsistent assessment procedures,…
Descriptors: Automation, Writing Evaluation, Artificial Intelligence, Natural Language Processing
Qu, Yanxuan; Huo, Yan; Chan, Eric; Shotts, Matthew – ETS Research Report Series, 2017
For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This practice helps to ensure that interpretations about test takers' abilities are comparable from one administration (or one form) to another. This study examines the consistency of reported scores…
Descriptors: Scores, English (Second Language), Language Tests, Second Language Learning
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Kural, Faruk – Journal of Language and Linguistic Studies, 2018
The present paper, a study based on the midterm exam results of 53 university English prep-school students, examines the correlation between a direct writing test, measured holistically and by multiple-trait scoring, and two indirect writing tests used in a competence exam, one of which is a multiple-choice cloze test and the other a rewrite test…
Descriptors: Writing Evaluation, Cloze Procedure, Comparative Analysis, Essays
Jahin, Jamal Hamed – Australian Journal of Teacher Education, 2012
This study aimed to ascertain the current level of writing apprehension experienced by Saudi prospective EFL teachers and their current level of essay writing ability. It also aimed to assess the impact of peer reviewing on their writing apprehension level and essay writing ability. Data collection was carried out via two instruments: Second…
Descriptors: Writing Tests, English (Second Language), Feedback (Response), Control Groups
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – Applied Linguistics, 2010
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multi-trait) rating dimensions and their relationships to holistic scores and e-rater® essay feature variables in the context of the TOEFL® computer-based test (TOEFL CBT) writing assessment. Data analyzed in the study were holistic…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
Espin, Christine; Wallace, Teri; Campbell, Heather; Lembke, Erica S.; Long, Jeffrey D.; Ticha, Renata – Exceptional Children, 2008
We examined the technical adequacy of writing progress measures as indicators of success on state standards tests. Tenth-grade students wrote for 10 min, marking their samples at 3, 5, and 7 min. Samples were scored for words written, words spelled correctly, and correct and correct minus incorrect word sequences. The number of correct minus…
Descriptors: Curriculum Based Assessment, State Standards, Second Language Learning, Writing (Composition)
Huang, Jinyan – Assessing Writing, 2008
Using generalizability theory, this study examined both the rating variability and reliability of ESL students' writing in the provincial English examinations in Canada. Three years' data were used in order to complete the analyses and examine the stability of the results. The major research question that guided this study was: Are there any…
Descriptors: Generalizability Theory, Foreign Countries, English (Second Language), Writing Tests
Lee, Yong-Won; Kantor, Robert – International Journal of Testing, 2007
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™). This study examines the impact of various rating designs and of the number of tasks and raters on the reliability of writing scores based on integrated and independent tasks from the…
Descriptors: Generalizability Theory, Writing Tests, English (Second Language), Second Language Learning
Lee, Yong-Won; Kantor, Robert; Mollaun, Pam – 2002
This paper reports the results of generalizability theory (G) analyses done for new writing and speaking tasks for the Test of English as a Foreign Language (TOEFL). For writing, a special focus was placed on evaluating the impact on the reliability of the number of raters (or ratings) per essay (one or two) and the number of tasks (one, two, or…
Descriptors: English (Second Language), Generalizability Theory, Reliability, Scores
Lee, Yong-Won; Kantor, Robert – ETS Research Report Series, 2005
Possible integrated and independent tasks were pilot tested for the writing section of a new generation of TOEFL® (Test of English as a Foreign Language™) examination. This study examines the impact of various rating designs as well as the impact of the number of tasks and raters on the reliability of writing scores based on integrated and…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Writing Tests
Lee, Yong-Won; Kantor, Robert; Mollaun, Pam – 2002
This study examines the score dependability of writing and speaking assessments from the Test of English as a Foreign Language (TOEFL) from the perspectives of univariate and multivariate generalizability theory (G-theory) and presents the findings of three separate G-theory studies. For writing, the focus was on evaluating the impact on…
Descriptors: Ability, English (Second Language), Generalizability Theory, Item Bias
Attali, Yigal; Burstein, Jill – ETS Research Report Series, 2005
The e-rater® system has been used by ETS for automated essay scoring since 1999. This paper describes a new version of e-rater (v.2.0) that differs from the previous one (v.1.3) with regard to the feature set and model building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and…
Descriptors: Essay Tests, Automation, Scoring, Comparative Analysis