Publication Date
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 20 |
| Since 2017 (last 10 years) | 52 |
| Since 2007 (last 20 years) | 98 |
Descriptor
| Comparative Analysis | 149 |
| English (Second Language) | 149 |
| Second Language Learning | 100 |
| Foreign Countries | 83 |
| Language Tests | 69 |
| Second Language Instruction | 68 |
| Test Reliability | 61 |
| Reliability | 52 |
| Language Proficiency | 41 |
| Interrater Reliability | 39 |
| Scores | 36 |
| More ▼ | |
Source
Author
| Coniam, David | 3 |
| Kunnan, Antony John | 3 |
| Attali, Yigal | 2 |
| August, Diane | 2 |
| Henning, Grant | 2 |
| Nakamura, Yuji | 2 |
| Abdel-Haq, Eman Muhammad | 1 |
| Adams, R. J. | 1 |
| Ahmadi Shirazi, Masoumeh | 1 |
| Ahmadi, Alireza | 1 |
| Ahmed, Tamim | 1 |
| More ▼ | |
Publication Type
Education Level
Audience
| Teachers | 5 |
| Practitioners | 4 |
| Researchers | 2 |
| Administrators | 1 |
Location
| Iran | 13 |
| China | 9 |
| Japan | 8 |
| Turkey | 7 |
| Taiwan | 4 |
| United States | 4 |
| Egypt | 3 |
| Hong Kong | 3 |
| Pakistan | 3 |
| Saudi Arabia | 3 |
| Australia | 2 |
| More ▼ | |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Alain Bengochea; Sabrina F. Sembiante – Review of Education, 2024
This best-evidence synthesis appraises the design and outcome characteristics of vocabulary intervention studies conducted with preschool through 6th grade emergent bilingual (EB) children and spotlights rigorously designed studies for which effects could be better attributed to instructional features. Twenty-nine selected studies were analysed…
Descriptors: Bilingualism, Vocabulary Development, Intervention, Comparative Analysis
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
Tim Stoeckel; Liang Ye Tan; Hung Tan Ha; Nam Thi Phuong Ho; Tomoko Ishii; Young Ae Kim; Chunmei Huang; Stuart McLean – Vocabulary Learning and Instruction, 2024
Local item dependency (LID) occurs when test-takers' responses to one test item are affected by their responses to another. It can be problematic if it causes inflated reliability estimates or distorted person and item measures. The cued-recall reading comprehension test in Hu and Nation's (2000) well-known and influential coverage--comprehension…
Descriptors: Reading Comprehension, English (Second Language), Second Language Instruction, Second Language Learning
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Sims, Maureen E.; Cox, Troy L.; Eckstein, Grant T.; Hartshorn, K. James; Wilcox, Matthew P.; Hart, Judson M. – Educational Measurement: Issues and Practice, 2020
The purpose of this study is to explore the reliability of a potentially more practical approach to direct writing assessment in the context of ESL writing. Traditional rubric rating (RR) is a common yet resource-intensive evaluation practice when performed reliably. This study compared the traditional rubric model of ESL writing assessment and…
Descriptors: Scoring Rubrics, Item Response Theory, Second Language Learning, English (Second Language)
Kasikarn Bansong; Somkiet Poopatwiboon; Apisak Sukying – Journal of Education and Learning, 2023
It is increasingly prevalent in digital learning and teaching strategies for discerning a global perspective on creating the student learning experience. Multimodality is an emergent phenomenon that may influence how digital learning is designed, especially during the COVID-19 pandemic in which immersive learning environments, such as a virtual…
Descriptors: Elementary School Students, English (Second Language), Second Language Learning, Second Language Instruction
Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024
To determine the impacts of using ChatGPT to assist English as a foreign language (EFL) English college majors in revising essays and the possibility of leading to higher scores and potentially causing unfairness. A prospective, double-blinded, paired-comparison study was conducted in Feb. 2023. A total of 44 students provided 44 original essays…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)
Pilar Rodríguez-Arancón; María Bobadilla-Pérez; Alberto Fernández-Costales – Journal for Multicultural Education, 2024
Purpose: This study aims to delve into the interplay between didactic audiovisual translation (DAT) and computer-assisted language learning (CALL), exploring their combined impact on the development of intercultural competence (IC) among learners of English as a foreign language (EFL). Design/methodology/approach: Using a quasi-experimental…
Descriptors: Translation, Teaching Methods, Second Language Learning, Second Language Instruction
Investigating the Impact of Rater Training on Rater Errors in the Process of Assessing Writing Skill
Sata, Mehmet; Karakaya, Ismail – International Journal of Assessment Tools in Education, 2022
In the process of measuring and assessing high-level cognitive skills, interference of rater errors in measurements brings about a constant concern and low objectivity. The main purpose of this study was to investigate the impact of rater training on rater errors in the process of assessing individual performance. The study was conducted with a…
Descriptors: Evaluators, Training, Comparative Analysis, Academic Language
Wu, Shu-Ling; Tio, Yee Pin; Ortega, Lourdes – Studies in Second Language Acquisition, 2022
Elicited imitation (EI), a short-cut measure of global proficiency in second language (L2) research, requires participants to listen to sentences and repeat them as closely as possible. To support instrument sharing and assessment of L2 proficiency for longitudinal and crosslinguistic research, we created a parallel form of an EI task (EIT) for L2…
Descriptors: Imitation, Second Language Learning, Second Language Instruction, Language Proficiency

Peer reviewed
Direct link
