Showing all 12 results
Peer reviewed
Wind, Stefanie A. – Language Testing, 2019
Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…
Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests
Peer reviewed
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency
Peer reviewed
Eckes, Thomas – Language Testing, 2017
This paper presents an approach to standard setting that combines the prototype group method (PGM; Eckes, 2012) with a receiver operating characteristic (ROC) analysis. The combined PGM-ROC approach is applied to setting cut scores on a placement test of English as a foreign language (EFL). To implement the PGM, experts first named learners whom…
Descriptors: English (Second Language), Language Tests, Cutting Scores, Standard Setting (Scoring)
Peer reviewed
Lee, Shinhye; Winke, Paula – Language Testing, 2018
We investigated how young language learners process their responses on and perceive a computer-mediated, timed speaking test. Twenty 8-, 9-, and 10-year-old non-native English-speaking children (NNSs) and eight same-aged, native English-speaking children (NSs) completed seven computerized sample TOEFL® Primary™ speaking test tasks. We investigated…
Descriptors: Elementary School Students, Second Language Learning, Responses, Computer Assisted Testing
Peer reviewed
Campfield, Dorota E. – Language Testing, 2017
This paper reports a post-hoc analysis of the influence of lexical difficulty of cue sentences on performance in an elicited imitation (EI) task to assess oral production skills for 645 child L2 English learners in instructional settings. This formed part of a large-scale investigation into effectiveness of foreign language teaching in Polish…
Descriptors: Difficulty Level, Second Language Learning, Second Language Instruction, Elementary School Students
Peer reviewed
O'Hagan, Sally; Pill, John; Zhang, Ying – Language Testing, 2016
Criticism of specific-purpose language (LSP) tests is often directed at their limited ability to represent fully the demands of the target language use situation. Such criticisms extend to the criteria used to assess test performance, which may fail to capture what matters to participants in the domain of interest. This paper reports on the…
Descriptors: Health Personnel, Language Tests, English for Special Purposes, Criticism
Peer reviewed
Batty, Aaron Olaf – Language Testing, 2015
The rise in the affordability of quality video production equipment has resulted in increased interest in video-mediated tests of foreign language listening comprehension. Although research on such tests has continued fairly steadily since the early 1980s, studies have relied on analyses of raw scores, despite the growing prevalence of item…
Descriptors: Listening Comprehension Tests, Comparative Analysis, Video Technology, Audio Equipment
Peer reviewed
Zhang, Bo – Language Testing, 2010
This article investigates how measurement models and statistical procedures can be applied to estimate the accuracy of proficiency classification in language testing. The paper starts with a concise introduction of four measurement models: the classical test theory (CTT) model, the dichotomous item response theory (IRT) model, the testlet response…
Descriptors: Language Tests, Classification, Item Response Theory, Statistical Analysis
Peer reviewed
Filipi, Anna – Language Testing, 2012
The Assessment of Language Competence (ALC) certificates is an annual, international testing program developed by the Australian Council for Educational Research to test the listening and reading comprehension skills of lower to middle year levels of secondary school. The tests are developed for three levels in French, German, Italian and…
Descriptors: Listening Comprehension Tests, Item Response Theory, Statistical Analysis, Foreign Countries
Peer reviewed
Bae, Jungok; Bachman, Lyle F. – Language Testing, 2010
This study investigated the validity of four theoretically motivated traits of writing ability across English and Korean, based on elementary school students' responses to letter- and story-writing tasks. Their responses were scored analytically and analyzed using confirmatory factor analysis. The findings include the following. A model of writing…
Descriptors: Elementary School Students, Validity, Korean, English (Second Language)
Peer reviewed
Knoch, Ute – Language Testing, 2009
Alderson (2005) suggests that diagnostic tests should identify strengths and weaknesses in learners' use of language and focus on specific elements rather than global abilities. However, rating scales used in performance assessment have been repeatedly criticized for being imprecise and therefore often resulting in holistic marking by raters…
Descriptors: Feedback (Response), Language Usage, Performance Based Assessment, Performance Tests
Peer reviewed
Lee, Y-W. – Language Testing, 2004
The purpose of the study reported in this article is to empirically examine passage-related local item dependence (LID) by using an IRT (item response theory) based LID index called Q3 in an EFL reading comprehension test, with a special focus on item types as a potentially competing source of LID with passages. In this article, definitions and…
Descriptors: Psychometrics, Item Response Theory, Content Analysis, Reading Comprehension