Publication Date
| In 2026 | 0 |
| Since 2025 | 6 |
| Since 2022 (last 5 years) | 21 |
| Since 2017 (last 10 years) | 52 |
| Since 2007 (last 20 years) | 93 |
Descriptor
Source
| Language Testing | 120 |
Author
| Knoch, Ute | 4 |
| Alderson, J. Charles | 2 |
| Aryadoust, Vahid | 2 |
| Attali, Yigal | 2 |
| Brown, James Dean | 2 |
| Chapelle, Carol A. | 2 |
| Deygers, Bart | 2 |
| Elder, Catherine | 2 |
| Haug, Tobias | 2 |
| Iasonas Lamprianou | 2 |
| Jarvis, Scott | 2 |
| More ▼ | |
Publication Type
| Journal Articles | 120 |
| Reports - Research | 84 |
| Reports - Evaluative | 23 |
| Reports - Descriptive | 9 |
| Information Analyses | 6 |
| Tests/Questionnaires | 5 |
| Opinion Papers | 3 |
| Speeches/Meeting Papers | 1 |
Education Level
Audience
Location
| China | 7 |
| Netherlands | 7 |
| Finland | 4 |
| Germany | 4 |
| Australia | 3 |
| Japan | 3 |
| South Korea | 3 |
| Canada | 2 |
| France | 2 |
| Hong Kong | 2 |
| Taiwan | 2 |
| More ▼ | |
Laws, Policies, & Programs
Assessments and Surveys
| Test of English as a Foreign… | 10 |
| ACTFL Oral Proficiency… | 1 |
| English Proficiency Test | 1 |
| Graduate Record Examinations | 1 |
| International English… | 1 |
| Peabody Picture Vocabulary… | 1 |
| Test of Written English | 1 |
What Works Clearinghouse Rating
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Descriptors: Automation, Scoring, Speech Tests, Language Tests
Toprak, Tugba Elif; Cakir, Abdulvahit – Language Testing, 2021
Cognitive diagnostic assessment (CDA) has been applied to language assessment in a number of studies in which a diagnostic classification model (DCM) was retrofitted to the results of a non-diagnostic assessment. However, the need to apply CDA through utilization of an inductive rather than a retrofitted approach has been a recurrent theme in…
Descriptors: English (Second Language), Second Language Learning, Undergraduate Students, Young Adults
Haug, Tobias; Batty, Aaron Olaf; Venetz, Martin; Notter, Christa; Girard-Groeber, Simone; Knoch, Ute; Audeoud, Mireille – Language Testing, 2020
In this study we seek evidence of validity according to the socio-cognitive framework (Weir, 2005) for a new sentence repetition test (SRT) for young Deaf L1 Swiss German Sign Language (DSGS) users. SRTs have been developed for various purposes for both spoken and sign languages to assess language development in children. In order to address the…
Descriptors: Foreign Countries, Language Tests, Sentences, Repetition
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Schnoor, Birger; Hartig, Johannes; Klinger, Thorsten; Naumann, Alexander; Usanova, Irina – Language Testing, 2023
Research on assessing English as a foreign language (EFL) development has been growing recently. However, empirical evidence from longitudinal analyses based on substantial samples is still needed. In such settings, tests for measuring language development must meet high standards of test quality such as validity, reliability, and objectivity, as…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Longitudinal Studies
Kleijn, Suzanne; Pander Maat, Henk; Sanders, Ted – Language Testing, 2019
Although there are many methods available for assessing text comprehension, the cloze test is not widely acknowledged as one of them. Critiques on cloze testing center on its supposedly limited ability to measure comprehension beyond the sentence. However, these critiques do not hold for all types of cloze tests; the particular configuration of a…
Descriptors: Cloze Procedure, Language Tests, Semantics, Scoring
Norris, John; Drackert, Anastasia – Language Testing, 2018
The Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in…
Descriptors: German, Second Language Learning, Language Tests, Language Proficiency
Duijm, Klaartje; Schoonen, Rob; Hulstijn, Jan H. – Language Testing, 2018
It is general practice to use rater judgments in speaking proficiency testing. However, it has been shown that raters' knowledge and experience may influence their ratings, both in terms of leniency and varied focus on different aspects of speech. The purpose of this study is to identify raters' relative responsiveness to fluency and linguistic…
Descriptors: Language Fluency, Accuracy, Second Languages, Language Tests
Isbell, Dan; Winke, Paula – Language Testing, 2019
The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…
Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning
Cai, Yuyang; Kunnan, Antony John – Language Testing, 2020
An essential hypothesis of modern language assessment theory pertains to the interaction between strategy use ability (strategic competence) and second language knowledge. However, how they interact with each other is rarely explored. Drawing on relevant research in the literature, in this paper we proposed three interaction patterns (i.e.,…
Descriptors: English (Second Language), Second Language Learning, Nursing Education, Reading Tests
Chan, Stephanie W. Y.; Cheung, Wai Ming; Huang, Yanli; Lam, Wai-Ip; Lin, Chin-Hsi – Language Testing, 2020
Demand for second-language (L2) Chinese education for kindergarteners has grown rapidly, but little is known about these kindergarteners' L2 skills, with existing studies focusing on school-age populations and alphabetic languages. Accordingly, we developed a six-subtest Chinese character acquisition assessment to measure L2 kindergarteners'…
Descriptors: Chinese, Second Language Learning, Second Language Instruction, Written Language
Choi, Ikkyu; Papageorgiou, Spiros – Language Testing, 2020
Stakeholders of language tests are often interested in subscores. However, reporting a subscore is not always justified; a subscore should provide reliable and distinct information to be worth reporting. When a subscore is used for decisions across multiple levels (e.g., individual test takers and schools), it needs to be justified for its…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scores
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency

Peer reviewed
Direct link
