ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	5
Since 2007 (last 20 years)	11

Descriptor

Statistical Analysis	12
Language Tests	10
Second Language Learning	9
English (Second Language)	8
Item Response Theory	8
Comparative Analysis	5
Foreign Countries	5
Elementary School Students	4
Difficulty Level	3
Language Proficiency	3
Native Speakers	3
Oral Language	3
Test Items	3
Classification	2
Correlation	2
Cutting Scores	2
Feedback (Response)	2
Listening Comprehension Tests	2
Models	2
Psychometrics	2
Questionnaires	2
Reading Comprehension	2
Regression (Statistics)	2
Responses	2
Scores	2

Source

Language Testing

Author

Bachman, Lyle F.	1
Bae, Jungok	1
Batty, Aaron Olaf	1
Campfield, Dorota E.	1
Eckes, Thomas	1
Filipi, Anna	1
Knoch, Ute	1
Lee, Shinhye	1
Lee, Y-W.	1
Longabach, Tanya	1
O'Hagan, Sally	1
Peyton, Vicki	1
Pill, John	1
Wind, Stefanie A.	1
Winke, Paula	1
Zhang, Bo	1
Zhang, Ying	1

Publication Type

Journal Articles	12
Reports - Research	7
Reports - Evaluative	5
Tests/Questionnaires	1

Education Level

Elementary Education	4
High Schools	1
Higher Education	1
Postsecondary Education	1
Secondary Education	1

Location

Australia	2
Georgia	1
Germany	1
Japan	1
Poland	1

Assessments and Surveys

Test of English as a Foreign…

Showing all 12 results

A Nonparametric Procedure for Exploring Differences in Rating Quality across Test-Taker Subgroups in Rater-Mediated Writing Assessments

Peer reviewed

Wind, Stefanie A. – Language Testing, 2019

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…

Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests

A Comparison of Reliability and Precision of Subscore Reporting Methods for a State English Language Proficiency Assessment

Peer reviewed

Longabach, Tanya; Peyton, Vicki – Language Testing, 2018

K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…

Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency

Setting Cut Scores on an EFL Placement Test Using the Prototype Group Method: A Receiver Operating Characteristic (ROC) Analysis

Peer reviewed

Eckes, Thomas – Language Testing, 2017

This paper presents an approach to standard setting that combines the prototype group method (PGM; Eckes, 2012) with a receiver operating characteristic (ROC) analysis. The combined PGM-ROC approach is applied to setting cut scores on a placement test of English as a foreign language (EFL). To implement the PGM, experts first named learners whom…

Descriptors: English (Second Language), Language Tests, Cutting Scores, Standard Setting (Scoring)

Young Learners' Response Processes When Taking Computerized Tasks for Speaking Assessment

Peer reviewed

Lee, Shinhye; Winke, Paula – Language Testing, 2018

We investigated how young language learners process their responses on and perceive a computer-mediated, timed speaking test. Twenty 8-, 9-, and 10-year-old non-native English-speaking children (NNSs) and eight same-aged, native English-speaking children (NSs) completed seven computerized sample TOEFL® Primary™ speaking test tasks. We investigated…

Descriptors: Elementary School Students, Second Language Learning, Responses, Computer Assisted Testing

Lexical Difficulty--Using Elicited Imitation to Study Child L2

Peer reviewed

Campfield, Dorota E. – Language Testing, 2017

This paper reports a post-hoc analysis of the influence of lexical difficulty of cue sentences on performance in an elicited imitation (EI) task to assess oral production skills for 645 child L2 English learners in instructional settings. This formed part of a large-scale investigation into effectiveness of foreign language teaching in Polish…

Descriptors: Difficulty Level, Second Language Learning, Second Language Instruction, Elementary School Students

Extending the Scope of Speaking Assessment Criteria in a Specific-Purpose Language Test: Operationalizing a Health Professional Perspective

Peer reviewed

O'Hagan, Sally; Pill, John; Zhang, Ying – Language Testing, 2016

Criticism of specific-purpose language (LSP) tests is often directed at their limited ability to represent fully the demands of the target language use situation. Such criticisms extend to the criteria used to assess test performance, which may fail to capture what matters to participants in the domain of interest. This paper reports on the…

Descriptors: Health Personnel, Language Tests, English for Special Purposes, Criticism

A Comparison of Video- and Audio-Mediated Listening Tests with Many-Facet Rasch Modeling and Differential Distractor Functioning

Peer reviewed

Batty, Aaron Olaf – Language Testing, 2015

The rise in the affordability of quality video production equipment has resulted in increased interest in video-mediated tests of foreign language listening comprehension. Although research on such tests has continued fairly steadily since the early 1980s, studies have relied on analyses of raw scores, despite the growing prevalence of item…

Descriptors: Listening Comprehension Tests, Comparative Analysis, Video Technology, Audio Equipment

Assessing the Accuracy and Consistency of Language Proficiency Classification under Competing Measurement Models

Peer reviewed

Zhang, Bo – Language Testing, 2010

This article investigates how measurement models and statistical procedures can be applied to estimate the accuracy of proficiency classification in language testing. The paper starts with a concise introduction of four measurement models: the classical test theory (CTT) model, the dichotomous item response theory (IRT) model, the testlet response…

Descriptors: Language Tests, Classification, Item Response Theory, Statistical Analysis

Do Questions Written in the Target Language Make Foreign Language Listening Comprehension Tests More Difficult?

Peer reviewed

Filipi, Anna – Language Testing, 2012

The Assessment of Language Competence (ALC) certificates is an annual, international testing program developed by the Australian Council for Educational Research to test the listening and reading comprehension skills of lower to middle year levels of secondary school. The tests are developed for three levels in French, German, Italian and…

Descriptors: Listening Comprehension Tests, Item Response Theory, Statistical Analysis, Foreign Countries

An Investigation of Four Writing Traits and Two Tasks across Two Languages

Peer reviewed

Bae, Jungok; Bachman, Lyle F. – Language Testing, 2010

This study investigated the validity of four theoretically motivated traits of writing ability across English and Korean, based on elementary school students' responses to letter- and story-writing tasks. Their responses were scored analytically and analyzed using confirmatory factor analysis. The findings include the following. A model of writing…

Descriptors: Elementary School Students, Validity, Korean, English (Second Language)

Diagnostic Assessment of Writing: A Comparison of Two Rating Scales

Peer reviewed

Knoch, Ute – Language Testing, 2009

Alderson (2005) suggests that diagnostic tests should identify strengths and weaknesses in learners' use of language and focus on specific elements rather than global abilities. However, rating scales used in performance assessment have been repeatedly criticized for being imprecise and therefore often resulting in holistic marking by raters…

Descriptors: Feedback (Response), Language Usage, Performance Based Assessment, Performance Tests

Examining Passage-Related Local Item Dependence (LID) and Measurement Construct using Q3 Statistics in an EFL Reading Comprehension Test

Peer reviewed

Lee, Y-W. – Language Testing, 2004

The purpose of the study reported in this article is to empirically examine passage-related local item dependence (LID) by using an IRT (item response theory) based LID index called Q3 in an EFL reading comprehension test, with a special focus on item types as a potentially competing source of LID with passages. In this article, definitions and…

Descriptors: Psychometrics, Item Response Theory, Content Analysis, Reading Comprehension