ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	1
Since 2017 (last 10 years)	2
Since 2007 (last 20 years)	3

Descriptor

Error of Measurement	3
Interrater Reliability	3
Data Analysis	2
Foreign Countries	2
Language Tests	2
Scoring	2
Accuracy	1
Coding	1
College Faculty	1
Componential Analysis	1
Evaluation Methods	1
Evaluators	1
Experience	1
Finno Ugric Languages	1
Focus Groups	1
Generalizability Theory	1
Indo European Languages	1
Item Response Theory	1
Language Teachers	1
Monte Carlo Methods	1
Novices	1
Performance Based Assessment	1
Rating Scales	1
Reliability	1
Research Problems	1

Source

Language Testing

Author

Deygers, Bart	1
Iasonas Lamprianou	1
Lin, Chih-Kai	1
Reeta Neittaanmäki	1
Van Gorp, Koen	1

Publication Type

Journal Articles	3
Reports - Research	3

Education Level

Higher Education	1
Postsecondary Education	1

Location

Finland	1
Netherlands	1

Showing all 3 results

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Determining the Scoring Validity of a Co-Constructed CEFR-Based Rating Scale

Peer reviewed

Deygers, Bart; Van Gorp, Koen – Language Testing, 2015

Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…

Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability