ERIC - Search Results

Publication Date

In 2026	0
Since 2025	2
Since 2022 (last 5 years)	3
Since 2017 (last 10 years)	9
Since 2007 (last 20 years)	17

Descriptor

Interrater Reliability	21
Second Language Learning	21
Language Tests	13
English (Second Language)	11
Evaluators	9
Correlation	8
Foreign Countries	7
Oral Language	6
Rating Scales	6
Writing Evaluation	6
Language Proficiency	5
Statistical Analysis	5
Writing Tests	5
Comparative Analysis	4
Language Teachers	4
Second Language Instruction	4
Testing	4
Item Response Theory	3
Language Variation	3
Native Speakers	3
Scoring	3
Secondary School Students	3
Test Bias	3
Achievement Rating	2
Communicative Competence…	2

Source

Language Testing

Publication Type

Journal Articles	21
Reports - Research	18
Reports - Evaluative	2
Tests/Questionnaires	2
Reports - Descriptive	1

Education Level

Higher Education	4
Secondary Education	3
Postsecondary Education	2
Adult Education	1
Early Childhood Education	1
Elementary Education	1
Elementary Secondary Education	1
Kindergarten	1
Primary Education	1

Location

Netherlands	2
Arizona	1
China	1
Europe	1
Finland	1
Hong Kong	1
Illinois (Urbana)	1
India	1
Japan	1
Ohio	1
South Korea	1
Sweden	1

Assessments and Surveys

Test of English as a Foreign…	3
Peabody Picture Vocabulary…	1

Showing 1 to 15 of 21 results

Comparison of Traditional Machine Learning and Neural Network Approaches for Automated Scoring of Second Language English Essays

Peer reviewed

Voss, Erik – Language Testing, 2025

An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring systems (AES) to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…

Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Chuang, Ping-Lin – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

The Longitudinal Stability of Rating Characteristics in an EFL Examination: Methodological and Substantive Considerations

Peer reviewed

Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021

This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…

Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation

Development and Validation of a Chinese Character Acquisition Assessment for Second-Language Kindergarteners

Peer reviewed

Chan, Stephanie W. Y.; Cheung, Wai Ming; Huang, Yanli; Lam, Wai-Ip; Lin, Chin-Hsi – Language Testing, 2020

Demand for second-language (L2) Chinese education for kindergarteners has grown rapidly, but little is known about these kindergarteners' L2 skills, with existing studies focusing on school-age populations and alphabetic languages. Accordingly, we developed a six-subtest Chinese character acquisition assessment to measure L2 kindergarteners'…

Descriptors: Chinese, Second Language Learning, Second Language Instruction, Written Language

Measuring L2 Speakers' Interactional Ability Using Interactive Speech Tasks

Peer reviewed

van Batenburg, Eline S. L.; Oostdam, Ron J.; van Gelderen, Amos J. S.; de Jong, Nivja H. – Language Testing, 2018

This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n…

Descriptors: Second Language Learning, Speech Tests, Interaction, Test Reliability

The Effect of Training and Rater Differences on Oral Proficiency Assessment

Peer reviewed

Kang, Okim; Rubin, Don; Kermad, Alyssa – Language Testing, 2019

As a result of the fact that judgments of non-native speech are closely tied to social biases, oral proficiency ratings are susceptible to error because of rater background and social attitudes. In the present study we seek first to estimate the variance attributable to rater background and attitudinal variables on novice raters' assessments of L2…

Descriptors: Evaluators, Second Language Learning, Language Tests, English (Second Language)

The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

Peer reviewed

Davis, Larry – Language Testing, 2016

Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…

Descriptors: Evaluators, Oral Language, Scores, Language Tests

Functional Adequacy in L2 Writing: Towards a New Rating Scale

Peer reviewed

Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017

The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…

Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse

Grounding Lexical Diversity in Human Judgments

Peer reviewed

Jarvis, Scott – Language Testing, 2017

The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…

Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers

SLA Developmental Stages and Teachers' Assessment of Written French: Exploring Direkt Profil as a Diagnostic Assessment Tool

Peer reviewed

Granfeldt, Jonas; Ågren, Malin – Language Testing, 2014

One core area of research in Second Language Acquisition is the identification and definition of developmental stages in different L2s. For L2 French, Bartning and Schlyter (2004) presented a model of six morphosyntactic stages of development in the shape of grammatical profiles. The model formed the basis for the computer program Direkt Profil…

Descriptors: Second Language Learning, Language Tests, French, Language Teachers

The Essentials of Assessment Literacy: Contrasts between Testers and Users

Peer reviewed

Malone, Margaret E. – Language Testing, 2013

Language assessment literacy refers to language instructors' familiarity with testing definitions and the application of this knowledge to classroom practices in general and specifically to issues related to assessing language. While it is widely agreed that classroom teachers need to assess student progress, many teachers and other test…

Descriptors: Literacy, Language Tests, Interviews, Feedback (Response)

Assessing Learners' Writing Skills in a SLA Study: Validating the Rating Process across Tasks, Scales and Languages

Peer reviewed

Huhta, Ari; Alanen, Riikka; Tarnanen, Mirja; Martin, Maisa; Hirvelä, Tuija – Language Testing, 2014

There is still relatively little research on how well the CEFR and similar holistic scales work when they are used to rate L2 texts. Using both multifaceted Rasch analyses and qualitative data from rater comments and interviews, the ratings obtained by using a CEFR-based writing scale and the Finnish National Core Curriculum scale for L2 writing…

Descriptors: Foreign Countries, Writing Skills, Second Language Learning, Finno Ugric Languages

Native Speakers' Perceptions of Fluency and Accent in L2 Speech

Peer reviewed

Pinget, Anne-France; Bosker, Hans Rutger; Quené, Hugo; de Jong, Nivja H. – Language Testing, 2014

Oral fluency and foreign accent distinguish L2 from L1 speech production. In language testing practices, both fluency and accent are usually assessed by raters. This study investigates what exactly native raters of fluency and accent take into account when judging L2. Our aim is to explore the relationship between objectively measured temporal,…

Descriptors: Native Speakers, Language Fluency, Suprasegmentals, Second Language Learning

Does a Rater's Familiarity with a Candidate's Pronunciation Affect the Rating in Oral Proficiency Interviews?

Peer reviewed

Carey, Michael D.; Mannell, Robert H.; Dunn, Peter K. – Language Testing, 2011

This study investigated factors that could affect inter-examiner reliability in the pronunciation assessment component of speaking tests. We hypothesized that the rating of pronunciation is susceptible to variation in assessment due to the amount of exposure examiners have to nonnative English accents. An inter-rater variability analysis was…

Descriptors: Oral Language, Pronunciation, Phonology, Interlanguage

Author

de Jong, Nivja H.	2
Alanen, Riikka	1
Barkhuizen, Gary	1
Bosker, Hans Rutger	1
Brown, Annie	1
Carey, Michael D.	1
Chan, Stephanie W. Y.	1
Cheung, Wai Ming	1
Chuang, Ping-Lin	1
Davis, Larry	1
Dunn, Peter K.	1
Elder, Catherine	1
Erik Voss	1
Granfeldt, Jonas	1
Grant, Leslie	1
Henning, Grant	1
Hirvelä, Tuija	1
Huang, Yanli	1
Huhta, Ari	1
Jarvis, Scott	1
Kang, Okim	1
Kermad, Alyssa	1
Knoch, Ute	1
Kuiken, Folkert	1
Kyriakou, Nansia	1