Showing 1 to 15 of 18 results
Peer reviewed
Bijani, Houman; Hashempour, Bahareh; Ibrahim, Khaled Ahmed Abdel-Al; Orabah, Salim Said Bani; Heydarnejad, Tahereh – Language Testing in Asia, 2022
Because oral assessment is inherently subjective, much attention has been devoted to obtaining a satisfactory measure of consistency among raters. However, the process of obtaining greater consistency might not result in valid decisions. One matter at the core of both reliability and validity in oral assessment is rater training. Recently,…
Descriptors: Oral Language, Language Tests, Feedback (Response), Bias
Peer reviewed
Doosti, Mehdi; Ahmadi Safa, Mohammad – International Journal of Language Testing, 2021
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the consideration of the examinees' expectations by the examiners have any effect on test-takers' perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian…
Descriptors: Oral Language, Language Tests, Interrater Reliability, Training
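For readers who want to see what an inter-rater reliability estimate of the kind this entry examines looks like in practice, here is a minimal, illustrative sketch: it computes a two-way random-effects ICC(2,1) for an examinees-by-raters score matrix. The 31-examinee, four-rater layout mirrors the abstract, but the simulated scores, the 1-6 band scale, and the code itself are invented for illustration and are not taken from the study.

import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """Two-way random-effects ICC(2,1) for an examinees x raters score matrix."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)     # per-examinee means
    col_means = scores.mean(axis=0)     # per-rater means

    ss_rows = k * np.sum((row_means - grand) ** 2)   # between-examinee SS
    ss_cols = n * np.sum((col_means - grand) ** 2)   # between-rater SS
    ss_total = np.sum((scores - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols            # residual SS

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores: 31 examinees rated by 4 raters on a 1-6 band scale.
rng = np.random.default_rng(0)
ability = rng.integers(1, 7, size=(31, 1))
scores = np.clip(ability + rng.integers(-1, 2, size=(31, 4)), 1, 6)
print(f"ICC(2,1) = {icc_2_1(scores.astype(float)):.3f}")

A value near 1.0 would indicate that raters order and score examinees very consistently; rater-training studies such as the one above typically compare an index like this before and after training.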
Peer reviewed
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. The purpose of this study is to use a systematic literature review to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Peer reviewed
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important because their scoring decisions determine which aspects of performance the scores represent; however, raters are also considered one of the potential sources of unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
Peer reviewed
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between one automated computer rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
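The Many-Facet Rasch analysis named in the entry above models each observed rating as a function of examinee ability, task difficulty, rater severity, and rating-scale thresholds. The sketch below only illustrates the form of that model (category probabilities under a rating-scale MFRM); every numeric value in it is hypothetical and unrelated to the study's estimates.

import numpy as np

def mfrm_category_probs(theta, difficulty, severity, thresholds):
    """
    Category probabilities under a many-facet rating-scale Rasch model:
    the log-odds of scoring in category k rather than k-1 is
    theta - difficulty - severity - thresholds[k].
    """
    tau = np.concatenate(([0.0], np.asarray(thresholds)))  # tau_0 fixed at 0
    steps = theta - difficulty - severity - tau
    log_numerators = np.cumsum(steps)                       # sum of steps up to each category
    probs = np.exp(log_numerators - log_numerators.max())   # stabilise before normalising
    return probs / probs.sum()

# Hypothetical facets: an examinee of ability 1.0 logits, a task of
# difficulty 0.2, and two raters, one lenient and one severe.
thresholds = [-1.5, 0.0, 1.5]                 # 4 categories: 0..3
for label, severity in [("lenient rater", -0.5), ("severe rater", 0.8)]:
    p = mfrm_category_probs(theta=1.0, difficulty=0.2,
                            severity=severity, thresholds=thresholds)
    print(label, np.round(p, 3), "expected score:", round((np.arange(4) * p).sum(), 2))

The point of the model is visible even in this toy case: a more severe rater shifts probability toward lower categories for the same examinee, which is exactly the kind of difference an MFRM comparison of automatic and human raters is designed to expose.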
Peer reviewed
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
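The rating and subdividing methods investigated in the entry above are generalizability-theory techniques for sparse rating designs and go beyond what a short example can show. As background, here is a hedged sketch of the fully crossed persons-by-raters G-study that such methods extend: it estimates variance components from mean squares and computes a relative generalizability coefficient, using simulated data rather than anything from the study.

import numpy as np

def g_study_p_by_r(scores: np.ndarray):
    """
    Variance components and generalizability coefficient for a fully
    crossed persons x raters (p x r) design, estimated from mean squares.
    """
    n_p, n_r = scores.shape
    grand = scores.mean()
    ms_p = n_r * np.sum((scores.mean(axis=1) - grand) ** 2) / (n_p - 1)
    ms_r = n_p * np.sum((scores.mean(axis=0) - grand) ** 2) / (n_r - 1)
    ss_res = np.sum((scores - scores.mean(axis=1, keepdims=True)
                     - scores.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_res = ms_res                        # person x rater interaction plus error
    var_r = max((ms_r - ms_res) / n_p, 0.0) # rater severity differences
    var_p = max((ms_p - ms_res) / n_r, 0.0) # true person (examinee) variance
    # Relative generalizability coefficient for the mean of n_r ratings.
    g_coef = var_p / (var_p + var_res / n_r)
    return var_p, var_r, var_res, g_coef

# Hypothetical fully crossed data: 50 examinees each scored by 3 raters.
rng = np.random.default_rng(1)
true_score = rng.normal(0, 1, size=(50, 1))
rater_effect = rng.normal(0, 0.3, size=(1, 3))
scores = true_score + rater_effect + rng.normal(0, 0.5, size=(50, 3))
var_p, var_r, var_res, g = g_study_p_by_r(scores)
print(f"var_p={var_p:.3f} var_r={var_r:.3f} var_res={var_res:.3f} g={g:.3f}")

Sparse operational data break the fully crossed assumption behind these formulas, which is precisely why the rating and subdividing methods examined in the study are needed.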
Peer reviewed
Wang, Ping – English Language Teaching, 2009
This paper studies rater reliability in scoring compositions in tests of English as a foreign language (EFL), focusing on inter-rater reliability as well as several interactions between raters and the other facets involved (i.e., examinees, rating criteria, and rating methods). Results showed that raters were fairly…
Descriptors: Interrater Reliability, Scoring, Writing (Composition), English (Second Language)
Peer reviewed
Moore, Michelle W.; Tompkins, Connie A.; Dollaghan, Christine A. – Clinical Linguistics & Phonetics, 2010
The purpose of this paper was to examine the psychometric properties of a non-word repetition task (NRT), the Late-8 Non-word Repetition Task (L8NRT). This task was designed similarly to the NRT, but contains only Late-8 consonants to increase articulatory demands and avoid ceiling effects in studies with adolescents and adults. Thirty college…
Descriptors: Psychometrics, Mastery Tests, Repetition, College Students
Peer reviewed
Jafarpur, Abdoljavad – System, 1988
Investigation of non-native English speakers' ratings of other non-native English learners' oral proficiency. Results indicate that the judges' ratings differed significantly and that the average of three judges' ratings was a better appraisal of a testee's true ability than any single rating or pair of ratings. (Author/CB)
Descriptors: English (Second Language), Evaluation Methods, Foreign Countries, Interrater Reliability
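The finding that the average of three judges' ratings outperformed any single rating is consistent with the Spearman-Brown prophecy, which predicts how reliability grows when parallel ratings are averaged. The snippet below works through that formula with illustrative numbers; the 0.60 single-rater reliability is an assumption, not a figure from the study.

def reliability_of_k_rater_average(single_rater_reliability: float, k: int) -> float:
    """Spearman-Brown prophecy: reliability of the mean of k parallel ratings."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)

# Illustrative (not the study's) numbers: if a single judge's rating has
# reliability 0.60, averaging three judges is expected to reach about 0.82.
for k in (1, 2, 3):
    print(k, round(reliability_of_k_rater_average(0.60, k), 2))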
Peer reviewed
Magnan, Sally Sieloff – Canadian Modern Language Review, 1987
Differences between the academic (American Council on the Teaching of Foreign Languages) and government (Foreign Service Institute) versions of the oral proficiency interview test are examined, and data from two studies of interrater reliability are presented and discussed. (MSE)
Descriptors: Evaluation Methods, Interrater Reliability, Language Proficiency, Language Tests
Peer reviewed
Ball, Martin J.; And Others – Journal of Communication Disorders, 1991
This study investigated two pragmatic profiles (the Pragmatic Profile and the Profile of Communicative Appropriateness) used to assess the language of two aphasic patients. The study examined interscorer reliability, scoring sensitivity, and diagnostic accuracy. Findings indicate that training in scoring these profiles must be uniform, and greater…
Descriptors: Adults, Aphasia, Behavior Rating Scales, Communication Disorders
Peer reviewed
Magnan, Sally Sieloff – Canadian Modern Language Review, 1987
Differences in procedures used by academic institutions and government agencies in administering the American Council on the Teaching of Foreign Languages' Oral Proficiency Interview test are examined, and results and implications of two studies of interrater reliability are discussed. (MSE)
Descriptors: Comparative Analysis, Correlation, Evaluation Methods, Evaluators
Takala, Sauli – 1998
This paper discusses recent developments in language testing. It begins with a review of the traditional criteria that are applied to all measurement and outlines recent emphases that derive from the expanding range of stakeholders. Drawing on Alderson's seminal work, criteria are presented for evaluating communicative language tests. Developments…
Descriptors: Alternative Assessment, Communicative Competence (Languages), Comparative Analysis, Evaluation Criteria
Peer reviewed
Zechner, Klaus; Bejar, Isaac I.; Hemat, Ramin – ETS Research Report Series, 2007
The increasing availability and performance of computer-based testing have prompted more research on the automatic assessment of language and speaking proficiency. In this investigation, we evaluated the feasibility of using an off-the-shelf speech-recognition system for scoring speaking prompts from the LanguEdge field test of 2002. We first…
Descriptors: Role, Computer Assisted Testing, Language Proficiency, Oral Language
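To make concrete how a recognizer's output might feed automatic scoring of speaking prompts, here is a heavily hedged sketch: it computes a few delivery features (speaking rate, articulation rate, mean pause) from word-level ASR timings and passes them to a toy linear scoring rule. The RecognizedWord structure, the features, and all weights are hypothetical illustrations, not the approach or results reported in the study.

from dataclasses import dataclass

@dataclass
class RecognizedWord:
    """One word from a (hypothetical) speech recognizer's output."""
    text: str
    start: float  # seconds
    end: float    # seconds

def fluency_features(words: list[RecognizedWord]) -> dict[str, float]:
    """Simple delivery features computed from ASR word timings."""
    duration = words[-1].end - words[0].start
    speaking_time = sum(w.end - w.start for w in words)
    pauses = [b.start - a.end for a, b in zip(words, words[1:])]
    return {
        "words_per_second": len(words) / duration,
        "articulation_rate": len(words) / speaking_time,
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
    }

def predict_score(features: dict[str, float]) -> float:
    """Toy linear scoring rule with made-up weights, clipped to a 1-5 scale."""
    raw = 1.0 + 1.2 * features["words_per_second"] - 2.0 * features["mean_pause"]
    return min(max(raw, 1.0), 5.0)

words = [RecognizedWord("the", 0.0, 0.2), RecognizedWord("test", 0.3, 0.7),
         RecognizedWord("was", 1.1, 1.3), RecognizedWord("easy", 1.4, 1.9)]
feats = fluency_features(words)
print(feats, "score:", round(predict_score(feats), 2))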
Kreeft, Henk; Sanders, Piet – 1983
In the Dutch national examinations, reading comprehension tests are used for all languages. For the native language, reading comprehension is tested with reading passages and related questions to which the test-taker writes a response of their own rather than choosing from a set of alternatives. One problem encountered in testing with these items is…
Descriptors: Dutch, Evaluation Methods, Evaluators, Foreign Countries