ERIC - Search Results

Publication Date

In 2026	0
Since 2025	3
Since 2022 (last 5 years)	7
Since 2017 (last 10 years)	20
Since 2007 (last 20 years)	31

Descriptor

Interrater Reliability	36
Scoring	36
Second Language Learning	36
English (Second Language)	27
Language Tests	24
Foreign Countries	20
Evaluators	15
Second Language Instruction	11
Computer Assisted Testing	10
Correlation	10
Comparative Analysis	9
Scores	9
Writing Evaluation	9
College Students	7
Computer Software	7
Essays	7
Language Proficiency	7
Oral Language	6
Automation	5
Evaluation Methods	5
Speech Communication	5
Student Evaluation	5
Writing Tests	5
Item Response Theory	4
Test Validity	4

Publication Type

Journal Articles	33
Reports - Research	28
Tests/Questionnaires	7
Reports - Descriptive	3
Information Analyses	2
Reports - Evaluative	2
Dissertations/Theses -…	1
Speeches/Meeting Papers	1

Education Level

Higher Education	11
Postsecondary Education	10
Secondary Education	3
Elementary Education	2
High Schools	2
Early Childhood Education	1
Elementary Secondary Education	1
Kindergarten	1
Preschool Education	1
Primary Education	1

Audience

Researchers

Location

China	5
Turkey	4
Iran	3
Japan	3
Germany	2
Hong Kong	2
India	2
South Korea	2
Arizona	1
Colombia	1
Europe	1
Iran (Tehran)	1
Japan (Tokyo)	1
Jordan	1
Mexico	1
Switzerland	1
Taiwan	1
United States	1
Vietnam	1

Assessments and Surveys

Test of English as a Foreign…	11
International English…	4
Peabody Picture Vocabulary…	2
Expressive One Word Picture…	1
Graduate Record Examinations	1
Mean Length of Utterance	1
Test of English for…	1

Showing 1 to 15 of 36 results

Comparison of Traditional Machine Learning and Neural Network Approaches for Automated Scoring of Second Language English Essays

Peer reviewed

Erik Voss – Language Testing, 2025

An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring systems (AES) to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…

Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)

Artificial Intelligence in International English Language Testing System Writing Assessments: A Comparative Study of Human Ratings and DeepAI

Peer reviewed

Somayeh Fathali; Fatemeh Mohajeri – Technology in Language Teaching & Learning, 2025

The International English Language Testing System (IELTS) is a high-stakes exam where Writing Task 2 significantly influences the overall scores, requiring reliable evaluation. While trained human raters perform this task, concerns about subjectivity and inconsistency have led to growing interest in artificial intelligence (AI)-based assessment…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Artificial Intelligence

ChatGPT4o as an AI Peer Assessor in EFL Speaking Classrooms: Examining Scoring Reliability and Feedback Effectiveness

Peer reviewed

Junfei Li; Jinyan Huang; Thomas Sheeran – SAGE Open, 2025

This study investigated the role of ChatGPT4o as an AI peer assessor in English-as-a-foreign-language (EFL) speaking classrooms, with a focus on its scoring reliability and the effectiveness of its feedback. The research involved 40 first-year English major students from two parallel classes at a Chinese university. Twenty from one class served as…

Descriptors: Artificial Intelligence, Technology Uses in Education, Peer Evaluation, English (Second Language)

The Rashomon Effect: Which Features of a Speaker's Talk Do Listeners Notice?

Peer reviewed

Seedhouse, Paul; Satar, Müge – Classroom Discourse, 2023

The same L2 speaking performance may be analysed and evaluated in very different ways by different teachers or raters. We present a new, technology-assisted research design which opens up to investigation the trajectories of convergence and divergence between raters. We tracked and recorded what different raters noticed when, whilst grading a…

Descriptors: Language Tests, English (Second Language), Second Language Learning, Oral Language

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

A Rasch Analysis of Rater Behaviour in Speaking Assessment

Peer reviewed

Polat, Murat – International Online Journal of Education and Teaching, 2020

The assessment of speaking skills in foreign language testing has always had some pros (testing learners' speaking skills doubles the validity of any language test) and cons (many test-relevant/irrelevant variables interfere) since it is a multi-dimensional process. In the meantime, exploring grader behaviours while scoring learners' speaking…

Descriptors: Item Response Theory, Interrater Reliability, Speech Skills, Second Language Learning

Applying Generalizability Theory in Language Testing: Comparing Nested and Crossed Scoring Designs in the Assessment of Speaking Skills

Peer reviewed

Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021

Scoring language learners' speaking skills is open to a number of measurement errors since raters' personal judgements could be involved in the process. Different grading designs, in which raters score a student's whole speaking skills or a specific dimension of the speaking performance, could be set up to control and minimize the amount of the error…

Descriptors: Language Tests, Scoring, Speech Communication, State Universities

Investigation of Interrater Reliability in the Evaluation of Foreign Language Writing Skills with Multigroup Confirmatory Factor Analysis

Peer reviewed

Önen, Emine; Yayvak, Melike Kübra Tasdelen – Journal of Education and Training Studies, 2019

This study aimed to examine the interrater reliability of the scoring of foreign language paragraph writing skills using measurement invariance tests. The study group consists of 267 students studying English at the Preparatory School at Gazi University. In the study, where students write a paragraph on the same topic, the…

Descriptors: Second Language Learning, Second Language Instruction, Factor Analysis, English (Second Language)

The Processes of Rating L2 Speaking Performance Using an Analytic Rating Scale -- A Qualitative Exploration

Peer reviewed

Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022

In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…

Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction

The Use of Semantic Similarity Tools in Automated Content Scoring of Fact-Based Essays Written by EFL Learners

Peer reviewed

Wang, Qiao – Education and Information Technologies, 2022

This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring

The Impact of Computers on Marking Behaviors and Assessment: A Many-Facet Rasch Measurement Analysis of Essays by EFL College Students

Peer reviewed

He, Tung-hsien – SAGE Open, 2019

This study employed a mixed-design approach and the Many-Facet Rasch Measurement (MFRM) framework to investigate whether rater bias occurred between the onscreen scoring (OSS) mode and the paper-based scoring (PBS) mode. Nine human raters analytically marked scanned scripts and paper scripts using a six-category (i.e., six-criterion) rating…

Descriptors: Computer Assisted Testing, Scoring, Item Response Theory, Essays

Rater Dominance in Discussion as a Resolution Method

Peer reviewed

Ahmadi, Alireza – Taiwan Journal of TESOL, 2020

Rater subjectivity has long been an intriguing topic. The use of discussion as a resolution method is a practical way to reduce this subjectivity. However, the efficacy of discussion depends on whether different raters get equally engaged in it or one rater tends to dominate others. This study investigated whether and how rater dominance occurs in…

Descriptors: Evaluators, Interrater Reliability, Discussion, Discourse Analysis

Developing an Innovative Elicited Imitation Task for Efficient English Proficiency Assessment. TOEFL® Research Report. RR-96. ETS RR-21-24

Peer reviewed

Davis, Larry; Norris, John – ETS Research Report Series, 2021

The elicited imitation task (EIT), in which language learners listen to a series of spoken sentences and repeat each one verbatim, is a commonly used measure of language proficiency in second language acquisition research. The "TOEFL® Essentials"™ test includes an EIT as a holistic measure of speaking proficiency, referred to as the…

Descriptors: Task Analysis, Language Proficiency, Speech Communication, Language Tests

Automated Essay Scoring at Scale: A Case Study in Switzerland and Germany. TOEFL® Research Report. RR-86. ETS RR-19-12

Peer reviewed

Rupp, André A.; Casabianca, Jodi M.; Krüger, Maleika; Keller, Stefan; Köller, Olaf – ETS Research Report Series, 2019

In this research report, we describe the design and empirical findings for a large-scale study of essay writing ability with approximately 2,500 high school students in Germany and Switzerland on the basis of 2 tasks with 2 associated prompts, each from a standardized writing assessment whose scoring involved both human and automated components.…

Descriptors: Automation, Foreign Countries, English (Second Language), Language Tests

For a Greater Good: Bias Analysis in Writing Assessment

Peer reviewed

Ahmadi Shirazi, Masoumeh – SAGE Open, 2019

Threats to construct validity should be reduced to a minimum. If true, sources of bias, namely raters, items, tests as well as gender, age, race, language background, culture, and socio-economic status need to be spotted and removed. This study investigates raters' experience, language background, and the choice of essay prompt as potential…

Descriptors: Foreign Countries, Language Tests, Test Bias, Essay Tests

Source

ETS Research Report Series	6
Language Testing	3
SAGE Open	3
Modern Language Journal	2
Assessing Writing	1
Classroom Discourse	1
Education and Information…	1
Educational and Psychological…	1
English Language Teaching	1
English Teaching	1
IEEE Transactions on Learning…	1
International Journal of…	1
International Online Journal…	1
JALT CALL Journal	1
Journal of Education and…	1
Journal of Pan-Pacific…	1
Journal of Speech, Language,…	1
Language Assessment Quarterly	1
Language Education &…	1
ProQuest LLC	1
ReCALL	1
Taiwan Journal of TESOL	1
Technology in Language…	1
Working Papers in TESOL &…	1

Author

Polat, Murat	2
Ahmadi Shirazi, Masoumeh	1
Ahmadi, Alireza	1
Alt, Mary	1
Bejar, Isaac I.	1
Breyer, F. Jay	1
Buzick, Heather	1
Carey, Michael D.	1
Carlson, Sybil B.	1
Casabianca, Jodi M.	1
Chan, Stephanie W. Y.	1
Cheung, Wai Ming	1
Coniam, David	1
Davis, Larry	1
Dunn, Peter K.	1
Edwards, Alison L.	1
Erik Voss	1
Fatemeh Mohajeri	1
Ferroli, Lou	1
Figueroa, Cecilia	1
Gebril, Atta	1
Han, Qie	1
He, Tung-hsien	1
Heidari, Jamshid	1