Showing 1 to 15 of 113 results
Peer reviewed
Download full text (PDF on ERIC)
Alan Huebner; Gustaf B. Skar; Mengchen Huang – Practical Assessment, Research & Evaluation, 2025
Generalizability theory is a modern and powerful framework for conducting reliability analyses. It is flexible enough to accommodate both random and fixed facets, yet the practical literature offers relatively little guidance on handling the fixed-facet case. This article aims to provide practitioners with a conceptual understanding and…
Descriptors: Generalizability Theory, Multivariate Analysis, Statistical Analysis, Writing Evaluation
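For orientation only (the article itself treats the harder fixed-facet case): a minimal sketch of the simplest fully random one-facet design, persons crossed with raters, where two-way ANOVA mean squares yield variance components and a relative generalizability coefficient. All data below are made up.

    import numpy as np

    # Hypothetical persons-by-raters score matrix: 20 essays, 4 raters.
    rng = np.random.default_rng(0)
    scores = rng.normal(3.0, 1.0, (20, 1)) + rng.normal(0.0, 0.5, (20, 4))

    n_p, n_r = scores.shape
    grand = scores.mean()
    ms_p = n_r * np.sum((scores.mean(axis=1) - grand) ** 2) / (n_p - 1)
    resid = (scores - scores.mean(axis=1, keepdims=True)
             - scores.mean(axis=0, keepdims=True) + grand)
    ms_e = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

    var_p = max((ms_p - ms_e) / n_r, 0.0)  # person (true-score) variance
    g_rel = var_p / (var_p + ms_e / n_r)   # relative G coefficient, n_r raters
    print(round(g_rel, 3))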
Peer reviewed
Direct link
Joan Li; Nikhil Kumar Jangamreddy; Ryuto Hisamoto; Ruchita Bhansali; Amalie Dyda; Luke Zaphir; Mashhuda Glencross – Australasian Journal of Educational Technology, 2024
Generative artificial intelligence technologies such as ChatGPT bring unprecedented change to education by leveraging the power of natural language processing and machine learning. Employing ChatGPT to assist with marking written assessment presents multiple advantages, including scalability, improved consistency, and the elimination of biases associated…
Descriptors: Higher Education, Artificial Intelligence, Grading, Scoring Rubrics
Peer reviewed
Direct link
Bouwer, Renske; Koster, Monica; van den Bergh, Huub – Assessment in Education: Principles, Policy & Practice, 2023
Assessing students' writing performance is essential to adequately monitor and promote individual writing development, but it is also a challenge. The present research investigates a benchmark rating procedure for assessing texts written by upper-elementary students. In two studies we examined whether a benchmark rating procedure (1) leads to…
Descriptors: Benchmarking, Writing Evaluation, Evaluation Methods, Elementary School Students
Peer reviewed
Direct link
Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022
Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…
Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods
Peer reviewed
Direct link
Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024
Automatic essay scoring (AES) is an essential educational application of natural language processing. Automating the process alleviates the grading burden while increasing the reliability and consistency of assessment. With advances in text-embedding libraries and neural network models, AES systems have achieved good results in terms of accuracy.…
Descriptors: Scoring, Essays, Writing Evaluation, Memory
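As a hedged illustration of the general recipe such systems share (this is not the review's own code or data): score prediction as supervised regression over text features, here TF-IDF with ridge regression via scikit-learn.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    # Hypothetical training essays and human scores.
    essays = ["The author argues that...", "In my opinion the text...", "This essay shows..."]
    human_scores = [4.0, 2.5, 3.0]

    # Fit a simple feature-based scorer; real AES systems replace the
    # TF-IDF step with learned embeddings or neural models.
    scorer = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
    scorer.fit(essays, human_scores)
    print(scorer.predict(["A new, unscored essay..."]))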
Peer reviewed
Direct link
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments are typically analyzed qualitatively to show how raters apply rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
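For reference, the standard textbook MFRM formulation such score analyses rely on (not necessarily the study's exact specification) models the log-odds of adjacent rating categories as additive facet effects:

    \log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k

where \theta_n is examinee ability, \delta_i task difficulty, \alpha_j rater severity, and \tau_k the step from category k-1 to k.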
Peer reviewed
Direct link
Pinot de Moira, Anne; Wheadon, Christopher; Christodoulou, Daisy – Research in Education, 2022
Writing is generally assessed internationally using rubric-based approaches, but there is a growing body of evidence to suggest that the reliability of such approaches is poor. In contrast, comparative judgement studies suggest that it is possible to assess open ended tasks such as writing with greater reliability. Many previous studies, however,…
Descriptors: Writing Evaluation, Classification, Accuracy, Scoring Rubrics
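A hedged sketch of the mechanics behind comparative judgement (illustrative, not the authors' implementation): judges' pairwise wins fitted with a basic Bradley-Terry model via fixed-point (MM) updates, producing a quality scale for the scripts.

    import numpy as np

    n = 4                      # hypothetical number of scripts
    wins = np.zeros((n, n))    # wins[i, j]: times script i beat script j
    judgements = [(0, 1), (0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3), (2, 0)]
    for winner, loser in judgements:
        wins[winner, loser] += 1

    p = np.ones(n)
    for _ in range(200):                        # MM fixed-point iterations
        total = wins + wins.T                   # comparisons per pair
        denom = (total / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins.sum(axis=1) / denom
        p /= p.sum()                            # fix the arbitrary scale
    print(np.log(p))                            # quality estimates (logits)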
Peer reviewed
Download full text (PDF on ERIC)
Elif Sari – International Journal of Assessment Tools in Education, 2024
Employing G-theory and rater interviews, the study investigated how a high-stakes writing assessment procedure (i.e., a single-task, single-rater, and holistic scoring procedure) impacted the variability and reliability of its scores within the Turkish higher education context. Thirty-two essays written on two different writing tasks (i.e.,…
Descriptors: Foreign Countries, High Stakes Tests, Writing Evaluation, Scores
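For context, the standard D-study projection for a fully crossed persons x tasks x raters design (generic G-theory form, not the study's estimated variance components) shows why a single-task, single-rater procedure caps reliability:

    E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt}/n'_t + \sigma^2_{pr}/n'_r + \sigma^2_{ptr,e}/(n'_t n'_r)}

With n'_t = n'_r = 1, every interaction and error term enters the denominator at full strength.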
Peer reviewed
Download full text (PDF on ERIC)
Paul Deane; Duanli Yan; Katherine Castellano; Yigal Attali; Michelle Lamar; Mo Zhang; Ian Blood; James V. Bruno; Chen Li; Wenju Cui; Chunyi Ruan; Colleen Appel; Kofi James; Rodolfo Long; Farah Qureshi – ETS Research Report Series, 2024
This paper presents a multidimensional model of variation in writing quality, register, and genre in student essays, trained and tested via confirmatory factor analysis of 1.37 million essay submissions to ETS' digital writing service, Criterion®. The model was also validated with several other corpora, which indicated that it provides a…
Descriptors: Writing (Composition), Essays, Models, Elementary School Students
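For reference, the generic confirmatory factor model underlying such an analysis, in textbook notation (not the report's specific loadings or factors):

    x = \Lambda \xi + \delta, \qquad \Sigma(\theta) = \Lambda \Phi \Lambda^{\top} + \Theta_{\delta}

Fit is judged by how closely the model-implied covariance matrix \Sigma(\theta) reproduces the sample covariance of the observed essay features.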
Peer reviewed
Direct link
Tahani I. Aldosemani; Hussein Assalahi; Areej Lhothali; Maram Albsisi – International Journal of Computer-Assisted Language Learning and Teaching, 2023
This paper explores the literature on AWE feedback, particularly its perceived impact on enhancing EFL students' writing proficiency. Prior research has highlighted the contribution of AWE to fostering learner autonomy and alleviating teacher workloads, with a substantial focus on student engagement with AWE feedback. This review strives to illuminate…
Descriptors: Automation, Student Evaluation, Writing Evaluation, English (Second Language)
Peer reviewed
Direct link
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
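One common way to quantify the agreement such a comparison turns on (an illustrative check with made-up ratings, not the study's data) is quadratic weighted kappa between human and LLM scores:

    from sklearn.metrics import cohen_kappa_score

    human = [4, 3, 5, 2, 4, 3, 1, 5]  # hypothetical instructor ratings (1-5 rubric)
    llm = [4, 3, 4, 2, 5, 3, 2, 5]    # hypothetical LLM ratings of the same essays

    # Quadratic weighting penalizes large disagreements more than near-misses.
    print(cohen_kappa_score(human, llm, weights="quadratic"))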
Peer reviewed
Direct link
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Peer reviewed
Direct link
Romig, John Elwood; Olsen, Amanda A. – Reading & Writing Quarterly, 2021
Compared with other content areas, curriculum-based measurement of writing (CBM-W) has received relatively little research attention. This study conducted a conceptual replication examining the reliability, stability, and sensitivity to growth of slopes produced from CBM-W. Eighty-nine (N = 89) eighth-grade students responded to one CBM-W probe weekly for 11…
Descriptors: Curriculum Based Assessment, Writing Evaluation, Middle School Students, Grade 8
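The slope idea at the center of this design, sketched with invented data (not the study's): fit an ordinary least-squares line to each student's weekly probe scores and read growth off the slope.

    import numpy as np

    weeks = np.arange(11)                  # 11 weekly CBM-W probes
    rng = np.random.default_rng(1)
    # Five hypothetical students: individual baselines, growth rates, noise.
    scores = (20 + rng.normal(0, 3, (5, 1))
              + np.outer(rng.uniform(0.5, 2.0, 5), weeks)
              + rng.normal(0, 2, (5, 11)))

    slopes = [np.polyfit(weeks, s, 1)[0] for s in scores]  # units per week
    print(np.round(slopes, 2))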
Peer reviewed
Direct link
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Peer reviewed
Direct link
Ramon-Casas, Marta; Nuño, Neus; Pons, Ferran; Cunillera, Toni – Assessment & Evaluation in Higher Education, 2019
This article presents an empirical evaluation of the validity and reliability of a peer-assessment activity to improve academic writing competences. Specifically, we explored a large group of psychology undergraduate students with different initial writing skills. Participants (n = 365) produced two different essays, which were evaluated by their…
Descriptors: Peer Evaluation, Validity, Reliability, Writing Skills