ERIC - Search Results

Showing all 15 results

Evaluating Quadratic Weighted Kappa as the Standard Performance Metric for Automated Essay Scoring

Peer reviewed

Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023

Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…

Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Revolutionising Essay Evaluation: A Cutting-Edge Rubric for AI-Assisted Writing

Peer reviewed

Hassan Saleh Mahdi; Ahmed Alkhateeb – International Journal of Computer-Assisted Language Learning and Teaching, 2025

This study aims to develop a robust rubric for evaluating artificial intelligence (AI)-assisted essay writing in English as a Foreign Language (EFL) contexts. Employing a modified Delphi technique, we conducted a comprehensive literature review and administered Likert scale questionnaires. This process yielded nine key evaluation criteria,…

Descriptors: Scoring Rubrics, Essays, Writing Evaluation, Artificial Intelligence

Impacts of ChatGPT-Assisted Writing for EFL English Majors: Feasibility and Challenges

Peer reviewed

Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024

This study aimed to determine the impacts of using ChatGPT to assist English as a foreign language (EFL) college English majors in revising essays, and whether such use could lead to higher scores and potential unfairness. A prospective, double-blinded, paired-comparison study was conducted in Feb. 2023. A total of 44 students provided 44 original essays…

Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)

Comparative Judgement: Assess Student Production without Absolute Judgements

Peer reviewed

Sumner, Josh – Research-publishing.net, 2021

Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…

Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making

The Use of Semantic Similarity Tools in Automated Content Scoring of Fact-Based Essays Written by EFL Learners

Peer reviewed

Wang, Qiao – Education and Information Technologies, 2022

This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring

A Human-Centric Automated Essay Scoring and Feedback System for the Development of Ethical Reasoning

Peer reviewed

Lee, Alwyn Vwen Yen; Luco, Andrés Carlos; Tan, Seng Chee – Educational Technology & Society, 2023

Although artificial intelligence (AI) is prevalent and impacts facets of daily life, there is limited research on responsible and humanistic design, implementation, and evaluation of AI, especially in the field of education. After all, learning is inherently a social endeavor involving human interactions, rendering the need for AI designs to be…

Descriptors: Essays, Scoring, Writing Evaluation, Computer Software

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Development of a Rubric to Assess Academic Writing Incorporating Plagiarism Detectors

Peer reviewed

Razi, Salim – SAGE Open, 2015

Similarity reports of plagiarism detectors should be approached with caution as they may not be sufficient to support allegations of plagiarism. This study developed a 50-item rubric to simplify and standardize evaluation of academic papers. In the spring semester of 2011-2012 academic year, 161 freshmen's papers at the English Language Teaching…

Descriptors: Foreign Countries, Scoring Rubrics, Writing Evaluation, Writing (Composition)

On the Reliability and Validity of Human and LSA-Based Evaluations of Complex Student-Authored Texts

Peer reviewed

Seifried, Eva; Lenhard, Wolfgang; Baier, Herbert; Spinath, Birgit – Journal of Educational Computing Research, 2012

This study investigates the potential of a software tool based on Latent Semantic Analysis (LSA; Landauer, McNamara, Dennis, & Kintsch, 2007) to automatically evaluate complex German texts. A sample of N = 94 German university students provided written answers to questions that involved a high amount of analytical reasoning and evaluation.…

Descriptors: Foreign Countries, Computer Software, Computer Software Evaluation, Computer Uses in Education

Can Machine Scoring Deal with Broad and Open Writing Tests as Well as Human Readers?

Peer reviewed

McCurry, Doug – Assessing Writing, 2010

This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason for asking whether machine scoring of writing requires…

Descriptors: Writing Tests, Scoring, Interrater Reliability, Computer Assisted Testing

Automated Formative Assessment as a Tool to Scaffold Student Documentary Writing

Peer reviewed

Ferster, Bill; Hammond, Thomas C.; Alexander, R. Curby; Lyman, Hunt – Journal of Interactive Learning Research, 2012

The hurried pace of the modern classroom does not permit formative feedback on writing assignments at the frequency or quality recommended by the research literature. One solution for increasing individual feedback to students is to incorporate some form of computer-generated assessment. This study explores the use of automated assessment of…

Descriptors: Feedback (Response), Scripts, Formative Evaluation, Essays

Computer Grading of Student Prose, Using Modern Concepts and Software.

Peer reviewed

Page, Ellis Batten – Journal of Experimental Education, 1994

National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)

Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods

Relationship of Admission Test Scores to Writing Performance of Native and Nonnative Speakers of English.

Carlson, Sybil B.; And Others – 1985

Four writing samples were obtained from 638 foreign college applicants who represented three major foreign language groups (Arabic, Chinese, and Spanish), and from 60 native English speakers. All four were scored holistically, two were also scored for sentence-level and discourse-level skills, and some were scored by the Writer's Workbench…

Descriptors: Arabic, Chinese, College Entrance Examinations, Computer Software

Technology and Language Testing. A Collection of Papers from the Annual Colloquium on Language Testing Research (7th, Princeton, New Jersey, April 6-9, 1985).

Stansfield, Charles W., Ed. – 1986

This collection of essays on measurement theory and language testing includes: "Computerized Adaptive Testing: Implications for Language Test Developers" (Peter Tung); "The Promise and Threat of Computerized Adaptive Assessment of Reading Comprehension" (Michael Canale); "Computerized Rasch Analysis of Item Bias in ESL…

Descriptors: Chinese, Cloze Procedure, Computer Assisted Testing, Computer Software
