ERIC - Search Results

Showing 1 to 15 of 16 results

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Perceptual and Acoustic Assessment of Strain Using Synthetically Modified Voice Samples

Peer reviewed

Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020

Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…

Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation

The Use of Semantic Similarity Tools in Automated Content Scoring of Fact-Based Essays Written by EFL Learners

Peer reviewed

Wang, Qiao – Education and Information Technologies, 2022

This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Assessing Visual-Spatial Creativity in Youth on the Autism Spectrum

Peer reviewed

Diener, Marissa L.; Wright, Cheryl A.; Smith, Katherine N.; Wright, Scott D. – Creativity Research Journal, 2014

The goal of this study was to develop a measure of creativity that builds on the strengths of youth with autism spectrum disorders (ASD). The assessment of creativity focused on the visual-spatial abilities of these youth using 3D modeling software. One of the objectives of the research was to develop a measure of creativity in an authentic…

Descriptors: Autism, Pervasive Developmental Disorders, Creativity, Creativity Tests

SLA Developmental Stages and Teachers' Assessment of Written French: Exploring Direkt Profil as a Diagnostic Assessment Tool

Peer reviewed

Granfeldt, Jonas; Ågren, Malin – Language Testing, 2014

One core area of research in Second Language Acquisition is the identification and definition of developmental stages in different L2s. For L2 French, Bartning and Schlyter (2004) presented a model of six morphosyntactic stages of development in the shape of grammatical profiles. The model formed the basis for the computer program Direkt Profil…

Descriptors: Second Language Learning, Language Tests, French, Language Teachers

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Marking Student Programs Using Graph Similarity

Peer reviewed

Naude, Kevin A.; Greyling, Jean H.; Vogts, Dieter – Computers & Education, 2010

We present a novel approach to the automated marking of student programming assignments. Our technique quantifies the structural similarity between unmarked student submissions and marked solutions, and is the basis by which we assign marks. This is accomplished through an efficient novel graph similarity measure ("AssignSim"). Our experiments…

Descriptors: Grading, Assignments, Correlation, Interrater Reliability

Experimenting with a Computer Essay-Scoring Program Based on ESL Student Writing Scripts

Peer reviewed

Coniam, David – ReCALL, 2009

This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or lack of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…

Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability

A Multi-Component Model for Assessing Learning Objects: The Learning Object Evaluation Metric (LOEM)

Peer reviewed

Kay, Robin H.; Knaack, Liesel – Australasian Journal of Educational Technology, 2008

While discussion of the criteria needed to assess learning objects has been extensive, a formal, systematic model for evaluation has yet to be thoroughly tested. The purpose of the following study was to develop and assess a multi-component model for evaluating learning objects. The Learning Object Evaluation Metric (LOEM) was developed from a…

Descriptors: Foreign Countries, Models, Measurement Techniques, Evaluation Criteria

Automatic Coding of Dialogue Acts in Collaboration Protocols

Peer reviewed

Erkens, Gijsbert; Janssen, Jeroen – International Journal of Computer-Supported Collaborative Learning, 2008

Although protocol analysis can be an important tool for researchers to investigate the process of collaboration and communication, the use of this method of analysis can be time consuming. Hence, an automatic coding procedure for coding dialogue acts was developed. This procedure helps to determine the communicative function of messages in online…

Descriptors: Protocol Analysis, Validity, Cooperation, Coding

Using Momentary Time Sampling to Estimate Minutes of Physical Activity in Physical Education: Validation of Scores for the System for Observing Fitness Instruction Time

Peer reviewed

Heath, Edward M.; Coleman, Karen J.; Lensegrav, Tera L.; Fallon, Jennifer A. – Research Quarterly for Exercise and Sport, 2006

The System for Observing Fitness Instruction Time (SOFIT) is a direct observation system specifically developed for use during physical education (PE; McKenzie, 1991; McKenzie, Sallis, & Nader, 1991). The purpose of this study was to validate the estimates of time spent in various physical activity intensities obtained with the paper and pencil…

Descriptors: Validity, Physical Activities, Physical Education, Physical Fitness

A Computer-Based Approach for Deriving and Measuring Individual and Team Knowledge Structure from Essay Questions

Peer reviewed

Clariana, Roy B.; Wallace, Patricia – Journal of Educational Computing Research, 2007

This proof-of-concept investigation describes a computer-based approach for deriving the knowledge structure of individuals and of groups from their written essays, and considers the convergent criterion-related validity of the computer-based scores relative to human rater essay scores and multiple-choice test scores. After completing a…

Descriptors: Computer Assisted Testing, Multiple Choice Tests, Construct Validity, Cognitive Structures

Relationship of Admission Test Scores to Writing Performance of Native and Nonnative Speakers of English.

Carlson, Sybil B.; And Others – 1985

Four writing samples were obtained from 638 foreign college applicants who represented three major foreign language groups (Arabic, Chinese, and Spanish), and from 60 native English speakers. All four were scored holistically, two were also scored for sentence-level and discourse-level skills, and some were scored by the Writer's Workbench…

Descriptors: Arabic, Chinese, College Entrance Examinations, Computer Software
