ERIC - Search Results

Showing 1 to 15 of 23 results

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

An Interview with ChatGPT on Emergency Remote Teaching: A Comparative Analysis Based on Human-AI Collaboration

Peer reviewed

Tülübas, Tijen; Demirkol, Murat; Ozdemir, Tuncay Yavuz; Polat, Hakan; Karakose, Turgut; Yirci, Ramazan – Educational Process: International Journal, 2023

Background/purpose: ChatGPT, a recent form of AI-based language model, has garnered interest among people from diverse backgrounds with its immersive capabilities. Using ChatGPT to support or generate scientific research has also created an ongoing debate over its advantages versus risks. The present study aimed to conduct an AI-enabled research…

Descriptors: Artificial Intelligence, Emergency Programs, Distance Education, COVID-19

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decisions in holistic evaluation of ESL essays, as an alternative to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Examining the Interrater Reliability between Self- and Teacher Assessment of Students' Oral Performances

Peer reviewed

Manzano, Dexter L. – International Journal of Language Testing, 2022

The increasing popularity of self-assessment prompted several scholars to investigate its effectiveness and accuracy in relation to teacher assessment. However, most of these studies focused only on the consistency estimate perspective. Thus, the current study investigated the interrater reliability between self- and teacher assessment of…

Descriptors: Oral Language, Self Evaluation (Individuals), College Students, Interrater Reliability

Comparing Machine and Human Reviewers to Evaluate the Risk of Bias in Randomized Controlled Trials

Peer reviewed

Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…

Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials

Metrics for Discrete Student Models: Chance Levels, Comparisons, and Use Cases

Peer reviewed

Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018

Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…

Descriptors: Models, Comparative Analysis, Prediction, Probability
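
For readers unfamiliar with the metrics named in the abstract above, the following is a minimal sketch of their standard textbook definitions. It is not taken from the article: the function names and the sample labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label counts.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        # Returning 1.0 here is an assumption of this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


def precision_recall_f1(truth, predicted, positive):
    """Precision, recall, and F1 for one label treated as the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = student state present, 0 = absent.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
kappa = cohens_kappa(truth, predicted)
precision, recall, f1 = precision_recall_f1(truth, predicted, positive=1)
```

On these sample labels raw agreement is 0.75, yet kappa is noticeably lower because much of that agreement would be expected by chance. This is one way such metrics can give differing perspectives on the same model.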

Estimating Hazard Ratios from Published Kaplan-Meier Survival Curves: A Methods Validation Study

Peer reviewed

Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019

Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…

Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials

Exploration of New Complexity Metrics for Curriculum-Based Measures of Writing

Peer reviewed

Wagner, Kyle; Smith, Alex; Allen, Abigail; McMaster, Kristen; Poch, Apryl; Lembke, Erica – Assessment for Effective Intervention, 2019

Researchers and practitioners have questioned whether scoring procedures used with curriculum-based measures of writing (CBM-W) capture growth in complexity of writing. We analyzed data from six independent samples to examine two potential scoring metrics for picture word CBM-W (PW), a sentence-level CBM task. Correct word sequences per response…

Descriptors: Curriculum Based Assessment, Writing Evaluation, Comparative Analysis, Scoring

Using Subjective and Objective Measures to Predict Level of Reading Fluency at the End of First Grade

Peer reviewed

Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018

This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…

Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students

A Comparison between Students' Self-Assessment and Teachers' Assessment

Peer reviewed

Thawabieh, Ahmad M. – Journal of Curriculum and Teaching, 2017

This study aimed to compare students' self-assessment with teachers' assessment. The study sample consisted of 71 students at Tafila Technical University taking an Introduction to Psychology course. The researcher used two student self-assessment tools and two tests. The results indicated that students can assess themselves accurately if…

Descriptors: Comparative Analysis, Self Evaluation (Individuals), Student Evaluation, Psychology

Video Analysis of Mother-Child Interactions: Does the Role of Experience Affect the Accuracy and Reliability of Clinical Observations?

Peer reviewed

Choo, Dawn; Dettman, Shani J. – Deafness & Education International, 2016

During the pre- and post-implant habilitation process, mothers of children using cochlear implants may be coached by clinicians to use appropriate communicative strategies during play according to the family's choice of communication approach. The present study compared observations made by experienced and inexperienced individuals in the analysis…

Descriptors: Parent Child Relationship, Mothers, Video Technology, Observation

Item Response Theory for Peer Assessment

Peer reviewed

Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016

As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…

Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
