ERIC - Search Results

Showing 1 to 15 of 19 results

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describes a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

Kappa and Rater Accuracy: Paradigms and Parameters

Peer reviewed

Conger, Anthony J. – Educational and Psychological Measurement, 2017

Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…

Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis

An Unbiased Estimate of Global Interrater Agreement

Peer reviewed

Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2017

Assessing global interrater agreement is difficult as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what…

Descriptors: Interrater Reliability, Evaluation Methods, Statistical Bias, Accuracy

Estimating Hazard Ratios from Published Kaplan-Meier Survival Curves: A Methods Validation Study

Peer reviewed

Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019

Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…

Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

Reference Accuracy among Research Articles Published in "Research on Social Work Practice"

Peer reviewed

Wilks, Scott E.; Geiger, Jennifer R.; Bates, Samantha M.; Wright, Amy L. – Research on Social Work Practice, 2017

Objective: The objective was to examine reference errors in research articles published in Research on Social Work Practice. High rates of reference errors in other top social work journals have been noted in previous studies. Methods: Via a sampling frame of 22,177 total references among 464 research articles published in the previous decade, a…

Descriptors: Social Work, Social Services, Accuracy, Educational Research

Using Subjective and Objective Measures to Predict Level of Reading Fluency at the End of First Grade

Peer reviewed

Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018

This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…

Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students

A Ratio Test of Interrater Agreement with High Specificity

Peer reviewed

Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2015

Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…

Descriptors: Interrater Reliability, Monte Carlo Methods, Measurement Techniques, Accuracy

Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

Peer reviewed

van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018

In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…

Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills

A Multidisciplinary Assessment of Faculty Accuracy and Reliability with Bloom's Taxonomy

Peer reviewed
Welch, Adam C.; Karpen, Samuel C.; Cross, L. Brian; LeBlanc, Brandie N. – Research & Practice in Assessment, 2017

The aims of this study were to determine faculty's ability to accurately and reliably categorize exam questions using Bloom's Taxonomy, and if modified versions would improve the accuracy and reliability. Faculty experience and affiliation with a health sciences discipline were also considered. Faculty at one university were asked to categorize 30…

Descriptors: College Faculty, Medical School Faculty, Health Sciences, Test Items

Measuring L2 Speakers' Interactional Ability Using Interactive Speech Tasks

Peer reviewed

van Batenburg, Eline S. L.; Oostdam, Ron J.; van Gelderen, Amos J. S.; de Jong, Nivja H. – Language Testing, 2018

This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n…

Descriptors: Second Language Learning, Speech Tests, Interaction, Test Reliability

A Comparison between Students' Self-Assessment and Teachers' Assessment

Peer reviewed
Thawabieh, Ahmad M. – Journal of Curriculum and Teaching, 2017

This study aimed to compare between the students' self-assessment and teachers' assessment. The study sample consisted of 71 students at Tafila Technical University studying Introduction to Psychology course. The researcher used 2 students' self-assessment tools and 2 tests. The results indicated that students can assess themselves accurately if…

Descriptors: Comparative Analysis, Self Evaluation (Individuals), Student Evaluation, Psychology

Assessing English Language Learners' Oral Performance: A Comparison of Monologue, Interview, and Group Oral Test

Peer reviewed

Ahmadi, Alireza; Sadeghi, Elham – Language Assessment Quarterly, 2016

In the present study we investigated the effect of test format on oral performance in terms of test scores and discourse features (accuracy, fluency, and complexity). Moreover, we explored how the scores obtained on different test formats relate to such features. To this end, 23 Iranian EFL learners participated in three test formats of monologue,…

Descriptors: Oral Language, Comparative Analysis, Language Fluency, Accuracy

Towards Real-Time Speech Emotion Recognition for Affective E-Learning

Peer reviewed

Bahreini, Kiavash; Nadolski, Rob; Westera, Wim – Education and Information Technologies, 2016

This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…

Descriptors: Affective Behavior, Emotional Response, Electronic Learning, Recognition (Psychology)

Measuring Essay Assessment: Intra-Rater and Inter-Rater Reliability

Peer reviewed
Kayapinar, Ulas – Eurasian Journal of Educational Research, 2014

Problem Statement: There have been many attempts to research the effective assessment of writing ability, and many proposals for how this might be done. In this sense, rater reliability plays a crucial role for making vital decisions about testees in different turning points of both educational and professional life. Intra-rater and inter-rater…

Descriptors: Interrater Reliability, Essay Tests, Writing Tests, Grading
