NotesFAQContact Us
Collection
Advanced
Search Tips
Publication Date
In 20260
Since 20250
Since 2022 (last 5 years)0
Since 2017 (last 10 years)12
Since 2007 (last 20 years)19
Audience
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 1 to 15 of 19 results Save | Export
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al (2020) describes a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Peer reviewed Peer reviewed
Direct linkDirect link
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2017
Assessing global interrater agreement is difficult as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what…
Descriptors: Interrater Reliability, Evaluation Methods, Statistical Bias, Accuracy
Peer reviewed Peer reviewed
Direct linkDirect link
Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019
Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…
Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials
Peer reviewed Peer reviewed
Direct linkDirect link
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Peer reviewed Peer reviewed
Direct linkDirect link
Wilks, Scott E.; Geiger, Jennifer R.; Bates, Samantha M.; Wright, Amy L. – Research on Social Work Practice, 2017
Objective: The objective was to examine reference errors in research articles published in Research on Social Work Practice. High rates of reference errors in other top social work journals have been noted in previous studies. Methods: Via a sampling frame of 22,177 total references among 464 research articles published in the previous decade, a…
Descriptors: Social Work, Social Services, Accuracy, Educational Research
Peer reviewed Peer reviewed
Direct linkDirect link
Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018
This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…
Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students
Peer reviewed Peer reviewed
Direct linkDirect link
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2015
Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…
Descriptors: Interrater Reliability, Monte Carlo Methods, Measurement Techniques, Accuracy
Peer reviewed Peer reviewed
Direct linkDirect link
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Welch, Adam C.; Karpen, Samuel C.; Cross, L. Brian; LeBlanc, Brandie N. – Research & Practice in Assessment, 2017
The aims of this study were to determine faculty's ability to accurately and reliably categorize exam questions using Bloom's Taxonomy, and if modified versions would improve the accuracy and reliability. Faculty experience and affiliation with a health sciences discipline were also considered. Faculty at one university were asked to categorize 30…
Descriptors: College Faculty, Medical School Faculty, Health Sciences, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
van Batenburg, Eline S. L.; Oostdam, Ron J.; van Gelderen, Amos J. S.; de Jong, Nivja H. – Language Testing, 2018
This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n…
Descriptors: Second Language Learning, Speech Tests, Interaction, Test Reliability
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Thawabieh, Ahmad M. – Journal of Curriculum and Teaching, 2017
This study aimed to compare between the students' self-assessment and teachers' assessment. The study sample consisted of 71 students at Tafila Technical University studying Introduction to Psychology course. The researcher used 2 students' self-assessment tools and 2 tests. The results indicated that students can assess themselves accurately if…
Descriptors: Comparative Analysis, Self Evaluation (Individuals), Student Evaluation, Psychology
Peer reviewed Peer reviewed
Direct linkDirect link
Ahmadi, Alireza; Sadeghi, Elham – Language Assessment Quarterly, 2016
In the present study we investigated the effect of test format on oral performance in terms of test scores and discourse features (accuracy, fluency, and complexity). Moreover, we explored how the scores obtained on different test formats relate to such features. To this end, 23 Iranian EFL learners participated in three test formats of monologue,…
Descriptors: Oral Language, Comparative Analysis, Language Fluency, Accuracy
Peer reviewed Peer reviewed
Direct linkDirect link
Bahreini, Kiavash; Nadolski, Rob; Westera, Wim – Education and Information Technologies, 2016
This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…
Descriptors: Affective Behavior, Emotional Response, Electronic Learning, Recognition (Psychology)
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Kayapinar, Ulas – Eurasian Journal of Educational Research, 2014
Problem Statement: There have been many attempts to research the effective assessment of writing ability, and many proposals for how this might be done. In this sense, rater reliability plays a crucial role for making vital decisions about testees in different turning points of both educational and professional life. Intra-rater and inter-rater…
Descriptors: Interrater Reliability, Essay Tests, Writing Tests, Grading
Previous Page | Next Page ยป
Pages: 1  |  2