ERIC - Search Results

Showing 1 to 15 of 30 results

Pedagogical Considerations for Examining Rater Variability in Rater-Mediated Assessments: A Three-Model Framework

Peer reviewed

Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019

Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…

Descriptors: Interrater Reliability, Models, Observation, Measurement

Metrics for Discrete Student Models: Chance Levels, Comparisons, and Use Cases

Peer reviewed
PDF on ERIC

Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018

Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…

Descriptors: Models, Comparative Analysis, Prediction, Probability

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed
PDF on ERIC

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

Interrater Agreement Evaluation: A Latent Variable Modeling Approach

Peer reviewed

Raykov, Tenko; Dimitrov, Dimiter M.; von Eye, Alexander; Marcoulides, George A. – Educational and Psychological Measurement, 2013

A latent variable modeling method for evaluation of interrater agreement is outlined. The procedure is useful for point and interval estimation of the degree of agreement among a given set of judges evaluating a group of targets. In addition, the approach allows one to test for identity in underlying thresholds across raters as well as to identify…

Descriptors: Interrater Reliability, Models, Statistical Analysis, Computation

An Exploratory Investigation of the Counseling Competencies Scale: A Measure of Counseling Skills, Dispositions, and Behaviors

Peer reviewed

Swank, Jacqueline M.; Lambie, Glenn W.; Witta, E. Lea – Counselor Education and Supervision, 2012

The authors examined the psychometric properties of the Counseling Competencies Scale (CCS; University of Central Florida Counselor Education Faculty, 2009), an instrument designed to assess trainee competencies as measured in their counseling skills, dispositions, and behaviors. There was strong internal consistency for the 4-factor model for…

Descriptors: Test Validity, Interrater Reliability, Counselor Training, Measures (Individuals)

Description of a Practitioner Model for Identifying Preferred Stimuli with Individuals with Autism Spectrum Disorders

Peer reviewed

Karsten, Amanda M.; Carr, James E.; Lepper, Tracy L. – Behavior Modification, 2011

The rich technology of stimulus preference assessment (SPA) is a product of 40 years of experimental research. Basic principles of reinforcement and a modest empirical literature suggest that high-preference stimuli identified via SPA may enhance treatment efficacy and decrease problem behavior more effectively than less-preferred stimuli. SPAs…

Descriptors: Stimuli, Autism, Pervasive Developmental Disorders, Models

Some Key Issues in Creativity Research and Evaluation as Seen from a Psychological Perspective

Peer reviewed

Fryer, Marilyn – Creativity Research Journal, 2012

This article explores a number of key issues with regard to the measurement of creativity in the course of conducting psychological research or when applying various evaluation measures. It is argued that, although creativity is a fuzzy concept, it is no more difficult to investigate than other fuzzy concepts people tend to take for granted. At…

Descriptors: Creativity, Educational Research, Psychological Studies, Evaluation Methods

Annual Research Review: Embracing Not Erasing Contextual Variability in Children's Behavior--Theory and Utility in the Selection and Use of Methods and Informants in Developmental Psychopathology

Peer reviewed

Dirks, Melanie A.; De Los Reyes, Andres; Briggs-Gowan, Margaret; Cella, David; Wakschlag, Lauren S. – Journal of Child Psychology and Psychiatry, 2012

This paper examines the selection and use of multiple methods and informants for the assessment of disruptive behavior syndromes and attention deficit/hyperactivity disorder, providing a critical discussion of (a) the bidirectional linkages between theoretical models of childhood psychopathology and current assessment techniques; and (b) current…

Descriptors: Evidence, Attention Deficit Hyperactivity Disorder, Models, Psychopathology

Principal and Teacher Perceptions of Implementation of Multiple-Measure Teacher Evaluation Systems in Arizona. REL 2015-062

Peer reviewed
PDF on ERIC

Ruffini, Stephen J.; Makkonen, Reino; Tejwani, Jaclyn; Diaz, Marycruz – Regional Educational Laboratory West, 2014

This study describes how multiple-measure teacher evaluations were put into practice in a set of ten volunteering local education agencies (LEAs) in Arizona. After a key shift in state policy, five "pilot" LEAs implemented the new Arizona Department of Education teacher evaluation model in the 2012/13 school year, while five other…

Descriptors: Teacher Evaluation, Teacher Attitudes, Administrator Attitudes, Interrater Reliability

Evaluation of the "e-rater"® Scoring Engine for the "GRE"® Issue and Argument Prompts. Research Report. ETS RR-12-02

Peer reviewed
PDF on ERIC

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012

Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…

Descriptors: Scoring, Test Scoring Machines, Automation, Models

Forms and Functions of Participatory Evaluation in International Development: A Review of the Empirical and Theoretical Literature

Peer reviewed

Cullen, Anne; Coryn, Chris L. S. – Journal of MultiDisciplinary Evaluation, 2011

Background: Since the late 1970s participatory approaches have been widely promoted to evaluate international development programs. However, there is no universal agreement of what is meant by participatory evaluation. For some evaluators, participatory evaluations involve the extensive participation of all stakeholder groups (from donor to…

Descriptors: Economic Development, Global Approach, International Programs, Program Evaluation

Agreement between Two Independent Groups of Raters

Peer reviewed

Vanbelle, Sophie; Albert, Adelin – Psychometrika, 2009

We propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined on a population-based model, extends the classical Cohen's kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the…

Descriptors: Interrater Reliability, Weighted Scores, Congruence (Psychology), Rating Scales

Obscuring Vital Distinctions: The Oversimplification of Learning Disabilities within RTI

Peer reviewed

McKenzie, Robert G. – Learning Disability Quarterly, 2009

The assessment procedures within Response to Intervention (RTI) models have begun to supplant the use of traditional, discrepancy-based frameworks for identifying students with specific learning disabilities (SLD). Many RTI proponents applaud this shift because of perceived shortcomings in utilizing discrepancy as an indicator of SLD. However,…

Descriptors: Intervention, Learning Disabilities, Error of Measurement, Psychometrics

Alignment of Standards and Assessment: A Theoretical and Empirical Study of Methods for Alignment

Peer reviewed

Nasstrom, Gunilla; Henriksson, Widar – Electronic Journal of Research in Educational Psychology, 2008

Introduction: In a standards-based school-system alignment of policy documents with standards and assessment is important. To be able to evaluate whether schools and students have reached the standards, the assessment should focus on the standards. Different models and methods can be used for measuring alignment, i.e. the correspondence between…

Descriptors: Curriculum Development, Interrater Reliability, Classification, Foreign Countries
