Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Raykov, Tenko; Dimitrov, Dimiter M.; von Eye, Alexander; Marcoulides, George A. – Educational and Psychological Measurement, 2013
A latent variable modeling method for evaluation of interrater agreement is outlined. The procedure is useful for point and interval estimation of the degree of agreement among a given set of judges evaluating a group of targets. In addition, the approach allows one to test for identity in underlying thresholds across raters as well as to identify…
Descriptors: Interrater Reliability, Models, Statistical Analysis, Computation
Swank, Jacqueline M.; Lambie, Glenn W.; Witta, E. Lea – Counselor Education and Supervision, 2012
The authors examined the psychometric properties of the Counseling Competencies Scale (CCS; University of Central Florida Counselor Education Faculty, 2009), an instrument designed to assess trainee competencies as measured in their counseling skills, dispositions, and behaviors. There was strong internal consistency for the 4-factor model for…
Descriptors: Test Validity, Interrater Reliability, Counselor Training, Measures (Individuals)
Karsten, Amanda M.; Carr, James E.; Lepper, Tracy L. – Behavior Modification, 2011
The rich technology of stimulus preference assessment (SPA) is a product of 40 years of experimental research. Basic principles of reinforcement and a modest empirical literature suggest that high-preference stimuli identified via SPA may enhance treatment efficacy and decrease problem behavior more effectively than less-preferred stimuli. SPAs…
Descriptors: Stimuli, Autism, Pervasive Developmental Disorders, Models
Fryer, Marilyn – Creativity Research Journal, 2012
This article explores a number of key issues with regard to the measurement of creativity in the course of conducting psychological research or when applying various evaluation measures. It is argued that, although creativity is a fuzzy concept, it is no more difficult to investigate than other fuzzy concepts people tend to take for granted. At…
Descriptors: Creativity, Educational Research, Psychological Studies, Evaluation Methods
Dirks, Melanie A.; De Los Reyes, Andres; Briggs-Gowan, Margaret; Cella, David; Wakschlag, Lauren S. – Journal of Child Psychology and Psychiatry, 2012
This paper examines the selection and use of multiple methods and informants for the assessment of disruptive behavior syndromes and attention deficit/hyperactivity disorder, providing a critical discussion of (a) the bidirectional linkages between theoretical models of childhood psychopathology and current assessment techniques; and (b) current…
Descriptors: Evidence, Attention Deficit Hyperactivity Disorder, Models, Psychopathology
Ruffini, Stephen J.; Makkonen, Reino; Tejwani, Jaclyn; Diaz, Marycruz – Regional Educational Laboratory West, 2014
This study describes how multiple-measure teacher evaluations were put into practice in a set of ten volunteering local education agencies (LEAs) in Arizona. After a key shift in state policy, five "pilot" LEAs implemented the new Arizona Department of Education teacher evaluation model in the 2012/13 school year, while five other…
Descriptors: Teacher Evaluation, Teacher Attitudes, Administrator Attitudes, Interrater Reliability
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Automated scoring models for the e-rater® scoring engine were built and evaluated for the GRE® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…
Descriptors: Scoring, Test Scoring Machines, Automation, Models
Cullen, Anne; Coryn, Chris L. S. – Journal of MultiDisciplinary Evaluation, 2011
Background: Since the late 1970s participatory approaches have been widely promoted to evaluate international development programs. However, there is no universal agreement of what is meant by participatory evaluation. For some evaluators, participatory evaluations involve the extensive participation of all stakeholder groups (from donor to…
Descriptors: Economic Development, Global Approach, International Programs, Program Evaluation
Vanbelle, Sophie; Albert, Adelin – Psychometrika, 2009
We propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined on a population-based model, extends the classical Cohen's kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the…
Descriptors: Interrater Reliability, Weighted Scores, Congruence (Psychology), Rating Scales
McKenzie, Robert G. – Learning Disability Quarterly, 2009
The assessment procedures within Response to Intervention (RTI) models have begun to supplant the use of traditional, discrepancy-based frameworks for identifying students with specific learning disabilities (SLD). Many RTI proponents applaud this shift because of perceived shortcomings in utilizing discrepancy as an indicator of SLD. However,…
Descriptors: Intervention, Learning Disabilities, Error of Measurement, Psychometrics
Nasstrom, Gunilla; Henriksson, Widar – Electronic Journal of Research in Educational Psychology, 2008
Introduction: In a standards-based school system, alignment of policy documents with standards and assessment is important. To be able to evaluate whether schools and students have reached the standards, the assessment should focus on the standards. Different models and methods can be used for measuring alignment, i.e. the correspondence between…
Descriptors: Curriculum Development, Interrater Reliability, Classification, Foreign Countries
