Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 0 |
| Since 2007 (last 20 years) | 3 |
Descriptor
| Interrater Reliability | 31 |
| Mathematical Models | 31 |
| Evaluation Methods | 9 |
| Evaluators | 9 |
| Equations (Mathematics) | 8 |
| Error of Measurement | 8 |
| Correlation | 7 |
| Estimation (Mathematics) | 7 |
| Rating Scales | 7 |
| Test Reliability | 7 |
| Comparative Analysis | 5 |
Author
| Cason, Carolyn L. | 2 |
| Cason, Gerald J. | 2 |
| Beasley, T. Mark | 1 |
| Chae, Sunhee | 1 |
| Chen, Hsueh-Chih | 1 |
| Chen, Po-Hsi | 1 |
| Deutsch, Stuart Jay | 1 |
| Eiting, Mindert H. | 1 |
| Goffin, Richard D. | 1 |
| Grove, Will | 1 |
| Houston, Walter M. | 1 |
Education Level
| Higher Education | 2 |
| Postsecondary Education | 2 |
| Elementary Secondary Education | 1 |
| Secondary Education | 1 |
Audience
| Researchers | 4 |
| Practitioners | 1 |
Location
| Singapore | 2 |
| South Korea | 2 |
| Asia | 1 |
| Australia | 1 |
| Brazil | 1 |
| Connecticut | 1 |
| Denmark | 1 |
| Egypt | 1 |
| Estonia | 1 |
| Florida | 1 |
| Germany | 1 |
Assessments and Surveys
| NEO Personality Inventory | 1 |
| National Assessment of… | 1 |
Simin, Cai; Lam, Toh Tin – Journal of Science and Mathematics Education in Southeast Asia, 2016
This paper presents the design and development of a rubric for assessing mathematical modelling tasks at the secondary level. The rubric was crafted based on mathematical modelling competencies that the researchers synthesised from four sources. The rubric was fine-tuned following an interview with three…
Descriptors: Scoring Rubrics, Test Construction, Mathematical Models, Secondary School Mathematics
Hung, Su-Pin; Chen, Po-Hsi; Chen, Hsueh-Chih – Creativity Research Journal, 2012
Product assessment is widely applied in creative studies, typically as an important dependent measure. Within this context, this study had 2 purposes. First, the focus of this research was on methods for investigating possible rater effects, an issue that has not received a great deal of attention in past creativity studies. Second, the…
Descriptors: Item Response Theory, Creativity, Interrater Reliability, Undergraduate Students
Towstopiat, Olga – Contemporary Educational Psychology, 1984 (peer reviewed)
The present article reviews the procedures that have been developed for measuring the reliability of human observers' judgments when making direct observations of behavior. These include the percentage of agreement, Cohen's Kappa, phi, and univariate and multivariate agreement measures that are based on quasi-equiprobability and quasi-independence…
Descriptors: Interrater Reliability, Mathematical Models, Multivariate Analysis, Observation
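For reference, a minimal sketch of two of the agreement measures named above, percent agreement and Cohen's kappa, computed on a hypothetical 2x2 table of observer classifications (the counts are illustrative only, not from the cited review):

```python
import numpy as np

# Hypothetical 2x2 table of counts: rows = observer A's categories, columns = observer B's.
table = np.array([[40, 10],
                  [ 5, 45]], dtype=float)

n = table.sum()
p_o = np.trace(table) / n                                 # percentage (observed) agreement
p_e = (table.sum(axis=1) / n) @ (table.sum(axis=0) / n)   # chance agreement from the marginals
kappa = (p_o - p_e) / (1 - p_e)                           # Cohen's kappa = (p_o - p_e) / (1 - p_e)

print(f"percent agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```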
Weare, Jane; And Others – 1987
This annotated bibliography was developed in response to a lack of information in the literature on training raters to establish agreement. The ERIC descriptor "Interrater Reliability" was used to locate journal articles. Some of the 33 resulting articles focus on mathematical concepts and present formulas for computing…
Descriptors: Annotated Bibliographies, Cloze Procedure, Correlation, Essay Tests
Paden, Patricia A. – 1986
Two factors that may affect the ratings assigned to an essay test are investigated: (1) context effects and (2) score level effects. Context effects exist in essay scoring if an essay is rated higher when preceded by poor quality essays than when preceded by high quality essays. A score level effect is defined as a change in the score (value)…
Descriptors: Context Effect, Essay Tests, Holistic Evaluation, Interrater Reliability
McCrae, Robert R. – Multivariate Behavioral Research, 1993 (peer reviewed)
To assess cross-observer agreement on personality profiles, an Index of Profile Agreement and an associated coefficient are proposed that take into account both the difference between the ratings and the extremes of their mean. Data from the Revised NEO Personality Inventory for 250 peer ratings/self-reports and 68 spouse ratings/self-reports…
Descriptors: Adults, Comparative Analysis, Equations (Mathematics), Evaluation Methods
Webber, Larry; And Others – 1986
Generalizability theory, which subsumes classical measurement theory as a special case, provides a general model for estimating the reliability of observational rating data by estimating the variance components of the measurement design. Research data from the "Heart Smart" health intervention program were analyzed as a heuristic tool…
Descriptors: Behavior Rating Scales, Cardiovascular System, Error of Measurement, Generalizability Theory
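A minimal sketch of the generalizability-theory idea described above, assuming a hypothetical fully crossed persons-by-raters design (not the "Heart Smart" data): variance components are estimated from two-way ANOVA mean squares and combined into a generalizability coefficient for the mean of the available raters.

```python
import numpy as np

# Hypothetical ratings: rows = persons, columns = raters (fully crossed p x r design).
X = np.array([[4, 5, 4],
              [2, 3, 2],
              [5, 5, 4],
              [3, 3, 3],
              [1, 2, 2]], dtype=float)
n_p, n_r = X.shape

grand = X.mean()
person_means = X.mean(axis=1)
rater_means = X.mean(axis=0)

# Two-way ANOVA without replication: sums of squares and mean squares.
ss_p = n_r * ((person_means - grand) ** 2).sum()
ss_r = n_p * ((rater_means - grand) ** 2).sum()
ss_res = ((X - grand) ** 2).sum() - ss_p - ss_r

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

# Variance components from the expected mean squares (truncated at zero).
var_res = ms_res
var_p = max((ms_p - ms_res) / n_r, 0.0)
var_r = max((ms_r - ms_res) / n_p, 0.0)

# Generalizability coefficient for relative decisions based on the mean of n_r raters.
g_coef = var_p / (var_p + var_res / n_r)
print(f"var_p={var_p:.3f}, var_r={var_r:.3f}, var_res={var_res:.3f}, G={g_coef:.3f}")
```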
van der Linden, Wim J. – 1982
A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The…
Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Interrater Reliability
Serlin, Ronald C.; Marascuilo, Leonard A. – Journal of Educational Statistics, 1983 (peer reviewed)
Two alternatives to the problems of conducting planned and post hoc comparisons in tests of concordance and discordance for G groups of judges are examined. The two models are illustrated using existing data. (Author/JKS)
Descriptors: Attitude Measures, Comparative Analysis, Interrater Reliability, Mathematical Models
Umesh, U. N.; And Others – Educational and Psychological Measurement, 1989 (peer reviewed)
An approach is provided for calculating maximum values of the Kappa statistic of J. Cohen (1960) as a function of observed agreement proportions between evaluators. Separate calculations are required for different matrix sizes and observed agreement levels. (SLD)
Descriptors: Equations (Mathematics), Evaluators, Heuristics, Interrater Reliability
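An illustration of the quantity described above, on a hypothetical 3x3 table: with the marginal proportions held fixed, the best attainable diagonal agreement is the sum of the smaller marginal in each category, which bounds how large kappa can get.

```python
import numpy as np

# Hypothetical 3x3 contingency table (rows = rater 1's categories, columns = rater 2's).
table = np.array([[20,  5,  0],
                  [ 4, 30,  6],
                  [ 1,  4, 30]], dtype=float)

n = table.sum()
row_m = table.sum(axis=1) / n
col_m = table.sum(axis=0) / n

p_o = np.trace(table) / n
p_e = row_m @ col_m
p_o_max = np.minimum(row_m, col_m).sum()        # best diagonal attainable with these marginals

kappa = (p_o - p_e) / (1 - p_e)
kappa_max = (p_o_max - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}, maximum attainable kappa = {kappa_max:.2f}")
```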
Kvalseth, Tarald O. – Educational and Psychological Measurement, 1991 (peer reviewed)
An asymmetric version of J. Cohen's kappa statistic is presented as an appropriate measure for the agreement between two observers classifying items into nominal categories, when one observer represents the "standard." A numerical example with three categories is provided. (SLD)
Descriptors: Classification, Equations (Mathematics), Interrater Reliability, Mathematical Models
Kaplan, Bruce A.; Johnson, Eugene G. – 1992
Across the field of educational assessment the case has been made for alternatives to the multiple-choice item type. Most of the alternative types of items require a subjective evaluation by a rater. The reliability of this subjective rating is a key component of these types of alternative items. In this paper, measures of reliability are…
Descriptors: Educational Assessment, Elementary Secondary Education, Estimation (Mathematics), Evaluators
Ross, Donald C. – Educational and Psychological Measurement, 1992 (peer reviewed)
Large sample chi-square tests of the significance of the difference between two correlated kappas, weighted or unweighted, are derived. Cases are presented with one judge in common between the two kappas and no judge in common. An illustrative calculation is included. (Author/SLD)
Descriptors: Chi Square, Correlation, Equations (Mathematics), Evaluators
Yeaton, William H.; Wortman, Paul M. – Evaluation Review, 1993 (peer reviewed)
The current practice of reporting a single mean intercoder agreement in meta-analysis leads to systematic bias and overestimates reliability. An alternative is recommended in which average intercoder agreement statistics are calculated within clusters of coded variables. Two studies of intercoder agreement illustrate the model. (SLD)
Descriptors: Coding, Decision Making, Estimation (Mathematics), Interrater Reliability
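A minimal sketch of the cluster-wise alternative, using hypothetical per-variable agreement rates and cluster labels (all names and values are illustrative, not from the cited studies):

```python
from collections import defaultdict

# Hypothetical per-variable intercoder agreement rates, grouped by cluster of coded variables.
agreement = {
    ("design",  "random_assignment"): 0.95,
    ("design",  "control_group"):     0.90,
    ("outcome", "effect_size"):       0.70,
    ("outcome", "sample_size"):       0.85,
    ("context", "setting"):           0.60,
}

by_cluster = defaultdict(list)
for (cluster, _var), rate in agreement.items():
    by_cluster[cluster].append(rate)

# Report agreement per cluster rather than a single pooled mean,
# which can mask low agreement on particular groups of variables.
for cluster, rates in by_cluster.items():
    print(f"{cluster}: mean agreement = {sum(rates) / len(rates):.2f}")

print(f"single pooled mean = {sum(agreement.values()) / len(agreement):.2f}")
```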
Zwick, Rebecca – 1986
Most currently used measures of inter-rater agreement for the nominal case incorporate a correction for "chance agreement." The definition of chance agreement is not the same for all coefficients, however. Three chance-corrected coefficients are Cohen's Kappa; Scott's Pi; and the S index of Bennett, Goldstein, and Alpert, which has…
Descriptors: Error of Measurement, Interrater Reliability, Mathematical Models, Measurement Techniques
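A short sketch contrasting the three chance corrections named above on a single hypothetical table: each coefficient has the form (p_o - p_e) / (1 - p_e), and only the definition of chance agreement p_e differs.

```python
import numpy as np

# Hypothetical 2x2 table of joint classifications by two raters.
table = np.array([[35, 10],
                  [ 5, 50]], dtype=float)

n = table.sum()
k = table.shape[0]
p_o = np.trace(table) / n
row_m = table.sum(axis=1) / n
col_m = table.sum(axis=0) / n

# Three definitions of chance agreement.
pe_kappa = row_m @ col_m                        # Cohen's kappa: product of each rater's marginals
pe_pi = (((row_m + col_m) / 2) ** 2).sum()      # Scott's pi: squared averaged marginals
pe_s = 1.0 / k                                  # Bennett, Goldstein, & Alpert S: uniform categories

for name, pe in [("kappa", pe_kappa), ("pi", pe_pi), ("S", pe_s)]:
    print(f"{name}: {(p_o - pe) / (1 - pe):.3f}")
```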
