Publication Date
| Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 0 |
| Since 2007 (last 20 years) | 3 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Interrater Reliability | 31 |
| Mathematical Models | 31 |
| Evaluation Methods | 9 |
| Evaluators | 9 |
| Equations (Mathematics) | 8 |
| Error of Measurement | 8 |
| Correlation | 7 |
| Estimation (Mathematics) | 7 |
| Rating Scales | 7 |
| Test Reliability | 7 |
| Comparative Analysis | 5 |
Author
| Author | Count |
| --- | --- |
| Cason, Carolyn L. | 2 |
| Cason, Gerald J. | 2 |
| Beasley, T. Mark | 1 |
| Chae, Sunhee | 1 |
| Chen, Hsueh-Chih | 1 |
| Chen, Po-Hsi | 1 |
| Deutsch, Stuart Jay | 1 |
| Eiting, Mindert H. | 1 |
| Goffin, Richard D. | 1 |
| Grove, Will | 1 |
| Houston, Walter M. | 1 |
Education Level
| Level | Count |
| --- | --- |
| Higher Education | 2 |
| Postsecondary Education | 2 |
| Elementary Secondary Education | 1 |
| Secondary Education | 1 |
Audience
| Audience | Count |
| --- | --- |
| Researchers | 4 |
| Practitioners | 1 |
Location
| Location | Count |
| --- | --- |
| Singapore | 2 |
| South Korea | 2 |
| Asia | 1 |
| Australia | 1 |
| Brazil | 1 |
| Connecticut | 1 |
| Denmark | 1 |
| Egypt | 1 |
| Estonia | 1 |
| Florida | 1 |
| Germany | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| NEO Personality Inventory | 1 |
| National Assessment of… | 1 |
Simin, Cai; Lam, Toh Tin – Journal of Science and Mathematics Education in Southeast Asia, 2016
This paper presents the design and development of a rubric for assessing mathematical modelling tasks at the secondary level. The rubric was crafted from the mathematical modelling competencies that the researchers synthesised from four sources, and was fine-tuned following an interview with three…
Descriptors: Scoring Rubrics, Test Construction, Mathematical Models, Secondary School Mathematics
Hung, Su-Pin; Chen, Po-Hsi; Chen, Hsueh-Chih – Creativity Research Journal, 2012
Product assessment is widely applied in creativity studies, typically as an important dependent measure. Within this context, this study had two purposes. First, it focused on methods for investigating possible rater effects, an issue that has received little attention in past creativity studies. Second, the…
Descriptors: Item Response Theory, Creativity, Interrater Reliability, Undergraduate Students
Peer reviewed
Serlin, Ronald C.; Marascuilo, Leonard A. – Journal of Educational Statistics, 1983
Two alternatives to the problems of conducting planned and post hoc comparisons in tests of concordance and discordance for G groups of judges are examined. The two models are illustrated using existing data. (Author/JKS)
Descriptors: Attitude Measures, Comparative Analysis, Interrater Reliability, Mathematical Models
Peer reviewed
Umesh, U. N.; And Others – Educational and Psychological Measurement, 1989
An approach is provided for calculating maximum values of the Kappa statistic of J. Cohen (1960) as a function of observed agreement proportions between evaluators. Separate calculations are required for different matrix sizes and observed agreement levels. (SLD)
Descriptors: Equations (Mathematics), Evaluators, Heuristics, Interrater Reliability
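To make the entry above concrete, the following is a minimal sketch of Cohen's kappa together with one standard construction of its maximum attainable value when the observed marginal proportions are held fixed. This is an illustration of the general idea, not necessarily the exact formulation in Umesh et al. (1989); the function name and layout are the sketch's own.

```python
def kappa_and_max(matrix):
    """matrix[i][j]: count of items rater A placed in category i
    and rater B placed in category j."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    p = [[c / n for c in row] for row in matrix]
    row_m = [sum(p[i]) for i in range(k)]                        # rater A marginals
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]   # rater B marginals
    p_o = sum(p[i][i] for i in range(k))                         # observed agreement
    p_e = sum(row_m[i] * col_m[i] for i in range(k))             # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # With the marginals fixed, agreement on category i cannot exceed
    # the smaller of the two marginal proportions for i.
    p_max = sum(min(row_m[i], col_m[i]) for i in range(k))
    kappa_max = (p_max - p_e) / (1 - p_e)
    return kappa, kappa_max
```

For the 2x2 table `[[20, 5], [10, 15]]`, observed agreement is 0.7 against a chance level of 0.5, giving kappa = 0.4 while the marginals would permit a kappa of at most 0.8 — which is why kappa is sometimes reported relative to its maximum.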
Peer reviewed
Kvalseth, Tarald O. – Educational and Psychological Measurement, 1991
An asymmetric version of J. Cohen's kappa statistic is presented as an appropriate measure for the agreement between two observers classifying items into nominal categories, when one observer represents the "standard." A numerical example with three categories is provided. (SLD)
Descriptors: Classification, Equations (Mathematics), Interrater Reliability, Mathematical Models
Peer reviewed
Ross, Donald C. – Educational and Psychological Measurement, 1992
Large sample chi-square tests of the significance of the difference between two correlated kappas, weighted or unweighted, are derived. Cases are presented with one judge in common between the two kappas and no judge in common. An illustrative calculation is included. (Author/SLD)
Descriptors: Chi Square, Correlation, Equations (Mathematics), Evaluators
Peer reviewed
Towstopiat, Olga – Contemporary Educational Psychology, 1984
The present article reviews the procedures that have been developed for measuring the reliability of human observers' judgments when making direct observations of behavior. These include the percentage of agreement, Cohen's Kappa, phi, and univariate and multivariate agreement measures that are based on quasi-equiprobability and quasi-independence…
Descriptors: Interrater Reliability, Mathematical Models, Multivariate Analysis, Observation
Peer reviewed
Yeaton, William H.; Wortman, Paul M. – Evaluation Review, 1993
The current practice of reporting a single mean intercoder agreement in meta-analysis leads to systematic bias and overestimated reliability. An alternative is recommended in which average intercoder agreement statistics are calculated within clusters of coded variables. Two studies of intercoder agreement illustrate the model. (SLD)
Descriptors: Coding, Decision Making, Estimation (Mathematics), Interrater Reliability
Zwick, Rebecca – 1986
Most currently used measures of inter-rater agreement for the nominal case incorporate a correction for "chance agreement." The definition of chance agreement is not the same for all coefficients, however. Three chance-corrected coefficients are Cohen's Kappa; Scott's Pi; and the S index of Bennett, Goldstein, and Alpert, which has…
Descriptors: Error of Measurement, Interrater Reliability, Mathematical Models, Measurement Techniques
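The Zwick entry notes that the three coefficients it names share a chance correction but define "chance agreement" differently. As a hedged sketch of that contrast (the function and matrix here are illustrative, not drawn from the paper), all three take the form (p_o - p_e) / (1 - p_e) and differ only in p_e:

```python
def agreement_coefficients(matrix):
    """Return (Cohen's kappa, Scott's pi, Bennett et al.'s S) for a
    square cross-classification of two raters' category assignments."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    p = [[c / n for c in row] for row in matrix]
    row_m = [sum(p[i]) for i in range(k)]
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(p[i][i] for i in range(k))

    def chance_corrected(p_e):
        return (p_o - p_e) / (1 - p_e)

    # Cohen's kappa: each rater keeps their own marginal distribution.
    kappa = chance_corrected(sum(r * c for r, c in zip(row_m, col_m)))
    # Scott's pi: both raters are assigned the averaged marginal distribution.
    pi = chance_corrected(sum(((r + c) / 2) ** 2 for r, c in zip(row_m, col_m)))
    # Bennett, Goldstein, and Alpert's S: chance is uniform over k categories.
    s = chance_corrected(1 / k)
    return kappa, pi, s
```

On the same data the three can diverge noticeably whenever the raters' marginals are unequal, which is the crux of the comparison the entry describes.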
Wang, Wen-chung – 1997
Traditional approaches to the investigation of the objectivity of ratings for constructed-response items are based on classical test theory, which is item-dependent and sample-dependent. Item response theory overcomes this drawback by decomposing item difficulties into genuine difficulties and rater severity. In so doing, objectivity of ability…
Descriptors: College Entrance Examinations, Constructed Response, Foreign Countries, Interrater Reliability
Uebersax, John; Grove, Will – 1989
Methods of probability modeling to analyze rater agreement are described, emphasizing their basic similarities and viewing them as variants of a common methodology. Statistical techniques for analyzing agreement data are described to address questions such as how many opinions are required to make a medical diagnosis with necessary accuracy. Kappa…
Descriptors: Clinical Diagnosis, Correlation, Estimation (Mathematics), Evaluation Methods
Beasley, T. Mark; Leitner, Dennis W. – 1993
The L statistic of E. B. Page (1963) tests the agreement of a single group of judges with an a priori ordering of alternative treatments. This paper extends the two-group test of D. W. Leitner and C. M. Dayton (1976), itself an extension of the L test, to analyze the difference in consensus between two unequally sized groups of judges. Exact critical values…
Descriptors: Comparative Analysis, Equations (Mathematics), Estimation (Mathematics), Evaluators
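For context on the entry above, Page's L statistic itself is simple to compute: each judge ranks the k treatments, and L weights each treatment's rank sum by its position in the hypothesized ordering. The sketch below shows only this base statistic under that standard definition; the two-group comparison and exact critical values discussed in the paper are beyond it.

```python
def page_L(rankings):
    """rankings: one list per judge giving that judge's ranks 1..k of the
    k treatments, with treatments listed in the hypothesized order.
    Larger L supports the a priori ordering."""
    k = len(rankings[0])
    # Rank sum for each treatment across all judges.
    rank_sums = [sum(judge[j] for judge in rankings) for j in range(k)]
    # Weight each rank sum by the treatment's hypothesized position.
    return sum((j + 1) * rank_sums[j] for j in range(k))
```

When every judge's ranking matches the hypothesized order exactly, L reaches its maximum; under the null of no trend its expectation is n·k·(k+1)²/4 for n judges.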
Weare, Jane; And Others – 1987
This annotated bibliography was developed upon noting a deficiency of information in the literature regarding the training of raters for establishing agreement. The ERIC descriptor, "Interrater Reliability", was used to locate journal articles. Some of the 33 resulting articles focus on mathematical concepts and present formulas for computing…
Descriptors: Annotated Bibliographies, Cloze Procedure, Correlation, Essay Tests
Peer reviewed
van den Bergh, Huub; Eiting, Mindert H. – Journal of Educational Measurement, 1989
A method of assessing rater reliability via a design of overlapping rater teams is presented. Covariances or correlations of ratings can be analyzed with LISREL models. Models in which the rater reliabilities are congeneric, tau-equivalent, or parallel can be tested. Two examples based on essay ratings are presented. (TJH)
Descriptors: Analysis of Covariance, Computer Simulation, Correlation, Elementary Secondary Education
Peer reviewed
Zegers, Frits E. – Applied Psychological Measurement, 1991
The degree of agreement between two raters rating several objects for a single characteristic can be expressed through an association coefficient, such as the Pearson product-moment correlation. How to select an appropriate association coefficient, and the desirable properties and uses of a class of such coefficients--the Euclidean…
Descriptors: Classification, Correlation, Data Interpretation, Equations (Mathematics)