Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Constable, Elizabeth; Andrich, David – 1984
In circumstances where judges are required to make ratings of performance, it is usually required to have two or more raters who are trained to agree on independent ratings of the same performance. It is suggested that such a requirement may produce the paradox of attenuation associated with item analysis, in which too high a correlation between…
Descriptors: Elementary Secondary Education, Evaluation Methods, Interrater Reliability, Interviews
Cason, Gerald J.; And Others – 1985
To minimize the effects of systematic differences in raters' standards of clinical competence, a handicapping system was applied to the ratings made by fourteen preceptors of 128 junior year medical students in a 6-week psychiatry clerkship at the University of Arkansas for Medical Sciences. The handicap of a preceptor was the difference between…
Descriptors: Achievement Tests, Behavior Rating Scales, Clinical Experience, Grading
McDaniel, Barbara A. – 1985
A study was conducted to determine whether evaluators of large scale essay tests respond the same way toward essays written by English as a second language (ESL) and non-ESL students. The data examined came from the English Placement Test (EPT) administered in the province of British Columbia, Canada, in March 1979. The test was used to identify…
Descriptors: Chinese, Comparative Analysis, English (Second Language), Higher Education
Peer reviewed: Magnan, Sally Sieloff – Canadian Modern Language Review, 1987
Differences in procedures used by academic institutions and government agencies in administering the American Council on the Teaching of Foreign Languages' Oral Proficiency Interview test are examined, and results and implications of two studies of interrater reliability are discussed. (MSE)
Descriptors: Comparative Analysis, Correlation, Evaluation Methods, Evaluators
Peer reviewed: Harrison, Patti L. – Journal of Special Education, 1987
Part of a special issue on adaptive behavior, the article reviews adaptive behavior research in areas which include the relationship between adaptive behavior and intelligence and school achievement, relationship between different measures of adaptive behavior, predictive aspects, declassification, group differences in adaptive behavior,…
Descriptors: Academic Achievement, Adaptive Behavior (of Disabled), Behavior Rating Scales, Comparative Analysis
Peer reviewed: Norcini, John J.; And Others – Journal of Educational Measurement, 1987
This study examined whether two variations on the typical Angoff group standard-setting process would produce sufficiently consistent results to recommend their use. The results imply that judgments gathered after an initial traditional group-process session can provide an efficient alternative mechanism for setting cutting scores using the Angoff…
Descriptors: Cutting Scores, Generalizability Theory, Graduate Medical Education, Group Dynamics
Peer reviewed: Ridley, Charles R. – Journal of Cross-Cultural Psychology, 1986
This study investigated the effects of therapists' observer-client race pairing and client self-disclosure on observers' descriptive and attitudinal ratings of clients. A major implication is that observer race, client race, and client self-disclosure influence clinical decision-making. (Author/LHW)
Descriptors: Clinical Diagnosis, Counselor Client Relationship, Cross Cultural Studies, Ethnic Groups
Peer reviewed: Berk, Ronald A. – Review of Educational Research, 1986
Thirty-eight methods are presented for either setting standards or adjusting them based on an analysis of classification error rates. A trilevel classification scheme is used to categorize the methods, and 10 criteria of technical adequacy and practicability are proposed to evaluate them. (Author/LMO)
Descriptors: Criterion Referenced Tests, Cutting Scores, Elementary Secondary Education, Error of Measurement
Peer reviewed: Payne, Beverly Dean – Educational and Psychological Measurement, 1984
The validity of elementary school pupil ratings of the teaching performance of 33 student teachers was examined using the nine competencies of the Teacher Performance Assessment Instruments. Ratings of college supervisors and supervising teachers were criteria for contrast of validity coefficients of student ratings. (Author/BS)
Descriptors: Elementary Education, Elementary School Students, Interrater Reliability, Rating Scales
Campbell, Stephen R. – Online Submission, 2004
This paper charts a cognitive history of the concepts of quantity and quality from three inter-related and inter-dependent perspectives of mathematics, logic, and physics. In so doing, other notions associated with the evolution of these concepts are identified and explicated. It is argued that the concepts of quantity and quality, considered in…
Descriptors: Educational Research, Qualitative Research, Research Methodology, Statistical Analysis
Herman, Joan L.; Webb, Noreen M.; Zuniga, Stephen A. – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2005
This study examined the impact of rater agreement on decisions concerning the alignment between the Golden State Examination (GSE) in High School Mathematics and the University of California (UC) "Statement on Competencies in Mathematics." UC faculty and high school mathematics teachers (n = 20) rated the mathematics items of the GSE…
Descriptors: State Standards, Case Studies, College Faculty, Mathematics Teachers
Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg – Online Submission, 2005
The main purpose of this study was to illustrate a polytomous IRT-based linking procedure that adjusts for rater variations. Test scores from two administrations of a statewide reading assessment were used. An anchor set of Year 1 students' constructed responses was rescored by Year 2 raters. To adjust for year-to-year rater variation in IRT…
Descriptors: Test Items, Measures (Individuals), Grade 8, Item Response Theory
Importance Placed on Managerial Leadership Competencies across Countries: What Managers Need to Know
Kowske, Brenda J.; Anthony, Kshanika – Online Submission, 2005
This study examines the importance placed on managerial competencies across countries. A partial replication of work done 5 years ago, this research demonstrated that various countries' managers have changed the emphasis placed on some managerial competencies. Overall, results showed that many managerial competencies have similar amounts of…
Descriptors: Competence, Administrator Role, Leadership Qualities, Management Development
McGinty, Dixie; Neel, John H. – 1996
A new standard setting approach is introduced, called the cognitive components approach. Like the Angoff method, the cognitive components method generates minimum pass levels (MPLs) for each item. In both approaches, the item MPLs are summed for each judge, then averaged across judges to yield the standard. In the cognitive components approach,…
Descriptors: Cognitive Processes, Criterion Referenced Tests, Evaluation Methods, Grade 3
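The aggregation step described in the abstract above (item MPLs summed for each judge, then averaged across judges) can be sketched as follows. This is only an illustrative sketch: the function name `standard_from_mpls` and the sample ratings are hypothetical, and it assumes each MPL is expressed as a probability between 0 and 1. It does not show how the cognitive components approach derives the item MPLs, which the truncated abstract does not state.

```python
from statistics import mean


def standard_from_mpls(ratings):
    """Compute a cut score from judges' item-level minimum pass levels (MPLs).

    ratings[j][i] is judge j's MPL for item i (assumed here to be a
    probability in [0, 1]). Each judge's MPLs are summed, and the judge
    totals are then averaged to yield the standard.
    """
    if not ratings:
        raise ValueError("at least one judge is required")
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate the same number of items")

    # Step 1: sum the item MPLs for each judge.
    judge_totals = [sum(judge) for judge in ratings]
    # Step 2: average the judge totals to obtain the standard.
    return mean(judge_totals)


# Hypothetical data: two judges rating three items.
example_ratings = [
    [0.5, 0.75, 1.0],   # judge 1 total: 2.25
    [0.25, 0.5, 0.75],  # judge 2 total: 1.5
]
example_standard = standard_from_mpls(example_ratings)  # 1.875 raw-score points
```

With these made-up ratings the standard is 1.875 out of 3 possible points. The spread between the two judge totals (2.25 versus 1.5) is the kind of rater disagreement that interrater reliability studies set out to quantify.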
Takala, Sauli – 1998
This paper discusses recent developments in language testing. It begins with a review of the traditional criteria that are applied to all measurement and outlines recent emphases that derive from the expanding range of stakeholders. Drawing on Alderson's seminal work, criteria are presented for evaluating communicative language tests. Developments…
Descriptors: Alternative Assessment, Communicative Competence (Languages), Comparative Analysis, Evaluation Criteria


