Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed: Marsh, Herbert W.; Ball, Samuel – Journal of Experimental Education, 1989
Agreement between two independent reviews of each of 278 manuscripts was compared on an overall recommendation and on specific rating items. Agreement between reviewers on separate dimensions, the unweighted sum of the dimensions, and various weighted sums was no better than that for the overall recommendation itself. (SLD)
Descriptors: Evaluation Methods, Factor Analysis, Interrater Reliability, Manuscripts
Peer reviewed: Hennessey, Beth Ann; Amabile, Teresa M. – Journal of Creative Behavior, 1988
The subjective judgment of observers was used to assess verbal creativity. Students, aged 5-10, told a story to accompany a picture series. Teachers rated the stories relative to one another. Interjudge reliability of the creativity measure was highly satisfactory. Two subsequent studies affirmed the results, with slightly lower interjudge…
Descriptors: Creativity, Creativity Tests, Elementary Education, Evaluation Methods
Peer reviewed: Shaw, Brian F.; Dobson, Keith S. – Journal of Consulting and Clinical Psychology, 1988
Reviews several scales used to evaluate competency of psychotherapists. Discusses concerns about interrater reliability and predictive validity of scales. Considers competency a state-like variable, with therapists demonstrating higher competence when they skillfully treat patients across range of difficulty levels. Contends that development of…
Descriptors: Competence, Counselor Evaluation, Counselor Qualifications, Evaluation Criteria
Peer reviewed: Feletti, Grahame; Ryan, Greg – Assessment & Evaluation in Higher Education, 1994
The Triple Jump, a procedure for assessing students' problem-based learning, is applied to assessment of inquiry-based learning in a graduate course. Results suggest the need for more research into interrater reliability and other characteristics of the exercise. Some simple strategies for making the instrument cost effective are offered. (MSE)
Descriptors: Evaluation Methods, Graduate Study, Higher Education, Independent Study
Peer reviewed: Jaeger, Richard M. – Applied Measurement in Education, 1995
A performance-standard setting procedure termed judgmental policy capturing (JPC) and its application are described. A study involving 12 panelists demonstrated the feasibility of the JPC method for setting performance standards for classroom teachers seeking certification from the National Board for Professional Teaching Standards. (SLD)
Descriptors: Decision Making, Educational Assessment, Evaluation Methods, Evaluators
Kendall-Tackett, Kathleen A. – Child Abuse and Neglect: The International Journal, 1992
Professionals (n=201) working with child sexual abuse victims rated the normalcy of various behaviors with anatomical dolls for children ages two to five. Respondents agreed that overtly sexual behaviors were abnormal for nonabused children, but ratings of ambiguous behaviors varied depending on respondent's profession, gender, and years of…
Descriptors: Behavior Patterns, Behavior Rating Scales, Behavior Standards, Child Abuse
Peer reviewed: Sloan, R. L.; And Others – International Journal of Rehabilitation Research, 1992
This study tested the interrater reliability of the Modified Ashworth Scale in measuring upper and lower limb spasticity in 34 hemiplegic adult patients examined by 2 physiotherapists and 2 doctors. Findings indicated satisfactory reliability for upper limb spasticity but less satisfactory results for lower limb spasticity. (DB)
Descriptors: Adults, Behavior Rating Scales, Evaluation Methods, Interrater Reliability
Peer reviewed: Slogoff, Stephen; And Others – Academic Medicine, 1994
To investigate the validity of anesthesiologist certification, 146 anesthesiology program directors were asked whether they would permit each of their graduating residents to administer 3 increasingly complex anesthetic regimens to the directors themselves, and to rate residents on specific skills. Director responses generally correspond to…
Descriptors: Administrator Attitudes, Anesthesiology, Certification, Graduate Medical Education
Brayden, Robert M.; And Others – Child Abuse and Neglect: The International Journal, 1991
Seventy physicians and two nurse practitioners rated colposcopic photographs. Results showed that leaders in the field of child sexual abuse assessment made significantly more accurate assessments than pediatricians, pediatric and family practice residents, and intern physicians. Predictors of agreement with standard assessments, although weak,…
Descriptors: Child Abuse, Competence, Evaluation Methods, Evaluators
Peer reviewed: Alsawalmeh, Yousef M.; Feldt, Leonard S. – Psychometrika, 1994
A modification of a test of the equality of nonindependent alpha reliability coefficients is proposed. It avoids the limitation that the product of the number of test parts times the number of subjects be quite large. Monte Carlo studies indicate that this test can be used in comparing interrater reliabilities. (SLD)
Descriptors: Comparative Analysis, Computer Simulation, Equations (Mathematics), Interrater Reliability
Peer reviewed: Bond, Malcolm J.; Tustin, R. Don – Journal of Intellectual and Developmental Disability, 1999
This study assessed the psychometric properties of two subscales of the Adelaide Behaviour Disorder Scale that have been hypothesized to describe conduct problems and emotional problems of adults with intellectual disability. Criterion scores for identifying individuals needing clinical intervention were established and validated against…
Descriptors: Adults, Behavior Problems, Disability Identification, Eligibility
Peer reviewed: McEnery, Jean M.; Blanchard, P. Nick – Human Resource Development Quarterly, 1999
Business undergraduates (n=261) participating in an assessment center simulation were evaluated by graduate students and faculty. Assessor-peer and assessor-self ratings lacked convergent and divergent validity, but self-peer ratings had both. (SK)
Descriptors: Assessment Centers (Personnel), Business Administration Education, Higher Education, Interrater Reliability
Peer reviewed: Chae, Sunhee – Journal of Outcome Measurement, 1998
Using a recruitment test for Korean teachers, the use of the Rasch measurement model to control the effects of judge variable on the grading of essay-type items is examined. Ways of minimizing the variation of grading due to judge severity and reducing the number of judges without threatening objectivity of ability measurements are presented.…
Descriptors: Ability Identification, Achievement Tests, Essay Tests, Foreign Countries
Peer reviewed: Van Bourgondien, Mary E.; Reichle, Nancy C.; Campbell, Duncan G.; Mesibov, Gary B. – Research in Developmental Disabilities, 1998
This study assessed the psychometric properties of the Environmental Rating Scale, a measure specifically designed to assess residential treatment programs for individuals with autism. The measure's reliability was demonstrated by assessments of the internal consistency, stability, and interrater reliability. Preliminary analysis of validity…
Descriptors: Adults, Autism, Evaluation Methods, Interrater Reliability
Peer reviewed: Linn, Robert L.; Burton, Elizabeth – Educational Measurement: Issues and Practice, 1994
Generalizability of performance-based assessment scores across raters and tasks is examined, focusing on implications of generalizability analyses for specific uses and interpretations of assessment results. Although it seems probable that assessment conditions, task characteristics, and interactions with instructional experiences affect the…
Descriptors: Educational Assessment, Educational Experience, Generalizability Theory, Interaction


