Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed: Lichtenstein, Robert – Early Childhood Research Quarterly, 1990
Investigated the adequacy of the Gesell School Readiness Screening Test (GSRST) to gauge 46 kindergartners' readiness. Found agreement between GSRST and teacher assessments of student readiness. Test-retest and interrater reliability were below acceptable levels, and lower than figures yielded by a quantitative scoring method. Concluded that GSRST…
Descriptors: Examiners, Experimenter Characteristics, Grade 1, Intelligence Tests
Peer reviewed: Hunt, D. Daniel; And Others – Academic Medicine, 1991
A study compared student evaluations made by residency directors and deans in 2 medical schools, using 3 standard methods of ranking 20 students per school. Ordinal ranking showed substantial agreement for 15 of 16 residency directors. Two methods of clustering into fixed groups gave high agreement only for top students. (Author/MSE)
Descriptors: Administrator Attitudes, Comparative Analysis, Deans, Evaluation Methods
Peer reviewed: Shaw, Stephanie; Coggins, Truman E. – Journal of Speech and Hearing Research, 1991
This study, involving five experienced and trained speech language pathologists, categorized the elicited imitations of five profoundly and five severely prelingually hearing-impaired subjects using the Phonetic Level Evaluation. Failure to obtain acceptably high levels of reliability suggests that this measure may not yet be an accurate and…
Descriptors: Acoustic Phonetics, Articulation (Speech), Congenital Impairments, Deafness
Peer reviewed: Mills, Craig N.; And Others – Educational Measurement: Issues and Practice, 1991
An approach is presented to the definition of minimal competence for judges to use in standard setting. Panelists in standard setting must receive training to ensure that differences in rating result from differences in perceptions of item difficulty, not in differences of opinion about the definition of minimal competence. (SLD)
Descriptors: Cutting Scores, Decision Making, Definitions, Difficulty Level
Peer reviewed: Gagne, Francoys; And Others – Gifted Child Quarterly, 1993
Forty prototypical descriptions representing 4 aptitude domains and 4 talent fields were rated by 2,343 intermediate-level pupils and their teachers, and indices of interpeer agreement were computed. A majority of the prototypes maintained acceptable interpeer agreement levels. Interpeer agreement depended primarily on the specific aptitude or…
Descriptors: Ability Identification, Evaluation Methods, Gifted, Intermediate Grades
Peer reviewed: Goffin, Richard D.; Jackson, Douglas N. – Multivariate Behavioral Research, 1992
The way in which trait and rater variance combine in multitrait-multirater (MTMR) performance appraisal data is explored. Implications of the confirmatory factor analytic model and the composite direct product (CDP) model for MTMR data are examined. Superior fit of the CDP model for four data sets is discussed. (SLD)
Descriptors: Equations (Mathematics), Evaluation Methods, Goodness of Fit, Interrater Reliability
Peer reviewed: Keogh, Barbara K.; Bernheimer, Lucinda P. – Journal of Emotional and Behavioral Disorders, 1998
Concordance between mothers' and teachers' perceptions of behavior problems and competencies of 74 children (mean age 131.6 months) with nonspecific developmental delays was assessed with standardized and study-developed scales. Correlations between raters averaged .43. There were minimal differences in ratings of boys and girls, and few…
Descriptors: Behavior Problems, Behavior Rating Scales, Children, Cognitive Ability
Peer reviewed: Evans, K. M.; Cotton, M. M.; Einfeld, S. L.; Florio, T. – Journal of Intellectual and Developmental Disability, 1999
Nurses applied standard behavioral criteria for major depression to evaluate 89 institutionalized adults with severe or profound intellectual disability. Results suggested that several additional behaviors listed on the Aberrant Behavior Checklist and the Developmental Behavior Checklist may be associated with this disorder in this population.…
Descriptors: Adults, Check Lists, Depression (Psychology), Disability Identification
Peer reviewed: Duker, Pieter C. – Research in Developmental Disabilities, 1999
To assess the psychometric characteristics of the Verbal Behavior Assessment Scale, the 15-item questionnaire was administered to pairs of caregivers of 115 individuals with developmental disabilities. Exploratory factor analysis involving 11 more participants revealed evidence concerning the distinction of three different communicative functions…
Descriptors: Adults, Children, Communication Skills, Developmental Disabilities
Peer reviewed: Talbott, Elizabeth; Lloyd, John Wills – Exceptionality, 1997
A study obtained the ratings of parents, teachers, and youth for 16 adolescent girls who had been treated in a psychiatric hospital and 16 controls, for total behavior problems and competence. Only parents identified their hospitalized daughters as severely clinically disturbed and in need of mental health services. (Author/CR)
Descriptors: Adolescents, Behavior Disorders, Clinical Diagnosis, Competence
Peer reviewed: Matthews, Peter; Holmes, J. Roger; Vickers, Paul; Coporaal, Bep – Educational Research and Evaluation (An International Journal on Theory and Practice), 1998
The reliability and validity of judgments of teaching quality made by independent inspectors in primary and secondary school classrooms in England were studied. Results with 173 pairs of observations by 100 inspectors show that two trained inspectors, independently observing the same lesson, are likely to identify the same strengths and weaknesses…
Descriptors: Educational Quality, Elementary School Teachers, Elementary Secondary Education, Evaluation Methods
Peer reviewed: Safyer, Andrew W.; Hauser, Stuart T. – Journal of Adolescent Research, 1994
An observational coding system that assesses emotional expression operationalized through voice cues and speech content was tested for interrater reliability and validity. Subjects were normal and psychiatrically hospitalized adolescents participating in a study of adolescent ego development and familial interactions. Findings demonstrated the…
Descriptors: Adolescents, Emotional Development, Emotional Experience, Family Relationship
Allison, Meredith; Brimacombe, C. A. Elizabeth; Hunter, Michael A.; Kadlec, Helena – Discourse Processes: A Multidisciplinary Journal, 2006
This study examined the relationship between witness age, narrative features in testimony, and the perceived credibility of witnesses. Ninety older and young adult witnesses to a staged theft were videotaped as they freely recalled crime events. Later, participant-jurors viewed the videos and assessed the witnesses' credibility. Operational…
Descriptors: Adults, Young Adults, Definitions, Interrater Reliability
Bulotsky-Shearer, Rebecca; Fantuzzo, John – Psychology in the Schools, 2004
A series of studies extended psychometric research on the Adjustment Scales for Preschool Intervention (ASPI). The ASPI is a multidimensional measure of preschool emotional and behavioral adjustment for use within formal early childhood educational programs. These studies used a multiple method, multisource approach to provide additional evidence…
Descriptors: Teaching Methods, Psychometrics, Intervention, Validity
Forster, Patricia A. – Research in Science Education, 2005
The issue of unfairness arises in high-stakes public examinations when students choose questions from alternatives that are offered and marks on the alternatives turn out to be discrepant. This paper addresses and defines unfairness and discrepancy in the context of alternative questions in Physics Tertiary Entrance Examinations (TEE) in Western…
Descriptors: Foreign Countries, Physics, Identification, High Stakes Tests

