Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Abbott, Marilyn L. – Alberta Journal of Educational Research, 2006
The purpose of this article is to promote an increased awareness of the processes for setting cut-scores for complex performance assessments by (a) describing the Analytic Judgment Method (AJM) for setting cut-scores, and (b) critically evaluating the technical adequacy and practicability of the AJM by focusing on one investigation where the AJM…
Descriptors: Interrater Reliability, Cutting Scores, Performance Based Assessment, Standard Setting (Scoring)
Mraz, Maryann E. – Reading Research and Instruction, 2004
The purpose of this study was to examine the perspectives of key policy informants on the factors that they believed influence policy decisions in literacy education. Participants were selected because they had significantly influenced, or had attempted to influence, policy decisions in literacy at either the national or state level.…
Descriptors: Educational Policy, Literacy Education, Interviews, Federal Legislation
Lecavalier, Luc – Journal of Autism and Developmental Disorders, 2005
The Gilliam Autism Rating Scale was developed to identify individuals with autism in research and clinical settings. It has benefited from wide use and acceptance but has received little empirical attention. The purpose of this study was to evaluate the construct and diagnostic validity, interrater reliability, and effects of participant…
Descriptors: Behavior Rating Scales, Factor Analysis, Construct Validity, Pervasive Developmental Disorders
Bradley, Thomas P.; Allen, Jeff M.; Hamilton, Scott; Filgo, Scott K. – Performance Improvement Quarterly, 2006
Multirater feedback, often called 360-degree feedback, is a popular development and assessment tool, especially for organizational leaders. Raters from different organizational levels, including subordinates, boss, peers, and self, rate the leader's performance. However, there seldom is strong agreement across rater groups. This study used the…
Descriptors: Leadership Effectiveness, Peer Evaluation, Job Performance, Personnel Evaluation
Schwalbe, Craig S.; Fraser, Mark W.; Day, Steven H.; Arnold, Elizabeth Mayfield – Journal of Offender Rehabilitation, 2004
Actuarial risk assessment instruments are used increasingly in juvenile justice to classify youths according to their risk of recidivism. The purpose of this article is to describe the results of two studies of one instrument: the North Carolina Assessment of Risk (NCAR). In the first study, the inter-rater reliability of the risk assessment…
Descriptors: Recidivism, Predictive Validity, Interrater Reliability, Program Effectiveness
Rockwell, Pam; Dunham, Mardis – Art Therapy: Journal of the American Art Therapy Association, 2006
This study explored the use of the Formal Elements Art Therapy Scale (FEATS) with a population of persons with a DSM-IV diagnosis of Substance Use Disorder who were court ordered for treatment. Two groups of adults (N = 40) were closely matched on age, gender, race, socioeconomic status and education level, and were administered the Person Picking…
Descriptors: Measures (Individuals), Interrater Reliability, Group Membership, Art Therapy
Cutler, Lois J.; Kane, Rosalie A.; Degenholtz, Howard B.; Miller, Michael J.; Grant, Leslie – Gerontologist, 2006
Purpose: We developed and tested theoretically derived procedures to observe physical environments experienced by nursing home residents at three nested levels: their rooms, the nursing unit, and the overall facility. Illustrating with selected descriptive results, in this article we discuss the development of the approach. Design and Methods: On…
Descriptors: Physical Environment, Nursing Homes, Research Tools, Evaluation Methods
Collins, Kathleen M. T. – Evaluation and Research in Education, 2006
The purpose of this mixed-methods study was to document the prevalence of sampling designs utilised in mixed-methods research and to examine the interpretive consistency between interpretations made in mixed-methods studies and the sampling design used. Classification of studies was based on a two-dimensional mixed-methods sampling model. This…
Descriptors: Social Science Research, Incidence, Social Sciences, School Psychology
Bennett, Randy Elliot; Rock, Donald A. – 1993
Formulating-Hypotheses (F-H) items present a situation and ask the examinee to generate as many explanations for it as possible. This study examined the generalizability, validity, and examinee perceptions of a computer-delivered version of the task. Eight F-H questions were administered to 192 graduate students. Half of the items restricted…
Descriptors: Computer Assisted Testing, Difficulty Level, Generalizability Theory, Graduate Students
Bridgeman, Brent; And Others – 1996
The various methods for computing the reliability of scores on Advanced Placement (AP) examinations are summarized. For the free response portion of the examinations, raters can contribute to score unreliability through both systematic severity errors (in which some raters consistently rate more severely than other raters) and through…
Descriptors: Advanced Placement, College Entrance Examinations, Error of Measurement, High School Students
O'Neill, Thomas R.; Lunz, Mary E. – 1996
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
Descriptors: Ability, Benchmarking, Comparative Analysis, Difficulty Level
Giota, Joanna – 1995
This study examined the concept of quality in child day care and how this can be measured by the Early Childhood Environment Rating Scale (ECERS). Swedish day care centers in three communities were administered a version of the ECERS, which was translated from the original scale to accommodate conceptual differences between Sweden and the United…
Descriptors: Day Care, Day Care Centers, Foreign Countries, Interrater Reliability
Reckase, Mark D. – 1997
This paper argues that special procedures for constructing assessment tools containing performance assessment tasks are unnecessary and that current test methodology can easily be generalized to complex performance assessment tasks without destroying the desirable characteristics of those tasks. Reasonable statistical requirements for sound…
Descriptors: Educational Assessment, Generalizability Theory, High Stakes Tests, Interrater Reliability
Jaeger, Richard M.; Usher, Claire H. – 1991
This paper reports on a study of the foundation and application of two procedures used to specify appropriate weights to be applied to components in determining the overall quality of a school. These procedures are multiattribute utility technology (MAUT) and policy capturing, and the paper presents the results of applying them, using key…
Descriptors: Achievement Tests, Comparative Analysis, Curriculum Evaluation, Educational Assessment
Hambleton, Ronald K.; Plake, Barbara S. – 1994
The number of performance-based assessments is increasing rapidly, but to date there is no established procedure for setting standards on these assessments. This paper describes several extensions to the Angoff procedure to accommodate the characteristics of a performance-based assessment and presents the results of research in applying this…
Descriptors: Educational Assessment, Evaluation Methods, Interrater Reliability, Performance Based Assessment

