Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed
Stagnitti, Karen; Unsworth, Carolyn; Rodger, Sylvia – Canadian Journal of Occupational Therapy, 2000
A study of 82 preschoolers determined that a new play assessment (Child-Initiated Pretend Play Assessment), which identifies cognitive play skills, possessed acceptable interrater reliability and could discriminate between the play of typically developing preschoolers and those with preacademic problems. (Contains 65 references.) (JOW)
Descriptors: Behavior Problems, Cognitive Measurement, Interrater Reliability, Measures (Individuals)
Gilbride, Dennis; Vandergoot, David; Golden, Kristie; Stensrud, Robert – Rehabilitation Counseling Bulletin, 2006
This study describes the four-phase process used in developing the "Employer Openness Survey" (EOS). The EOS is an 18-item instrument designed to measure the openness of employers to hiring, accommodating, and promoting workers with disabilities. During the first phase, the authors generated potential questions and pilot-tested them with…
Descriptors: Test Validity, Rehabilitation Counseling, Placement, Interrater Reliability
Johnston, Brenda – Studies in Higher Education, 2004
The issue of arriving at agreement over outcomes in summative assessment of portfolios has been a major concern, given the complexity of the assessment task, the educational and political context, and the widespread and growing use of portfolios in higher education. This article examines research findings in this area. The discussion takes place…
Descriptors: Portfolios (Background Materials), Portfolio Assessment, Student Evaluation, Higher Education
Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003
When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…
Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability
Elander, James – Psychology Teaching Review, 2002
This article describes the development of assessment criteria for specific aspects of examination answers and coursework essays in psychology. The criteria specified the standards expected for seven aspects of students' work: addressing the question, covering the area, understanding the material, evaluating the material, developing arguments,…
Descriptors: Foreign Countries, Test Construction, Criteria, Item Analysis
Johnson, Robert L.; Penny, James; Gordon, Belita; Shumate, Steven R.; Fisher, Steven P. – Language Assessment Quarterly, 2005
Many studies have indicated that at least 2 raters should score writing assessments to improve interrater reliability. However, even for assessments that characteristically demonstrate high levels of rater agreement, 2 raters of the same essay can occasionally report different, or discrepant, scores. If a single score, typically referred to as an…
Descriptors: Interrater Reliability, Scores, Evaluation, Reliability
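The abstract above refers to methods for resolving discrepant scores from two raters without naming them. As an illustration only (the study's actual resolution methods are not given here), the sketch below implements two resolution strategies commonly discussed in the rating literature: averaging the two scores, and deferring to a third rater when the discrepancy exceeds a threshold. All function names and the threshold value are assumptions for the example.

```python
def resolve_average(score1, score2):
    """Report the mean of the two raters' scores."""
    return (score1 + score2) / 2

def resolve_third_rater(score1, score2, score3, threshold=1):
    """If the first two raters differ by more than `threshold`,
    defer to a third rater's score; otherwise average the two."""
    if abs(score1 - score2) > threshold:
        return score3
    return (score1 + score2) / 2

print(resolve_average(3, 5))         # → 4.0
print(resolve_third_rater(2, 5, 4))  # discrepant by 3 → third rater's 4
print(resolve_third_rater(3, 4, 6))  # within threshold → 3.5
```

Because the reported score can differ depending on which strategy is used, the choice of resolution method can change pass/fail decisions for examinees near a cut score, which is the concern both Johnson et al. abstracts raise.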
Smith, Ashley J.; Bihm, Elson M.; Tavkar, Poonam; Sturmey, Peter – Research in Developmental Disabilities: A Multidisciplinary Journal, 2005
Two studies assessed the reliability and utility of the Stimulus Preference Coding System (SPCS) to measure approach, avoidance, and happy and unhappy behaviors in persons with developmental disorders. Study 1 took place in an institutional setting. The nine participants were all adults with mental retardation and multiple associated disabilities.…
Descriptors: Psychological Patterns, Developmental Disabilities, Mental Retardation, Multiple Disabilities
Kelly-Vance, Lisa; Ryalls, Brigette Oliver – School Psychology International, 2005
Play assessment is gaining attention as a measure of the developing skills of young children. The procedures and methods of coding child behaviours vary considerably across researchers and practitioners. Because of this, definitive statements about the use of play assessment cannot be made without further research. The present study is an attempt…
Descriptors: Preschool Children, Play, Evaluation Methods, Interrater Reliability
Mowbray, Carol T.; Holter, Mark C.; Stark, Lori; Pfeffer, Carla; Bybee, Deborah – Research on Social Work Practice, 2005
Objective: Given the present emphasis on accountability and maintaining quality, the objective of this study was to develop, apply, and assess the reliability of a fidelity rating instrument for consumer-operated services--a promising model, but one for which fidelity criteria are not yet established. Method: Based on observations, documents, and…
Descriptors: Interrater Reliability, Criteria, Rating Scales, Test Construction
Clark, Christopher M. – Teachers and Teaching: Theory and Practice, 2005
The piece that follows had its first hearing as part of a symposium at the 1990 annual meeting of the American Educational Research Association in Boston. Sigrun Gudmundsdottir and Grace Grant were at the podium, presenting their two narrative accounts of one schoolteacher's practice. The schoolteacher was present as well, through the medium of a…
Descriptors: Conferences (Gatherings), Qualitative Research, Educational Research, Educational Researchers
Simon, Patricia – Educational and Psychological Measurement, 2006
The application range of Cohen's Kappa is extended to the field of sequential observation data, where omission mistakes of an observer may often occur. It is shown how the omission mistakes can be incorporated into the calculation of the Kappa coefficient without violating the statistic it is based on. The enhanced coefficient is termed Kappa…
Descriptors: Computation, Statistical Bias, Statistical Analysis, Logical Thinking
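Simon's article extends Cohen's Kappa to sequential observation data with omission mistakes; the extension itself is not reproduced here. For orientation, the following is a minimal sketch of the standard Cohen's Kappa that the article builds on: observed agreement between two raters corrected for the agreement expected by chance from each rater's marginal frequencies.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Standard Cohen's kappa for two raters coding the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items given identical codes.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal proportions, summed over codes.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["x", "x", "y", "y", "x", "y"]
b = ["x", "x", "y", "x", "x", "y"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

The omission-mistake case the article addresses arises when one observer fails to record an event at all, so the two sequences no longer align item by item; the plain formula above assumes aligned sequences.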
Hogue, Aaron; Henderson, Craig E.; Dauber, Sarah; Barajas, Priscilla C.; Fried, Adam; Liddle, Howard A. – Journal of Consulting and Clinical Psychology, 2008
This study examined the impact of treatment adherence and therapist competence on treatment outcome in a controlled trial of individual cognitive-behavioral therapy (CBT) and multidimensional family therapy (MDFT) for adolescent substance use and related behavior problems. Participants included 136 adolescents (62 CBT, 74 MDFT) assessed at intake,…
Descriptors: Counseling Techniques, Behavior Problems, Behavior Disorders, Interrater Reliability
Knoch, Ute; Read, John; von Randow, Janet – Assessing Writing, 2007
The training of raters for writing assessment through web-based programmes is emerging as an attractive and flexible alternative to the conventional method of face-to-face training sessions. Although some online training programmes have been developed, there is little published research on them. The current study aims to compare the effectiveness…
Descriptors: Writing Evaluation, Writing Tests, Professional Training, Interrater Reliability
Hatcher, Tim; Colton, Sharon – Journal of European Industrial Training, 2007
Purpose: The purpose of this article is to highlight the results of the online Delphi research project; in particular the procedures used to establish an online and innovative process of content validation and obtaining "rich" and descriptive information using the internet and current e-learning technologies. The online Delphi was proven to be an…
Descriptors: Delphi Technique, Readability, Research Methodology, Content Validity
Kim, Do-Hong; Huynh, Huynh – Journal of Technology, Learning, and Assessment, 2007
This study examined comparability of student scores obtained from computerized and paper-and-pencil formats of the large-scale statewide end-of-course (EOC) examinations in the two subject areas of Algebra and Biology. Evidence in support of comparability of computerized and paper-based tests was sought by examining scale scores, item parameter…
Descriptors: Computer Assisted Testing, Measures (Individuals), Biology, Algebra

