Showing 451 to 465 of 503 results
Peer reviewed
Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques
Halpin, Glennelle; McLean, James E. – 1991
Although the standard-setting method of W. H. Angoff (1971) has broad-based support in the research literature, inconsistencies in the resulting standards do occur. Sources of these inconsistencies are examined in a study of judges, competencies (items), rounds (replications), and the interactions among them. A modified Angoff approach was used to…
Descriptors: Analysis of Variance, Error of Measurement, Evaluators, High Schools
Raymond, Mark R.; Houston, Walter M. – 1990
Performance rating systems frequently use multiple raters in order to improve the reliability of ratings. However, unless all candidates are rated by the same raters, some candidates will be at an unfair advantage or disadvantage solely because they were rated by more stringent or lenient raters. To obtain fair and accurate evaluations of…
Descriptors: Algorithms, Computer Simulation, Educational Assessment, Evaluation Methods
Cantor, Nancy K.; Hoover, H. D. – 1986
This paper isolates and examines separately three distinct sources of error in essay scores: lack of agreement between raters, inconsistencies in performance within mode of discourse, and inconsistencies in performance between modes of discourse. Essay prompts in the Iowa Tests of Basic Skills (ITBS) Writing Supplement were designed to assess…
Descriptors: Academic Achievement, Cues, Elementary Secondary Education, Error of Measurement
Peer reviewed
Levine, Harold G.; And Others – Evaluation and the Health Professions, 1986
A large pediatric residency program conducted an extensive analysis of the reliability and validity of the rating forms used to evaluate the pediatric residents enrolled in the program. Data indicate that although the reliability of individual ratings is very low, several factors achieved acceptable levels of reliability when aggregated.…
Descriptors: Analysis of Variance, Graduate Medical Education, Graduate Medical Students, Higher Education
Davis, Betsy; Caros, Jennifer; Grossen, Bonnie; Carnine, Douglas – 2002
The purpose of this study was to take an initial step toward developing sound and versatile predictive instruments (i.e., benchmark measures) in reading, writing, and math that can be used to assess high-school students' academic performance in a manner that will not only predict scores on high-stakes tests but will also be amenable to repeat…
Descriptors: Academic Ability, Accountability, Benchmarking, Disabilities
Peer reviewed
Grietens, Hans; Geeraert, Liesl; Hellinckx, Walter – Child Abuse & Neglect: The International Journal, 2004
Objective: The aim was to construct and test the reliability (utility, internal consistency, interrater agreement) and the validity (internal validity, concurrent validity) of a scale for home visiting social nurses to identify risks of physical abuse and neglect in mothers with a newborn child. Method: A 71-item scale was constructed based on a…
Descriptors: Home Visits, Nurses, Child Abuse, Child Neglect
Salies, Tania Gastao – 1998
A discussion of the evaluation of writing, particularly in English as a Second Language, argues for a communicative approach reflecting the current approach to language teaching and learning. The movement toward more communication-oriented and more valid language testing is examined briefly, and direct assessment is chosen as the preferred format…
Descriptors: Communicative Competence (Languages), English (Second Language), Evaluation Criteria, Foreign Countries
Albanese, Mark A.; And Others – 1986
This study identifies distinguishing differences in lecture delivery styles of lecturers rated by students in a large multi-instructor course: the Introduction to Clinical Medicine Course (ICM). The 20 lowest- and highest-rated lecturers of the 1982 and 1983 ICM courses served as the target group. Non-student raters observing the 1984 lectures…
Descriptors: Analysis of Variance, Behavior Rating Scales, Higher Education, Interrater Reliability
Robertson, Gary; And Others – 1989
A study was conducted to refine the writing scale incorporated into the Peabody Individual Achievement Test-Revised. The test uses a single scale for judging writing samples from students in grades 2 through 12. It was questioned whether a single, relatively brief rating scale could have the sensitivity required to discriminate among the range of…
Descriptors: Elementary School Students, Elementary Secondary Education, Interrater Reliability, Latent Trait Theory
Micceri, Theodore – 1984
This paper investigates the reliability of the Florida Performance Measurement Systems' Summative Observation instrument. Developed for the Florida Beginning Teacher Evaluation Program, it provides behavioral ratings for teachers in a classroom setting. Data came from ratings of videotapes of nine teachers conducting actual lessons by nine teams…
Descriptors: Analysis of Variance, Classroom Observation Techniques, Elementary Secondary Education, Evaluation Methods
Bobek, Becky L.; Gore, Paul A. – American College Testing (ACT), Inc., 2004
This research report describes changes made to the Inventory of Work-Relevant Values when it was revised for online use as a part of the Internet version of DISCOVER. Users will see the following differences between the online and CD-ROM versions of the inventory: 22 items rather than 61, simplified presentation, and the contribution of all items…
Descriptors: Interrater Reliability, Field Tests, Internet, Test Construction
Wolfe, Edward W. – 1996
Although portfolio assessment is becoming increasingly popular, it may not survive unless portfolio scoring can meet the demands of large-scale assessment standards. The results of studies of interrater reliability with large-scale portfolio assessments have been mixed. This paper reports the scoring results of a nationwide portfolio pilot in…
Descriptors: Decision Making, Generalizability Theory, Interrater Reliability, Language Arts
Tyson, LeaAnn; Silverman, Stephen – 1992
The purpose of this study was to investigate differences in Texas Teacher Appraisal System (TTAS) scores when considering the scores of the first four individual domains (Instructional Strategies, Management and Organization, Presentation of Subject Matter, and Learning Environment), the sum of the scores of Domains I through IV, and the overall…
Descriptors: Analysis of Variance, Career Ladders, Classroom Observation Techniques, Elementary School Teachers
Alvermann, Donna E.; And Others – 1984
To help teachers develop an awareness of how they structure a discussion, an instrument was constructed called the Assessment of Classroom Interaction Dynamics (ACID). Two expert judges and 26 trainees then participated in a study (1) to estimate interrater reliability between expert judges in the use of the ACID, (2) to assess the validity of the…
Descriptors: Classroom Communication, Classroom Observation Techniques, Classroom Research, Comparative Analysis