Showing 421 to 435 of 503 results
Livingston, Samuel A.; Sims-Gunzenhauser, Alice – 1995
A study was conducted to provide information for setting two separate standards, the accuracy score and the documentation score, for the Praxis III: Classroom Performance Assessment (Praxis III). Praxis III is intended for making instructional and licensing decisions about beginning teachers. This standard-setting study was a person-judgment…
Descriptors: Beginning Teachers, Classroom Observation Techniques, Documentation, Elementary Secondary Education
Angoff, William H. – 1989
This study was undertaken to test the hypothesis that items of the Test of English as a Foreign Language (TOEFL) containing reference to American people, places, customs, etc., tend to favor examinees who have spent some time living in the United States. Two samples of examinees were drawn from the March 1987 TOEFL administration, one tested in…
Descriptors: Context Effect, English (Second Language), Evaluators, Foreign Nationals
Joines, Richard C. – 1991
The development and validation of the General Management In-Basket (GMIB) is described. The GMIB is a theory-based generic in-basket simulation, designed to assess supervisory and management skills independent of any job classification. Three of the 15 in-basket items in the GMIB are critical and are scored on a 0-5 scale. The remaining 12 items…
Descriptors: Administrator Evaluation, Concurrent Validity, Factor Analysis, Interrater Reliability
Mitchell, Karen J.; Anderson, Judith A. – 1987
The Association of American Medical Colleges is conducting research to develop, implement, and evaluate a Medical College Admission Test (MCAT) essay testing program. Essay administration in the spring and fall of 1985 and 1986 suggested that additional research was needed on the development of topics which elicit similar skills and meet standard…
Descriptors: College Entrance Examinations, Essay Tests, Estimation (Mathematics), Generalizability Theory
Shale, Doug – 1986
This study is an attempt at a cohesive characterization of the concept of essay reliability. As such, it takes as a basic premise that previous and current practices in reporting reliability estimates for essay tests have certain shortcomings. The study provides an analysis of these shortcomings--partly to encourage a fuller understanding of the…
Descriptors: Analysis of Variance, Correlation, Error of Measurement, Essay Tests
Peterson, Gary W. – 1983
Even though several national testing firms have developed measures to evaluate the effectiveness of baccalaureate education, there continues to be a general reluctance on the part of faculty in colleges and universities to accept these measures as criteria on which to evaluate educational programs. Some of the resistance appears to lie in the lack…
Descriptors: Bachelors Degrees, Cognitive Processes, Difficulty Level, Essay Tests
Walker, Richard N. – 1989
In an assessment of the adequacy of the Gesell screening examination as a test instrument, a Gesell Screening Evaluation was given to 400 children semi-annually from their 4th to 6th year. The sample, which was stratified by parent occupation, included 40 girls and 40 boys at 5 age levels. The test battery corresponded with the Gesell Preschool…
Descriptors: Chronological Age, Early Childhood Education, Followup Studies, Interrater Reliability
Yap, Kueh Chin; Capie, William – 1985
The purpose of this study was to compare the relative magnitude of the variance components and generalizability coefficients derived from the Teacher Performance Assessment Instruments (TPAI) data using two different methods of data collection: (1) occasions when observers were in the classroom for simultaneous observation and (2) occasions when…
Descriptors: Analysis of Variance, Classroom Observation Techniques, Data Collection, Elementary Secondary Education
Breland, Hunter M.; And Others – 1987
Six university English departments collaborated in this examination of the differences between multiple-choice and essay tests in evaluating writing skills. The study also investigated ways the two tools can complement one another, ways to improve cost effectiveness of essay testing, and ways to integrate assessment and the educational process.…
Descriptors: Comparative Testing, Efficiency, Essay Tests, Higher Education
Dielman, T. E.; Horvatich, Paula K. – 1985
The purposes of this study were to establish the interrater reliability, dimensionality, and internal consistency of an instruction evaluation instrument used at The University of Michigan Medical School. Using the nine-item rating scale, 1,758 student ratings and 88 staff ratings were gathered on 61 faculty. Interrater agreement ranged from .28…
Descriptors: Evaluation Methods, Graduate Medical Education, Higher Education, Interrater Reliability
van der Linden, Wim J. – 1982
A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The…
Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Interrater Reliability
Peer reviewed
Mayfield, Kathy L.; And Others – Journal of School Psychology, 1984
Investigated interrater reliability of the AAMD Adaptive Behavior Scale-Public School Version in a sample of 31 educable mentally handicapped children who were rated by their parents, special education teacher, classroom teacher, and an independent observer. Results showed ratings of the special education teacher were generally lower. (JAC)
Descriptors: Adjustment (to Environment), Behavior Rating Scales, Children, Elementary Education
Peer reviewed
Baer, John – Roeper Review, 1994
Two studies are reported that measure the long-term stability of performance assessments involving story-writing and poetry-writing (involving grade four and five students) and story-telling (involving grade two students). The long-term stability of these assessments compares favorably with stability figures for other creativity tests. (Author/JDD)
Descriptors: Creative Thinking, Creativity, Creativity Tests, Elementary Education
Peer reviewed
Sevin, Jay A.; And Others – Journal of Autism and Developmental Disorders, 1991
This study, involving 24 children or adolescents with pervasive developmental disorders, assessed 3 autism scales: Autism Behavior Checklist, Real Life Rating Scale, and Childhood Autism Rating Scale. The study analyzed interrater reliability, correlations between pairs of the three scales, diagnostic classification cutoff scores, and…
Descriptors: Adaptive Behavior (of Disabled), Behavior Rating Scales, Check Lists, Educational Diagnosis
Peer reviewed
Als, Heidelise; Butler, Samantha; Kosta, Sandra; McAnulty, Gloria – Mental Retardation and Developmental Disabilities Research Reviews, 2005
The Assessment of Preterm Infants' Behavior (APIB) is a newborn neurobehavioral assessment appropriate for preterm, at risk, and full-term newborns, from birth to 1 month after expected due date. The APIB is based in ethological--evolutionary thought and focuses on the assessment of mutually interacting behavioral subsystems in simultaneous…
Descriptors: Premature Infants, Neonates, Infant Behavior, Measurement Techniques