Peer reviewed
Howard, George S.; And Others – Journal of Educational Psychology, 1985
The accuracy of various evaluation methods for assessing teacher effectiveness was investigated. College instructors (n=43) were rated by students, colleagues, trained classroom raters, former students, and themselves. Results indicate these methods to be more valid than prior research would suggest. (BS)
Descriptors: College Faculty, Evaluation Methods, Higher Education, Interrater Reliability
Peer reviewed
Kinicki, Angelo J.; And Others – Educational and Psychological Measurement, 1985
Using both the Behaviorally Anchored Rating Scales (BARS) and the Purdue University Scales, 727 undergraduates rated 32 instructors. The BARS had less halo effect, more leniency error, and lower interrater reliability. Both formats were valid. The two formats did not differ in ratee discrimination or susceptibility to rating bias. (Author/GDC)
Descriptors: Behavior Rating Scales, College Faculty, Comparative Testing, Higher Education
Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg – Online Submission, 2005
Year-to-year rater variation may result in constructed response (CR) parameter changes, making CR items inappropriate to use in anchor sets for linking or equating. This study demonstrates how rater severity affected the writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology…
Descriptors: Test Items, Writing Tests, Reading Tests, Measures (Individuals)
Nakamura, Yuji – Journal of Communication Studies, 1996
To find ways to improve rater reliability of a tape-mediated speaking test for Japanese university students of English as a Second Language, two studies gathered information on how raters actually made their choices on rating sheets of students' speaking ability and determined what criteria teachers think they use and actually use in rating…
Descriptors: English (Second Language), Evaluation Criteria, Foreign Countries, Interrater Reliability
Peer reviewed
Irvine, Jacqueline Jordan – Journal of Research and Development in Education, 1983
The concurrence between preservice teachers' self-evaluations and the ratings of their supervisors was investigated, after both student teachers and supervisors completed training designed to facilitate self-assessment and collegial relationships. Self-reports of the trained teachers were in moderate agreement with ratings of supervisors. (PP)
Descriptors: Competency Based Teacher Education, Evaluation Methods, Higher Education, Interrater Reliability
Peer reviewed
Wigglesworth, Gillian – Language Testing, 1997
In this study, planning time was manipulated as a variable in a trial administration of a semi-direct oral interaction test. Discourse analytic techniques were used to determine the nature and/or significance of differences in the elicited discourse across two conditions in terms of complexity and accuracy. Findings suggest that planning time may…
Descriptors: Cognitive Development, Communicative Competence (Languages), Comparative Analysis, Discourse Analysis
Peer reviewed
Colletta, Nancy Donahue; And Others – Early Child Development and Care, 1993
Discusses the development of the Indonesian Chart of Developmental Milestones, designed for use with existing nutrition and mother-child welfare programs to monitor children's development. A reliability and validity study using 108 Indonesian children from birth to 36 months of age established a tester-observer reliability of 0.97 and a…
Descriptors: Charts, Child Development, Child Health, Child Welfare
Peer reviewed
Abedi, Jamal; Baker, Eva L. – Educational and Psychological Measurement, 1995
Results from a performance assessment in which 68 high school students wrote essays support the use of latent variable modeling for estimating reliability, concurrent validity, and generalizability of a scoring rubric. The latent variable modeling approach overcomes the limitations of certain conventional statistical techniques in handling…
Descriptors: Criteria, Essays, Estimation (Mathematics), Generalizability Theory
Peer reviewed
Brennan, Robert L.; And Others – Educational and Psychological Measurement, 1995
Generalizability theory is used to examine the psychometric characteristics of the Listening and Writing Tests developed by American College Testing for its Work Keys program. Results with samples of 50 suggest the desirability of a minimum number of the tests' tape-recorded messages and the use of at least 2 raters. (SLD)
Descriptors: Audiotape Recordings, Error of Measurement, Generalizability Theory, Interaction
Peer reviewed
Meisels, Samuel J.; And Others – Early Childhood Research Quarterly, 1995
Examined the reliability and validity of the Work Sampling System (WSS) for evaluating the schoolwork of 100 kindergarten children. Results indicated that the WSS checklist and summary report had very high internal and moderately high interrater reliability. The WSS accurately predicted the performance of the children on a norm-referenced…
Descriptors: Academic Achievement, Achievement Tests, Check Lists, Early Childhood Education
Peer reviewed
Houston, Walter M.; And Others – Applied Psychological Measurement, 1991
The effectiveness of alternative procedures to correct for rater leniency/stringency effects was studied when true scores were known. Ordinary least squares, weighted least squares, and imputation of the missing data consistently outperformed averaging the observed ratings; and the imputation technique was superior to the least squares methods.…
Descriptors: Comparative Analysis, Computer Simulation, Educational Assessment, Equations (Mathematics)
Peer reviewed
Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques
Peer reviewed
Fischer, Jan Lockwood; Krause Eheart, Brenda – Early Childhood Research Quarterly, 1991
Providers' demographic characteristics, training, support networks, business practices, and stability of services were examined relative to their caregiving practices. Results from a schematic model approach suggest correlations between some of these factors and variances in ratings of caregiver practices. (LB)
Descriptors: Behavior Rating Scales, Child Caregivers, Comparative Analysis, Data Analysis
Peer reviewed
Shresta, Tej B. – Foreign Language Annals, 1998
Describes how instruction and exposure contributed to the development of oral proficiency in English as a Second Language in mutually exclusive learning situations in Nepal. This study finds that both instruction and exposure contribute to second-language acquisition, the former promoting accuracy, the latter promoting fluency. (Author/VWL)
Descriptors: English (Second Language), Experiential Learning, Foreign Countries, Grammar
Peer reviewed
Gamaroff, Raphael – System, 2000
To test how to achieve a reliable score on an essay test, based on judgments of specific criteria such as grammatical accuracy or topic relevance, a workshop was conducted on interrater reliability at a teacher educators conference in South Africa. Experienced English teacher educators assessed two essay protocols. Results showed substantial…
Descriptors: English (Second Language), Essays, Evaluation Criteria, Foreign Countries