Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Linn, Robert L. – 1994
The New Standards Project conducted a pilot test of a series of performance-based assessment tasks in mathematics and English language arts at Grades 4 and 8 in the spring of 1993. This paper reports the results of a series of generalizability analyses conducted for a subset of the 1993 pilot study data in mathematics. Generalizability analyses…
Descriptors: Decision Making, Educational Assessment, Elementary Education, Elementary School Students
Nakamura, Yuji – Cross Currents, 1992
A survey of 32 Japanese and 44 native English-speaking teachers of English as a Second Language investigated how the two groups evaluate the English speech skills of Japanese students. A 59-item questionnaire was designed to elicit comparative information on definition of oral proficiency, criteria (including newer ones derived from instruction…
Descriptors: Comparative Analysis, English (Second Language), Foreign Countries, Interrater Reliability
Morgan, George A.; Bartholomew, Sheridan – 1998
This study examined the reliability and construct validity of two types of measures of mastery motivation for elementary school children: a new version of the Dimensions of Mastery Questionnaires (DMQ) and behavioral mastery tasks. Participating were 64 mostly middle class and Caucasian 7- and 10-year-olds living in a middle-sized western city.…
Descriptors: Childhood Attitudes, Construct Validity, Elementary Education, Elementary School Students
Bridgeman, Brent; Cooper, Peter – 1998
Essays for the Graduate Management Admissions Test must be written with a word processor (except in some foreign countries). The test sponsors, the Graduate Management Admissions Council, believed this to be fair because some word processing skill is a prerequisite for advanced management education. Furthermore, it might also be unfair to…
Descriptors: College Entrance Examinations, College Students, Comparative Analysis, Essay Tests
Porter, Don – 1991
A discussion of oral language testing looks at the role of student attitudes, student and interviewer gender, and interviewer social status in the reliability of student assessments. Three small-scale studies investigating these factors are described. The first two involved only Arab students. In the first, it was found that students (all male)…
Descriptors: Communicative Competence (Languages), Interpersonal Relationship, Interrater Reliability, Interviews
Moore, JoAnne E. – 1984
Detroit's Peer Teachers as Mirrors and Monitors Project is intended to validate cost effective methods for increasing Academic Learning Time (ALT) for students in grades one through four. A major problem in this research effort has been the design of valid and reliable measures of the components of ALT. One very important component of ALT is…
Descriptors: Achievement Gains, Classroom Observation Techniques, Data Collection, Interrater Reliability
Gampel, Ezra S. – 1990
The study sought to determine if there are differences between shifts of workers in Intermediate Care Facilities in their ratings of the daily living skills of mentally retarded residents, and whether these differences reflect actual differences in performance by the residents. Staff were interviewed concerning the level of prompt required to…
Descriptors: Adults, Behavior Patterns, Children, Cues
Stolworthy, Reed L. – 1990
This follow-up study of student teachers contains data relative to the undergraduates' self-evaluation in the teaching-learning experience, and the assessments administered through the application of the same rating scale by the respective cooperating teachers and university supervisors. Descriptive and inferential statistics were applied to the…
Descriptors: Comparative Analysis, Cooperating Teachers, Elementary Secondary Education, Followup Studies
Rothman, M. L.; And Others – 1982
A practical application of generalizability theory, demonstrating how the variance components contribute to understanding and interpreting the data collected to evaluate a program, is described. The evaluation concerned 120 learning modules developed for the Dental Auxiliary Education Project. The goals of the project were to design, implement,…
Descriptors: Correlation, Data Collection, Dental Schools, Educational Research
Woodward, Arthur – 1985
Reviews of 2,554 educational software programs by thirty reviewing sources were examined to compare the tendency of different review sources to give generally positive, negative, or neutral ratings. The reliability of language arts and reading software reviews was studied. Thirty-eight percent of the software programs were reviewed by two or more…
Descriptors: Comparative Analysis, Computer Assisted Instruction, Courseware, Elementary Secondary Education
Reed, Donald B.; And Others – 1988
An instrument was developed to assess principal leadership. Two studies were then conducted to assess the reliability, validity, and utility of the instrument. Leadership style is the relative intensity of the presence of four modes of authority (traditional, charismatic, legal, and expert authority) and four modes of power (moral, psychological,…
Descriptors: Administrator Evaluation, Administrators, Construct Validity, Educational Assessment
Engelhard, George, Jr.; And Others – 1989
Whether judges on bias review committees can identify test items that function differently for black and white examinees was studied. Judges (n=42) on three bias review committees were asked to examine a set of items and predict differential item functioning (DIF) without empirical data. Test items from teacher certification tests in the content…
Descriptors: Black Students, Evaluators, Interrater Reliability, Item Analysis
Cronin, Linda L.; Capie, William – 1985
The purpose of this study was to compare the scoring of Teacher Performance Assessment Instruments (TPAI) indicators using discrete descriptors when some are considered "essential" with the scoring of these same indicators when no descriptors are considered essential. The two questions addressed in this study were: (1) To what…
Descriptors: Analysis of Variance, Behavior Rating Scales, Classroom Observation Techniques, Data Collection
Fuchs, Douglas; And Others – 1985
The present investigation represents a systematic effort to determine whether handicapped children have been included in the development of test norms, items, and indices of reliability and validity. It analysed up-to-date user manuals and technical supplements of 27 well-known and widely used aptitude and achievement tests. Study procedure…
Descriptors: Achievement Tests, Aptitude Tests, Disabilities, Elementary Secondary Education
Novak, Carl D. – 1985
The evaluation team of the Lincoln Public Schools (Nebraska) used the multi-attribute utility technology (MAUT) approach to prioritize potential evaluation projects. The priorities were used to allocate resources to the district's most important projects, and to eliminate or scale down less important projects. The problem was caused initially…
Descriptors: Elementary Secondary Education, Evaluation Criteria, Evaluation Methods, Evaluation Needs


