[Peer reviewed] Chambers, Francine; Richards, Brian – Language Learning Journal, 1993
Presents the results of a study of the reliability of teacher assessment of the oral examination at General Certificate of Secondary Education (GCSE) for Modern Languages. Teachers' opinions of different types of marking criteria were collected through semistructured interviews after each teacher had assessed a tape-recorded free conversation task…
Descriptors: Attitude Measures, Data Collection, Evaluation Criteria, Evaluation Methods
[Peer reviewed] Gamaroff, R. – South African Journal of Higher Education, 1998
A study investigated the consistency of criteria for academic English skills as applied by teachers of academic English and science lecturers in a South African historically black university. Both groups were asked to evaluate first-year students' essays on the greenhouse effect. Results indicated a wide variation in scores and judgments within…
Descriptors: Black Colleges, College Faculty, English (Second Language), English for Academic Purposes
[Peer reviewed] Penny, Alan J.; Grover, Christine – Assessment & Evaluation in Higher Education, 1996
A study of senior education majors in the United Kingdom investigated the realism of student expectations of a required independent research project. Student self-assessments and teacher assessments showed little correlation, and students' evaluation criteria emphasized lower-order skills (style and presentation) and ignored higher-order processes…
Descriptors: Cognitive Processes, College Seniors, Education Majors, Evaluation Criteria
[Peer reviewed] Bradley, Robert H.; Corwyn, Robert F.; Caldwell, Betty M.; Whiteside-Mansell, Leanne; Mink, Iris T. – Journal of Research on Adolescence, 2000
Describes the development of the Early Adolescent version of the Home Observation for Measurement of the Environment (EA-HOME) Inventory. Presents information on its usefulness with African Americans, Chinese Americans, European Americans, Mexican Americans, and Dominican Americans. Notes findings indicating high interobserver agreement, with…
Descriptors: Black Youth, Child Development, Chinese Americans, Cultural Differences
[Peer reviewed] Hollenbeck, Keith; Tindal, Gerald; Almond, Patricia – Educational Assessment, 1999
Studied the amount of measurement error in a state's performance-based writing task as it relates to high-stakes decision reproducibility. Using 175 eighth-grade writing samples, the study finds moderate correlations between the two raters' scores, with significant differences between raters for the handwritten, but not the typed, essays. (SLD)
Descriptors: Decision Making, Error of Measurement, Essay Tests, Grade 8
Clariana, Roy B.; Koul, Ravinder; Salehi, Roya – International Journal of Instructional Media, 2006
This investigation seeks to confirm a computer-based approach that can be used to score concept maps (Poindexter & Clariana, 2004) and then describes the concurrent criterion-related validity of these scores. Participants enrolled in two graduate courses (n=24) were asked to read about and research online the structure and function of the heart…
Descriptors: Semantics, Human Body, Test Validity, Anatomy
Gardner, Hilary; Froud, Karen; McClelland, Alastair; van der Lely, Heather K. J. – International Journal of Language and Communication Disorders, 2006
Background: Despite a large body of evidence regarding reliable indicators of language deficits in young children, there has not been a standardized, quick screen for language impairment. The Grammar and Phonology Screening (GAPS) test was therefore designed as a short, reliable assessment of young children's language abilities. Aims: GAPS was…
Descriptors: Grammar, Phonology, Screening Tests, Reading Difficulties
Hurley, Kristin Duppong; Shaw, Tanya; Thompson, Ron; Griffith, Annette; Farmer, Elizabeth M.; Tierney, Jeff – Residential Treatment for Children & Youth, 2006
This study describes the development of the Staff Implementation Observation Form, an instrument to assess staff competence delivering an intervention to youth in group home care with behavioral or emotional disorders. This instrument assesses staff skill at implementing the key treatment components, including building relationships with youth,…
Descriptors: Residential Programs, Observation, Emotional Disturbances, Predictive Validity
Bouck, Emily C. – Education and Training in Developmental Disabilities, 2005
This study examined factors associated with the curriculum and instructional environments for secondary students with mild mental retardation, based on teacher report. A survey was mailed to 378 secondary special education teachers in Michigan. Teachers provided demographic information and answered questions regarding curriculum and instructional…
Descriptors: Special Education Teachers, Mild Mental Retardation, Secondary School Curriculum, Instructional Effectiveness
Du, Yi; And Others – 1996
In the framework of performance assessment, because of the involvement of many facets, the development of ways to detect differential item functioning or differential facet functioning (DFF) has lagged behind the practical needs of test developers. To monitor the validity and fairness of an assessment, it is critical to discover a method that can…
Descriptors: Age Differences, Elementary School Students, Elementary Secondary Education, Essay Tests
Garrido, Mariquita; Payne, David A. – 1987
Minimum competency cut-off scores on a statistics exam were estimated under four conditions: the Angoff judging method with item data available (n=20) and without (n=19), and the Modified Angoff method with (n=19) and without (n=19) item data available to judges. The Angoff method required free response percentage estimates (0-100 percent),…
Descriptors: Academic Standards, Comparative Analysis, Criterion Referenced Tests, Cutting Scores
Knoop, Robert; Common, Ronald W. – 1985
The Performance Review, Analysis, and Improvement System for Educators (PRAISE) is a formative evaluation instrument designed to improve the performance of school principals. The system appears to be reliable and valid and is flexible enough to accommodate the needs of a variety of schools. Sample items and categories of the instrument include…
Descriptors: Administrator Evaluation, Computer Oriented Programs, Data Interpretation, Elementary Secondary Education
Brauchle, Paul E.; And Others – 1987
The results of the first year of study of an approach in Tennessee using student performance assessments in the evaluation of teachers are summarized and the results of a field test on student performance and attitude are presented. The Tennessee approach places the responsibility for selecting and presenting the data on the teachers. About 130…
Descriptors: Academic Achievement, Attitude Measures, Career Ladders, Educational Assessment
North Carolina State Dept. of Public Instruction, Raleigh. Div. of Research. – 1986
This report describes the North Carolina Annual Testing Program's writing task, which was administered in 1985-86. Grade 6 students were tested on their ability to write a clarification composition, while grade 8 students were evaluated on their skills in writing a persuasive composition. The timed composition (50 minutes) was scored by two…
Descriptors: Basic Skills, Coherence, Cohesion (Written Composition), Elementary Education
Littlefield, John H.; And Others – 1985
Sixteen Family Practice faculty members completed ratings on 59 senior medical students after a 6-week primary care clerkship. Each student was rated by seven to ten faculty members and the chief residents who worked with them, resulting in a total of 353 ratings. The rating scale covered: (1) attainment of learning objectives; (2) progress during…
Descriptors: Analysis of Variance, Clinical Experience, Confidence Testing, Evaluators