Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Bobek, Becky L.; Gore, Paul A. – American College Testing (ACT), Inc., 2004
This research report describes changes made to the Inventory of Work-Relevant Values when it was revised for online use as a part of the Internet version of DISCOVER. Users will see the following differences between the online and CD-ROM versions of the inventory: 22 items rather than 61, simplified presentation, and the contribution of all items…
Descriptors: Interrater Reliability, Field Tests, Internet, Test Construction
Smith, Teresa A. – 1997
The Third International Mathematics and Science Study (TIMSS) measured mathematics and science achievement of middle school students in more than 40 countries. About one quarter of the tests' nearly 300 items were free response items requiring students to generate their own answers. Scoring these responses used a two-digit diagnostic code rubric…
Descriptors: Comparative Education, English, Error of Measurement, Foreign Countries
Peer reviewed: Magill, Michael K.; And Others – Evaluation and the Health Professions, 1988
A computer-assisted interaction analysis system that describes behavior in small groups and focuses on active clinical problem-solving was developed and tested. Teacher and class characteristics are incorporated into the simulation, allowing feedback for improvement of medical teaching skills. (TJH)
Descriptors: Computer Simulation, Graduate Medical Education, Group Dynamics, Higher Education
Peer reviewed: MacRae, Helen M.; And Others – Academic Medicine, 1995
A study compared two methods of rating medical students' performances on history and physical examination: using checklists completed by standardized patients (SPs) and databases completed by students. Results of each method were correlated with ratings of students by three physicians for each SP-student encounter. Results showed checklists…
Descriptors: Case Studies, Check Lists, Comparative Analysis, Databases
Peer reviewed: Frederick, Brian P.; Olmi, D. Joe – Psychology in the Schools, 1994
Social interactions between children with Attention-Deficit/Hyperactivity Disorder (ADHD) and their teachers, peers, and parents are discussed. Problematic interactions may depend on social skills deficits. Changing the focus to ADHD children who are not experiencing social skills deficits may prove beneficial. A review of the previous literature…
Descriptors: Attention Deficit Disorders, Attention Span, Behavior Rating Scales, Children
Heath, Edward M.; Coleman, Karen J.; Lensegrav, Tera L.; Fallon, Jennifer A. – Research Quarterly for Exercise and Sport, 2006
The System for Observing Fitness Instruction Time (SOFIT) is a direct observation system specifically developed for use during physical education (PE; McKenzie, 1991; McKenzie, Sallis, & Nader, 1991). The purpose of this study was to validate the estimates of time spent in various physical activity intensities obtained with the paper and pencil…
Descriptors: Validity, Physical Activities, Physical Education, Physical Fitness
Rudner, Lawrence M.; Garcia, Veronica; Welch, Catherine – Journal of Technology, Learning, and Assessment, 2006
This report provides a two-part evaluation of the IntelliMetric[SM] automated essay scoring system based on its performance scoring essays from the Analytic Writing Assessment of the Graduate Management Admission Test[TM] (GMAT[TM]). The IntelliMetric system performance is first compared to that of individual human raters, a Bayesian system…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
Bahadourian, Ara John; Tam, Kai Yung; Greer, R. Douglas; Rousseau, Marilyn K. – International Journal of Behavioral Consultation and Therapy, 2006
We report an experiment examining the academic performance of undergraduate students in two special education college courses. The experimenter/professor taught both courses in which he presented curriculum material via written learn units (LUs) (Greer & Hogin, 1999) or in a lecture format across randomly selected weeks in a 12-week semester.…
Descriptors: Undergraduate Students, Academic Achievement, Special Education, Education Courses
McGhee, Debbie E.; Lowell, Nana – New Directions for Teaching and Learning, 2003
This study compares mean ratings, inter-rater reliabilities, and the factor structure of items for online and paper student-rating forms from the University of Washington's Instructional Assessment System. (Contains 3 figures and 2 tables.)
Descriptors: Psychometrics, Factor Structure, Student Evaluation of Teacher Performance, Test Items
New Mexico Public Education Department, 2007
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of the 2007 NMSBA and its technical characteristics. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
Clariana, Roy B.; Wallace, Patricia – Journal of Educational Computing Research, 2007
This proof-of-concept investigation describes a computer-based approach for deriving the knowledge structure of individuals and of groups from their written essays, and considers the convergent criterion-related validity of the computer-based scores relative to human rater essay scores and multiple-choice test scores. After completing a…
Descriptors: Computer Assisted Testing, Multiple Choice Tests, Construct Validity, Cognitive Structures
Janosik, Steven M. – NASPA Journal, 2007
Most conversations about ethics and professional behavior involve case studies and hypothetical situations. This study identifies and examines the most common concerns in professional behavior as reported by 303 student affairs practitioners in the field. Differences by gender, years of experience, organizational level, institutional type, and…
Descriptors: Ethics, Professional Personnel, Behavior, Antisocial Behavior
Stahl, John; And Others – 1996
On-line performance assessment was developed to maximize the usefulness of performance assessment and to minimize the time and labor costs incurred. This paper reports on the development of an on-line performance assessment instrument, focusing on the establishment and validation of the scoring rubric and its implementation in the Rasch model, the…
Descriptors: Computer Software, Computer Software Development, Cost Effectiveness, Interrater Reliability
Wolfe, Edward W. – 1996
Although portfolio assessment is becoming increasingly popular, it may not survive unless portfolio scoring can meet the demands of large-scale assessment standards. The results of studies of interrater reliability with large-scale portfolio assessments have been mixed. This paper reports the scoring results of a nationwide portfolio pilot in…
Descriptors: Decision Making, Generalizability Theory, Interrater Reliability, Language Arts
Tiffany, Gerald E.; And Others – 1991
In 1991, a student learning outcomes assessment was conducted at Wenatchee Valley College, Washington. All English 101 students in the winter and spring quarters of 1990 wrote a 2-hour final exam. Winter quarter students wrote on the same topic while spring quarter students wrote on one of three randomly assigned topics. Five English 101…
Descriptors: Community Colleges, Comparative Analysis, Curriculum Evaluation, Essay Tests

