Pfeiffer, Steven; Petscher, Yaacov; Kumtepe, Alper – Roeper Review, 2008
This study examined the internal consistency and validity of a new rating scale to identify gifted students, the Gifted Rating Scales-School Form (GRS-S). The study explored the effect of gender, race/ethnicity, age, and rater familiarity on GRS-S ratings. One hundred twenty-two students in first to eighth grade from elementary and middle schools…
Descriptors: Ethnicity, Middle Schools, Academically Gifted, Talent
Roberts, Felicia; Cimasko, Tony – Journal of Second Language Writing, 2008
This study addresses the response of social science and engineering science faculty to a naturally occurring sample of second language writing. Using a matched-guise protocol, faculty participants were led to believe that the one-page essay was produced by an international student whose first language was either Chinese or Spanish. The faculty…
Descriptors: Foreign Students, Writing (Composition), Semantics, Social Sciences
Grainger, Peter; Purnell, Ken; Zipf, Reyna – Assessment & Evaluation in Higher Education, 2008
Decisions by markers about quality in student work remain confusing to most students and markers. This may in part be due to the relatively subjective nature of what constitutes a quality response to an assessment task. This paper reports on an experiment that documented the process of decision-making by multiple markers at a university who…
Descriptors: Student Evaluation, Educational Quality, Achievement Rating, Interrater Reliability
Yang, Ya-Ting C.; Chan, Chia-Ying – Computers & Education, 2008
This study aimed to develop a set of evaluation criteria for English learning websites. These criteria can assist English teachers/web designers in designing effective websites for their English courses and can also guide English learners in screening for appropriate and reliable websites to use in increasing their English ability. To fulfill our…
Descriptors: Speech Communication, Content Validity, Interrater Reliability, Second Language Learning
D'Eon, Marcel; Sadownik, Leslie; Harrison, Alexandra; Nation, Jill – American Journal of Evaluation, 2008
An accepted gold standard for measuring change in participant behavior is third-party observation. This method is highly resource intensive, and many small-scale evaluations may not be in a position to use this approach. This study was designed to assess the validity and reliability of aggregated group self-assessments as one way to measure workshop…
Descriptors: Program Effectiveness, Workshops, Feedback (Response), Self Evaluation (Groups)
Cook, David A.; Beckman, Thomas J. – Advances in Health Sciences Education, 2009
Educators must often decide how many points to use in a rating scale. No studies have compared interrater reliability for different-length scales, and few have evaluated accuracy. This study sought to evaluate the interrater reliability and accuracy of mini-clinical evaluation exercise (mini-CEX) scores, comparing the traditional mini-CEX…
Descriptors: Interrater Reliability, Rating Scales, Internal Medicine, Test Validity
Quigg, Mark; Lado, Fred A. – Journal of Continuing Education in the Health Professions, 2009
Introduction: The Accreditation Council for Continuing Medical Education (ACCME) provides guidelines for continuing medical education (CME) materials to mitigate problems in the independence or validity of content in certified activities; however, the process of peer review of materials appears largely unstudied and the reproducibility of…
Descriptors: Medical Education, Physicians, Conflict of Interest, Interrater Reliability
Pell, Godfrey; Homer, Matthew S.; Roberts, Trudie E. – International Journal of Research & Method in Education, 2008
Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, often this is at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as "Objective Structured Clinical Examinations" (OSCEs) are widely used to assess clinical…
Descriptors: Medical Education, Standardized Tests, Clinical Experience, Criterion Referenced Tests
Clark, Douglas B.; Sampson, Victor – Journal of Research in Science Teaching, 2008
The national science standards, along with prominent researchers, call for increased focus on scientific argumentation in the classroom. Over the past decade, researchers have developed sophisticated online science learning environments to support these opportunities for scientific argumentation. Assessing the quality of dialogic argumentation,…
Descriptors: Persuasive Discourse, Interrater Reliability, Concept Formation, Discourse Modes
Wang, Hao-Chuan; Chang, Chun-Yen; Li, Tsai-Yen – Computers & Education, 2008
The work aims to improve the assessment of creative problem-solving in science education by employing language technologies and computational-statistical machine learning methods to grade students' natural language responses automatically. To evaluate constructs like creative problem-solving with validity, open-ended questions that elicit…
Descriptors: Interrater Reliability, Earth Science, Problem Solving, Grading
Shriberg, David; Bonner, Mike; Sarr, Brianna J.; Walker, Ashley Marks; Hyland, Megan; Chester, Christie – School Psychology Review, 2008
Social justice is an aspiration that most, if not all, school psychologists likely support, yet there is a lack of research delineating how this term translates to school psychology practice. This article presents the results of a Delphi study of 44 cultural diversity experts in school psychology regarding (a) defining social justice from a school…
Descriptors: Social Justice, Delphi Technique, School Psychologists, Cultural Pluralism
Burdsal, Charles A.; Harrison, Paul D. – Assessment & Evaluation in Higher Education, 2008
The purpose of this research is to provide additional empirical evidence supporting the use of both a multidimensional profile and an overall evaluation of teaching effectiveness as valid indicators of student perceptions of effective classroom instruction. A factor analytic teaching evaluation instrument was used that also included open-ended…
Descriptors: Student Evaluation of Teacher Performance, Factor Analysis, Profiles, Multidimensional Scaling
Prathanee, Benjamas; Pongjanyakul, Amornrat; Chano, Jiraporn – International Journal of Language & Communication Disorders, 2008
Background: Children with delayed speech and language development are at considerable risk for later language impairment, social and behavioural problems, and illiteracy. Early diagnosis is needed for intervention planning and prevention. However, a speech and language test for Thai children has not been available. Aims: To establish a Thai Speech…
Descriptors: Delayed Speech, Language Impairments, Language Tests, Interrater Reliability
Liow, Jong-Leng – European Journal of Engineering Education, 2008
Peer assessment has been studied in various situations and actively pursued as a means by which students are given more control over their learning and assessment achievement. This study investigated the reliability of staff and student assessments in two oral presentations with limited feedback for a school-based thesis course in engineering…
Descriptors: Feedback (Response), Student Evaluation, Grade Point Average, Peer Evaluation
Crawford, Lindy; Lloyd, Susan; Knoth, Kelly – Assessment for Effective Intervention, 2008
Type and quality of revisions made by students between first and final drafts of a state writing test were scored using a revision taxonomy. Scorers categorized revisions first by unit (e.g., word, phrase, sentence), and then by type (e.g., addition, substitution, spelling). They then evaluated the impact of each revision on the readability of the…
Descriptors: Writing Tests, Revision (Written Composition), State Standards, Writing Evaluation

