Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Dike, Shelley E.; Kochan, Frances K.; Reed, Cynthia; Ross, Margaret – International Journal of Leadership in Education, 2006
This study gathered information regarding professional military educators' perceptions of the concept of critical thinking to determine whether there was a common definition and a shared meaning of the concept among them. Although there did not appear to be a common definition of critical thinking among this group, 10 categories and four themes…
Descriptors: Critical Thinking, Concept Formation, Military Personnel, Higher Education
Kang, Sang-Jo; Kang, Minsoo – Measurement in Physical Education and Exercise Science, 2006
In many countries, an athlete's performance at sporting competitions is often used as part of the selection criteria for entry into college. These criteria could be biased depending upon the procedures utilized by the authorities in a particular country. The purpose of this study was to calibrate, by using the Rasch rating scale model, the…
Descriptors: Athletes, Rating Scales, Weighted Scores, Judges
Noreau, Luc; Lepage, Celine; Boissiere, Lucie; Picard, Roger; Fougeyrollas, Patrick; Mathieu, Jean; Desmarais, Gilbert; Nadeau, Line – Developmental Medicine & Child Neurology, 2007
The objectives of this study were: (1) to examine the psychometric properties of the Assessment of Life Habits (LIFE-H) for children; and (2) to draw a profile of the level of participation among children of 5 to 13 years of age with various impairments. The research team adapted the adult version of the LIFE-H in order to render it more…
Descriptors: Genetic Disorders, Head Injuries, Neurological Impairments, Measurement Techniques
Meier, Anne; Spada, Hans; Rummel, Nikol – International Journal of Computer-Supported Collaborative Learning, 2007
The analysis of the process of collaboration is a central topic in current CSCL research. However, defining process characteristics relevant for collaboration quality and developing instruments capable of assessing these characteristics are no trivial tasks. In the assessment method presented in this paper, nine qualitatively defined dimensions of…
Descriptors: Interrater Reliability, Cooperation, Content Analysis, Cognitive Processes
Murdock, Linda C.; Cost, Hollie C.; Tieso, Carol – Focus on Autism and Other Developmental Disabilities, 2007
The "Social-Communication Assessment Tool" (S-CAT) was created as a direct observation instrument to quantify specific social and communication deficits of children with autism spectrum disorders (ASD) within educational settings. In this pilot study, the instrument's content validity and interrater reliability were investigated to determine the…
Descriptors: Nonverbal Communication, Autism, Content Validity, Test Validity
Erkens, Gijsbert; Janssen, Jeroen – International Journal of Computer-Supported Collaborative Learning, 2008
Although protocol analysis can be an important tool for researchers to investigate the process of collaboration and communication, the use of this method of analysis can be time consuming. Hence, an automatic coding procedure for coding dialogue acts was developed. This procedure helps to determine the communicative function of messages in online…
Descriptors: Protocol Analysis, Validity, Cooperation, Coding
Du, Yi; And Others – 1997
The FACETS equating model meets the complex requirements for equating writing performance assessment across both raters and prompts. This study is based on an equating of the 1996 writing performance assessment in the Minneapolis Public Schools (Minnesota). Raters and prompts were equated simultaneously using the FACETS model. About 3,000 fifth…
Descriptors: Elementary Education, Elementary School Students, Equated Scores, Grade 5
Raymond, Mark R.; Viswesvaran, Chockalingam – 1991
This study illustrates the use of three least-squares models to control for rater effects in performance evaluation: (1) ordinary least squares (OLS); (2) weighted least squares (WLS); and (3) OLS subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The three models were applied to ratings obtained from four…
Descriptors: Evaluators, Higher Education, Interrater Reliability, Least Squares Statistics
Naizer, Gilbert – 1992
A measurement approach called generalizability theory (G-theory) is an important alternative to the more familiar classical measurement theory that yields less useful coefficients such as alpha or the KR-20 coefficient. G-theory is a theory about the dependability of behavioral measurements that allows the simultaneous estimation of multiple…
Descriptors: Error of Measurement, Estimation (Mathematics), Generalizability Theory, Higher Education
Bachman, Lyle F.; And Others – 1993
This paper outlines the development of a performance assessment measure of language speaking ability, the Language Ability Assessment System (LAAS), which is highly reliable and can be examined for reliability through modern measurement theories, such as generalizability theory (G-theory) and the many-facet Rasch theory. LAAS was developed to…
Descriptors: College Students, Higher Education, Interrater Reliability, Language Proficiency
Collins, Angelo – 1990
Since 1986, the Teacher Assessment Project (TAP) at Stanford University (California) has been exploring performance-based modes of assessment that capture the complexity of the practice of teaching. After a brief description of the rating procedures, the raters, and the situated-performances designed by the TAP for assessment, this paper describes…
Descriptors: Biology, Comparative Analysis, Evaluators, High Schools
Llabre, Maria M.; Forgan, Harry W. – Florida Journal of Educational Research, 1985
The interrater reliability and factor structure of colleague ratings of university faculty were studied for 46 faculty members from 4 departments within the School of Education and Allied Professions at the University of Miami (Florida). Within each department, each faculty member rated every other faculty member using two methods: (1) a global…
Descriptors: College Faculty, Evaluation Methods, Factor Analysis, Higher Education
Clark, John L. D. – 1986
A study of the reliability of the proficiency ratings scale and techniques used by three federal government agencies--the Central Intelligence Agency, the Defense Language Institute, and the Foreign Service Institute (FSI)--to test employees' oral language proficiency in French and German had two randomly selected two-person teams of testers from…
Descriptors: Comparative Analysis, Federal Government, French, German
Stansfield, Charles W.; Kenyon, Dorry Mann – 1988
The development and validation of a Portuguese oral language test are described. The test consisted of five item types: personal conversation, giving directions, description of picture sequences, topical discourse, and oral task completion based on printed instructions. Three preliminary forms of the test were administered to a group of language…
Descriptors: Interrater Reliability, Interviews, Language Tests, Oral Language
Santmire, Toni E. – 1984
The purpose of this paper is to discuss ways in which developmental psychology suffers from the lack of an appropriate technology of measurement and statistical analysis. The paper begins by noting that developmental psychology is the study of change; that individuals develop through a succession of "stages" which are separated by…
Descriptors: Data Analysis, Data Collection, Developmental Psychology, Developmental Stages

