Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Tengberg, Michael – Language Assessment Quarterly, 2018
Reading comprehension is often treated as a multidimensional construct. In many reading tests, items are distributed over reading process categories to represent the subskills expected to constitute comprehension. This study explores (a) the extent to which specified subskills of reading comprehension tests are conceptually conceivable to…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Results
Skaggs, Gary – Measurement: Interdisciplinary Research and Perspectives, 2013
The construct map is a particularly good way to approach instrument development, and this author states that he was delighted to read Adam Wyse's thoughts about how to use construct maps for standard setting. For a number of popular standard-setting methods, Wyse shows how typical feedback to panelists fits within a construct map framework.…
Descriptors: Standard Setting (Scoring), Maps, Test Construction, Measurement
Rindermann, Heiner; Baumeister, Antonia E. E. – International Journal of Testing, 2015
Scholastic tests regard cognitive abilities to be domain-specific competences. However, high correlations between competences indicate either high task similarity or a dependence on common factors. The present rating study examined the validity of 12 Programme for International Student Assessment (PISA) and Third or Trends in International…
Descriptors: Test Validity, Test Interpretation, Competence, Reading Tests
Reed, Deborah K.; Sturges, Keith M. – Remedial and Special Education, 2013
Researchers have expressed concern about "implementation" fidelity in intervention research but have not extended that concern to "assessment" fidelity, or the extent to which pre-/posttests are administered and interpreted as intended. When studying reading interventions, data gathering heavily influences the identification of…
Descriptors: Reading Tests, Fidelity, Pretests Posttests, Intervention
Murley, Lisa D.; Stobaugh, Rebecca; Jukes, Pamela; Tassell, Janet – Educational Renaissance, 2014
The purpose of this article is to provide an overview of the process used to examine the inter-rater reliability of the Teacher Work Sample (TWS) Scoring Rubric involved with the senior culminating experience for teacher candidates used at a large comprehensive university. The study compared holistic and analytic scores reported by Student Teacher…
Descriptors: Teacher Education, Interrater Reliability, Scoring Rubrics, Preservice Teachers
Lang, W. Steve; Wilkerson, Judy R. – Online Submission, 2008
The National Council for Accreditation of Teacher Education (NCATE, 2002) requires teacher education units to develop assessment systems and evaluate both the success of candidates and unit operations. Because of a stated, but misguided, fear of statistics, NCATE fails to use accepted terminology to assure the quality of institutional evaluative…
Descriptors: State Standards, Validity, Resource Materials, Reliability
Peer reviewed: Bakermans-Kranenburg, Marian J; van IJzendoorn, Marinus H. – Developmental Psychology, 1993
Examined the validity of the Adult Attachment Interview (AAI) measure by interviewing 83 mothers twice over 2 months, using different interviewers on each occasion. The results indicated that the reliability of the AAI classifications was quite high over time and across interviewers. The AAI classifications were independent of nonattachment…
Descriptors: Attachment Behavior, Examiners, Interrater Reliability, Mothers
Martinez, Jose Felipe; Stecher, Brian; Borko, Hilda – Educational Assessment, 2009
In this study we use data from the Early Childhood Longitudinal Survey third- and fifth-grade samples to investigate teacher judgments of student achievement, the extent to which they offer a similar picture of student mathematics achievement compared to standardized test scores, and whether classroom assessment practices moderate the relationship…
Descriptors: Mathematics Achievement, Standardized Tests, Grade 5, Student Evaluation
Arnold, Margery E. – 1996
It is incorrect to say "the test is reliable" because reliability is a function not only of the test itself, but of many factors. The present paper explains how different factors affect classical reliability estimates such as test-retest, interrater, internal consistency, and equivalent forms coefficients. Furthermore, the limits of classical test…
Descriptors: Estimation (Mathematics), Generalizability Theory, Heuristics, Interrater Reliability
Peer reviewed: Van Balen, H. G. G.; Van Limbeek, J.; De Mey, H. R. A. – International Journal of Rehabilitation Research, 1997
Forty neuropsychologists, neurologists, psychiatrists, and physiatrists identified neurologically relevant items (NRIs) in the Minnesota Multiphasic Personality Inventory-2 (MMPI-2). Raters identified four sets of NRIs: one for brain damage in general and three partially overlapping sets for stroke, traumatic brain damage, and whiplash.…
Descriptors: Clinical Diagnosis, Head Injuries, Interrater Reliability, Neurological Impairments
Amir, Tamar; Gati, Itamar; Kleiman, Tali – Journal of Career Assessment, 2008
This research develops and tests a procedure for interpreting individuals' responses in multiscale career assessments, using the Career Decision-Making Difficulties Questionnaire (CDDQ). In Study 1, criteria for ascertaining the credibility of responses were developed, based on the judgments of 39 career-counseling experts. In Study 2, the…
Descriptors: Career Choice, Decision Making Skills, Career Development, Questionnaires
Naizer, Gilbert – 1992
A measurement approach called generalizability theory (G-theory) is an important alternative to the more familiar classical measurement theory that yields less useful coefficients such as alpha or the KR-20 coefficient. G-theory is a theory about the dependability of behavioral measurements that allows the simultaneous estimation of multiple…
Descriptors: Error of Measurement, Estimation (Mathematics), Generalizability Theory, Higher Education
Shale, Doug – 1986
This study is an attempt at a cohesive characterization of the concept of essay reliability. As such, it takes as a basic premise that previous and current practices in reporting reliability estimates for essay tests have certain shortcomings. The study provides an analysis of these shortcomings--partly to encourage a fuller understanding of the…
Descriptors: Analysis of Variance, Correlation, Error of Measurement, Essay Tests
Peer reviewed: Vance, B.; And Others – Psychology in the Schools, 1983
Investigated the interscorer reliability between a novice and a professional psychologist for the Minnesota Percepto-Diagnostic Test-Revised (MPDT-R), using a sample of 30 individuals. Results indicated that for three of the four MPDT-R scores there was a significant positive correlation between expert and novice scoring criteria. (JAC)
Descriptors: Experimenter Characteristics, Interrater Reliability, Psychological Evaluation, Psychologists

