Saenz, David Arron – Online Submission, 2023
A vast body of literature documents the positive impact of rater training and calibration sessions on inter-rater reliability, and research indicates that several factors, including frequency and timing, play crucial roles in ensuring inter-rater reliability. Additionally, an increasing amount of research indicates possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019
Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…
Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation
Boris, Ashley L.; Awadalla, Nardeen; Martin, Toby L.; Martin, Garry L.; Kaminski, Lauren; Miljkovic, Morena – Education and Training in Autism and Developmental Disabilities, 2015
The Assessment of Basic Learning Abilities (ABLA) is a tool that is used to assess the learning ability of individuals with intellectual disability (ID) and children with autism. The ABLA was recently revised and is now referred to as the ABLA-Revised (ABLA-R). A self-instructional manual was prepared to teach individuals how to administer the…
Descriptors: Guides, Academic Ability, Intellectual Disability, Autism
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Reed, Deborah K.; Sturges, Keith M. – Remedial and Special Education, 2013
Researchers have expressed concern about "implementation" fidelity in intervention research but have not extended that concern to "assessment" fidelity, or the extent to which pre-/posttests are administered and interpreted as intended. When studying reading interventions, data gathering heavily influences the identification of…
Descriptors: Reading Tests, Fidelity, Pretests Posttests, Intervention
Darsaklis, Vasiliki; Snider, Laurie M.; Majnemer, Annette; Mazer, Barbara – Physical & Occupational Therapy in Pediatrics, 2013
This study examined the constructs underlying the Movement Assessment Battery for Children-2 (M-ABC-2), Bruininks-Oseretsky Test of Motor Proficiency (BOTMP) and Vineland Adaptive Behavior Scale-2 (VABS-2) using the framework of the International Classification of Functioning, Disability and Health--Child Youth version (ICF-CY) and the diagnostic…
Descriptors: Adjustment (to Environment), Motor Development, Children, Developmental Disabilities
Crossley, Scott; Clevinger, Amanda; Kim, YouJin – Language Assessment Quarterly, 2014
There has been a growing interest in the use of integrated tasks in the field of second language testing to enhance the authenticity of language tests. However, the role of text integration in test takers' performance has not been widely investigated. The purpose of the current study is to examine the effects of text-based relational (i.e.,…
Descriptors: Language Proficiency, Connected Discourse, Language Tests, English (Second Language)
Lu, Chia-Chen; Luh, Ding-Bang – Creativity Research Journal, 2012
Although previous studies have attempted to use different experiences of raters to rate product creativity by adopting the Consensus Assessment Method (CAT) approach, the validity of replacing CAT with another measurement tool has not been adequately tested. This study aimed to compare raters with different levels of experience (expert vs.…
Descriptors: Creativity, Interrater Reliability, Construct Validity, Comparative Analysis
Hasson, Natalie; Dodd, Barbara; Botting, Nicola – International Journal of Language & Communication Disorders, 2012
Background: Sentence construction and syntactic organization are known to be poor in children with specific language impairments (SLI), but little is known about the way in which children with SLI approach language tasks, and static standardized tests contribute little to the differentiation of skills within the population of children with…
Descriptors: Alternative Assessment, Sentence Structure, Syntax, Language Processing
Zhao, Zhongbao – RELC Journal: A Journal of Language Teaching and Research, 2013
This study investigates the validity of the Diagnostic College English Speaking Test (DCEST) in the context of EFL teaching and learning in China. The experiment was conducted in three stages over the course of eight weeks at a national key university in China. By means of test administration and questionnaire survey, the researcher gathered…
Descriptors: Oral Language, Construct Validity, Language Tests, Diagnostic Tests
Baker, Beverly A. – Assessing Writing, 2010
In high-stakes writing assessments, rater training in the use of a rating scale does not eliminate variability in grade attribution. This realisation has been accompanied by research that explores possible sources of rater variability, such as rater background or rating scale type. However, there has been little consideration thus far of…
Descriptors: Foreign Countries, Writing Evaluation, Writing Tests, Testing
Gerlick, Robert Edward – ProQuest LLC, 2010
The research presented in this manuscript was focused on the development of assessments for engineering design outcomes. The primary goal was to support efforts by the Transferable Integrated Design Engineering Education (TIDEE) consortium in developing assessment instruments for multidisciplinary engineering capstone courses. Research conducted…
Descriptors: Engineering Education, Student Evaluation, Formative Evaluation, Testing

