Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Talbott, Meagan R.; Dufek, Sarah; Young, Greg; Rogers, Sally J. – Autism: The International Journal of Research and Practice, 2022
This study investigated the feasibility of recruiting and assessing infants with prodromal autism characteristics in the first year of life via telehealth. Participants included 41 infants (mean age = 10.51 months, 51.2% female, 80.5% White) whose parents had concerns about social communication delays or autism. All infants met concerns criteria on a…
Descriptors: Infants, Autism Spectrum Disorders, At Risk Persons, Symptoms (Individual Disorders)
Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022
The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research data consists of…
Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics
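For readers unfamiliar with the kappa statistic named in the abstract above, the following is a minimal sketch of Cohen's kappa for two raters. It is an illustration only: the abstract does not say which kappa variant the study used, and the function name `cohen_kappa` and the sample ratings are hypothetical, not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items (illustrative helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # and this sketch reports full agreement by convention (an assumption).
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Made-up ratings for four essays on a pass (1) / fail (0) rubric.
scores_a = [1, 1, 0, 0]
scores_b = [1, 0, 0, 0]
print(cohen_kappa(scores_a, scores_b))
```

Kappa corrects raw percent agreement for the agreement two raters would reach by chance, which is why it is often reported alongside or instead of simple agreement rates.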
Hassan Saleh Mahdi; Ahmed Alkhateeb – International Journal of Computer-Assisted Language Learning and Teaching, 2025
This study aims to develop a robust rubric for evaluating artificial intelligence (AI)-assisted essay writing in English as a Foreign Language (EFL) contexts. Employing a modified Delphi technique, we conducted a comprehensive literature review and administered Likert scale questionnaires. This process yielded nine key evaluation criteria,…
Descriptors: Scoring Rubrics, Essays, Writing Evaluation, Artificial Intelligence
Chen, Zhen; Fang, Rui; Zhang, Yi; Ge, Pingjiang; Zhuang, Peiyun; Chou, Adriana; Jiang, Jack – Journal of Speech, Language, and Hearing Research, 2018
Purpose: The purpose of this study is to develop the Mandarin version of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) and evaluate its reliability compared with the Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS). Method: The Mandarin version of the CAPE-V tool was translated from the validated English version with…
Descriptors: Voice Disorders, Diagnostic Tests, Mandarin Chinese, Test Reliability
D'Agostino, Jerome V.; Rodgers, Emily; Winkler, Christa; Johnson, Tracy; Berenbon, Rebecca – Reading Psychology, 2021
Running Records provide a standardized method for recording and assessing students' oral reading behaviors and are excellent formative assessment tools to guide instructional decision-making. This study expands on prior Running Record reliability work by evaluating the extent to which external raters and teachers consistently assessed students'…
Descriptors: Accuracy, Oral Reading, Generalizability Theory, Error Correction
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Flake, Jessica Kay; Petway, Kevin Terrance, II – Educational Measurement: Issues and Practice, 2019
Numerous studies merely note divergence in students' and teachers' ratings of student noncognitive constructs. However, given the increased attention and use of these constructs in educational research and practice, an in-depth study focused on this issue was needed. Using a variety of quantitative methodologies, we thoroughly investigate…
Descriptors: Teachers, Students, Achievement Rating, Interrater Reliability
Atilgan, Hakan – Eurasian Journal of Educational Research, 2019
Purpose: This study intended to examine the generalizability and reliability of essay ratings within the scope of the generalizability (G) theory. Specifically, the effect of raters on the generalizability and reliability of students' essay ratings was examined. Furthermore, variations of the generalizability and reliability coefficients with…
Descriptors: Foreign Countries, Essay Tests, Test Reliability, Interrater Reliability
Volante, Paulo; Valenzuela, Sergio; Díaz, Alejandro; Fernández, Magdalena; Mladinic, Antonio – School Leadership & Management, 2019
This study seeks to develop and validate an Assessment Centre (AC) tool for the evaluation and selection of school leaders, focusing on the identification of competencies that influence teaching and learning outcomes. International research supports the creation of Assessment Centres to select candidates for these roles, due to their superior…
Descriptors: Foreign Countries, Personnel Selection, School Administration, Assessment Centers (Personnel)
Miller, Matthew B.; Jimenez-Garcia, John Alexander; Hong, Chang Ki; DeMont, Richard – Measurement in Physical Education and Exercise Science, 2020
The Child-Focused Injury Risk Screening Tool (ChildFIRST) is a process-based assessment including 10 movement skills with 4 associated evaluation criteria. The ChildFIRST has been validated by a group of experts to evaluate movement competence and injury risk in 8-12-year-olds. The purpose of this study is to evaluate the reliability of the…
Descriptors: Screening Tests, Risk Assessment, Injuries, Psychomotor Skills
Polat, Murat – International Online Journal of Education and Teaching, 2020
The assessment of speaking skills in foreign language testing has always had some pros (testing learners' speaking skills doubles the validity of any language test) and cons (many test-relevant/irrelevant variables interfere) since it is a multi-dimensional process. In the meantime, exploring grader behaviours while scoring learners' speaking…
Descriptors: Item Response Theory, Interrater Reliability, Speech Skills, Second Language Learning
Lichtenstein, Robert – Communique, 2020
Appropriate interpretation of assessment data requires an appreciation that tools are subject to measurement error. School psychologists recognize, at least on an intellectual level, that measures are imperfect--that test scores and other quantitative measures (e.g., rating scales, systematic behavioral observations) are best estimates of…
Descriptors: Error of Measurement, Test Reliability, Pretests Posttests, Standardized Tests
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Siqi Huang – North American Chapter of the International Group for the Psychology of Mathematics Education, 2023
The goal of this paper is twofold. First, the paper clarifies and elaborates on an important theoretical construct called orientation with respect to understanding in mathematics, which denotes the degree to which students exhibit an inclination towards and demonstrate an earnest concern for understanding in mathematical learning. Second, the…
Descriptors: Mathematics Instruction, Teaching Methods, Problem Solving, Reliability
Nuñez-Polo, Mercedes H. – Journal of Mental Health Research in Intellectual Disabilities, 2022
Introduction: The aim of this study is to validate a Spanish version of the Impact of Event Scale on People with ID (IES-ID). Methods: IES-ID was administered to adults with ID (n = 120), analyzing internal consistency, inter-rater and test-retest reliability, criterion validity, construct validity and feasibility. Results: Good internal…
Descriptors: Spanish, Translation, Construct Validity, Factor Analysis

