Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2015
Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…
Descriptors: Interrater Reliability, Monte Carlo Methods, Measurement Techniques, Accuracy
Kahraman, Nilufer; Brown, Crystal B. – Applied Measurement in Education, 2015
Psychometric models based on structural equation modeling framework are commonly used in many multiple-choice test settings to assess measurement invariance of test items across examinee subpopulations. The premise of the current article is that they may also be useful in the context of performance assessment tests to test measurement invariance…
Descriptors: Factor Analysis, Structural Equation Models, Medical Students, Performance Based Assessment
Kouo, Jennifer Lee – Focus on Autism and Other Developmental Disabilities, 2019
Deficits in social communication and interaction have been identified as distinguishing impairments for individuals with an autism spectrum disorder (ASD). As a pivotal skill, the successful development of social communication and interaction in individuals with ASD is a lifelong objective. Point-of-view video modeling (VM) has the potential to…
Descriptors: Interpersonal Competence, Autism, Pervasive Developmental Disorders, Video Technology
National Council on Teacher Quality, 2023
Up until 2020, National Assessment of Educational Progress (NAEP) reading scores had increased only slightly since the early 1990s with large achievement gaps for students of color and students living in poverty. Modest gains in fourth grade reading proficiency since 1992 were erased during the pandemic. The insufficient progress in reading even…
Descriptors: National Competency Tests, Reading Achievement, Reading Instruction, Scores
Lavesson, Ann; Lövdén, Martin; Hansson, Kristina – International Journal of Language & Communication Disorders, 2018
Background: The Swedish Program for health surveillance of preschool children includes screening of language and communication abilities. One important language screening is carried out at age 4 years as part of a general screening conducted by health nurses at child health centres. The instruments presently in use for this screening mainly focus…
Descriptors: Preschool Children, Language Impairments, Semantics, Allied Health Personnel
Eldar, Eitan; Ayvazo, Shiri; Hirschmann, Michal – Journal of International Special Needs Education, 2018
Classroom management still remains a topic of major apprehension for teachers, and especially for those teaching students who display challenging behaviors. This paper presents an empirical examination that supplemented an exceptional project of the ministry of education in a small Middle-East country to support students with severe problem…
Descriptors: Classroom Techniques, Student Behavior, Behavior Disorders, Self Contained Classrooms
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In the Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability were determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
Splett, Joni W.; Smith-Millman, Marissa; Raborn, Anthony; Brann, Kristy L.; Flaspohler, Paul D.; Maras, Melissa A. – School Psychology Quarterly, 2018
The current study examined between-teacher variance in teacher ratings of student behavioral and emotional risk to identify student, teacher and classroom characteristics that predict such differences and can be considered in future research and practice. Data were taken from seven elementary schools in one school district implementing universal…
Descriptors: Student Behavior, Risk, Behavior Problems, Emotional Problems
Cato, Heather; Walker, Katie – Journal of Language and Literacy Education, 2022
Standardized testing and accountability are currently unavoidable components of Texas Public Education. Through years of push-back, parents and educators have demanded that Texas consider alternative testing options that would reduce the high-stakes testing burden on students and schools. In 2015, the State of Texas passed legislation requiring…
Descriptors: Writing Evaluation, Writing Instruction, Pedagogical Content Knowledge, State Legislation
Wu, Siew Mei; Tan, Susan – Higher Education Research and Development, 2016
Rating essays is a complex task where students' grades could be adversely affected by test-irrelevant factors such as rater characteristics and rating scales. Understanding these factors and controlling their effects are crucial for test validity. Rater behaviour has been extensively studied through qualitative methods such as questionnaires and…
Descriptors: Scoring, Item Response Theory, Student Placement, College Students
Conati, Cristina; Gutica, Mirela – International Journal of Artificial Intelligence in Education, 2016
We present the results of a study that explored the emotions experienced by students during interaction with an educational game for math (Heroes of Math Island). Starting from emotion frameworks in affective computing and education, we considered a larger set of emotions than in related research. For emotion labeling, we started from a standard…
Descriptors: Educational Games, Emotional Response, Evaluators, Interrater Reliability
Loukina, Anastassia; Buzick, Heather – ETS Research Report Series, 2017
This study is an evaluation of the performance of automated speech scoring for speakers with documented or suspected speech impairments. Given that the use of automated scoring of open-ended spoken responses is relatively nascent and there is little research to date that includes test takers with disabilities, this small exploratory study focuses…
Descriptors: Automation, Scoring, Language Tests, Speech Tests
Welch, Adam C.; Karpen, Samuel C.; Cross, L. Brian; LeBlanc, Brandie N. – Research & Practice in Assessment, 2017
The aims of this study were to determine faculty's ability to accurately and reliably categorize exam questions using Bloom's Taxonomy, and if modified versions would improve the accuracy and reliability. Faculty experience and affiliation with a health sciences discipline were also considered. Faculty at one university were asked to categorize 30…
Descriptors: College Faculty, Medical School Faculty, Health Sciences, Test Items
Mayton, Michael R.; Zhang, Jie; Carter, Stacy L.; Suppo, Jennifer L. – Journal of Research in Special Educational Needs, 2017
How well doctoral students in special education are prepared to evaluate research as evidence-based practice (EBP) is likely to impact their careers, as well as the teachers they will train. In developing a method for evaluating the readiness of small cohort groups of doctoral students to apply a research-based model of EBP, an instrument and…
Descriptors: Special Education, Doctoral Programs, Graduate Students, Readiness
Morgan, Paul L.; Farkas, George; Cook, Michael; Strassfeld, Natasha M.; Hillemeier, Marianne M.; Pun, Wik Hung; Schussler, Deborah L. – Exceptional Children, 2017
We synthesized empirical work to evaluate whether Black children are disproportionately overrepresented in special education. We identified 22 studies that met a priori inclusion criteria including use of at least 1 covariate in the reported analyses. Evidence of overrepresentation declined markedly as the studies included one or more of 3…
Descriptors: African American Students, Children, Disproportionate Representation, Special Education

