Publication Date
| In 2026 | 0 |
| Since 2025 | 52 |
| Since 2022 (last 5 years) | 410 |
| Since 2017 (last 10 years) | 913 |
| Since 2007 (last 20 years) | 1964 |
Descriptor
Source
Author
Publication Type
Education Level
Audience
| Researchers | 93 |
| Practitioners | 23 |
| Teachers | 22 |
| Policymakers | 10 |
| Administrators | 5 |
| Students | 4 |
| Counselors | 2 |
| Parents | 2 |
| Community | 1 |
Location
| United States | 47 |
| Germany | 42 |
| Australia | 34 |
| Canada | 27 |
| Turkey | 27 |
| California | 22 |
| United Kingdom (England) | 20 |
| Netherlands | 18 |
| China | 17 |
| New York | 15 |
| United Kingdom | 15 |
| More ▼ | |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Jonas Flodén – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Hung-Yu Huang – Educational and Psychological Measurement, 2025
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…
Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability
Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Stephanie M. Bell; R. Philip Chalmers; David B. Flora – Educational and Psychological Measurement, 2024
Coefficient omega indices are model-based composite reliability estimates that have become increasingly popular. A coefficient omega index estimates how reliably an observed composite score measures a target construct as represented by a factor in a factor-analysis model; as such, the accuracy of omega estimates is likely to depend on correct…
Descriptors: Influences, Models, Measurement Techniques, Reliability
Alexander Robitzsch; Oliver Lüdtke – Measurement: Interdisciplinary Research and Perspectives, 2024
Educational large-scale assessment (LSA) studies like the program for international student assessment (PISA) provide important information about trends in the performance of educational indicators in cognitive domains. The change in the country means in a cognitive domain like reading between two successive assessments is an example of a trend…
Descriptors: Secondary School Students, Foreign Countries, International Assessment, Achievement Tests
Pornphan Sureeyatanapas; Panitas Sureeyatanapas; Uthumporn Panitanarak; Jittima Kraisriwattana; Patchanan Sarootyanapat; Daniel O'Connell – Language Testing in Asia, 2024
Ensuring consistent and reliable scoring is paramount in education, especially in performance-based assessments. This study delves into the critical issue of marking consistency, focusing on speaking proficiency tests in English language learning, which often face greater reliability challenges. While existing literature has explored various…
Descriptors: Foreign Countries, Students, English Language Learners, Speech
Samuel J. Howarth; Erinn McCreath Frangakis; Steven Hirsch; Diana De Carvalho – Measurement in Physical Education and Exercise Science, 2024
The flexion relaxation ratio (FRR) of the lumbar extensor muscles is often assessed in experimental and clinical studies. This study evaluated within- and between-session test--retest reliability and measurement error for different FRR formulations. Participants completed two identical data collection sessions 1-week apart. Spine flexion and…
Descriptors: Exercise Physiology, Human Body, Pretests Posttests, Error of Measurement
Shunji Wang; Katerina M. Marcoulides; Jiashan Tang; Ke-Hai Yuan – Structural Equation Modeling: A Multidisciplinary Journal, 2024
A necessary step in applying bi-factor models is to evaluate the need for domain factors with a general factor in place. The conventional null hypothesis testing (NHT) was commonly used for such a purpose. However, the conventional NHT meets challenges when the domain loadings are weak or the sample size is insufficient. This article proposes…
Descriptors: Hypothesis Testing, Error of Measurement, Comparative Analysis, Monte Carlo Methods
Xijuan Zhang; Hao Wu – Structural Equation Modeling: A Multidisciplinary Journal, 2024
A full structural equation model (SEM) typically consists of both a measurement model (describing relationships between latent variables and observed scale items) and a structural model (describing relationships among latent variables). However, often researchers are primarily interested in testing hypotheses related to the structural model while…
Descriptors: Structural Equation Models, Goodness of Fit, Robustness (Statistics), Factor Structure
Hoang V. Nguyen; Niels G. Waller – Educational and Psychological Measurement, 2024
We conducted an extensive Monte Carlo study of factor-rotation local solutions (LS) in multidimensional, two-parameter logistic (M2PL) item response models. In this study, we simulated more than 19,200 data sets that were drawn from 96 model conditions and performed more than 7.6 million rotations to examine the influence of (a) slope parameter…
Descriptors: Monte Carlo Methods, Item Response Theory, Correlation, Error of Measurement
A. E. Ades; Nicky J. Welton; Sofia Dias; David M. Phillippo; Deborah M. Caldwell – Research Synthesis Methods, 2024
Network meta-analysis (NMA) is an extension of pairwise meta-analysis (PMA) which combines evidence from trials on multiple treatments in connected networks. NMA delivers internally consistent estimates of relative treatment efficacy, needed for rational decision making. Over its first 20 years NMA's use has grown exponentially, with applications…
Descriptors: Network Analysis, Meta Analysis, Medicine, Clinical Experience
Ethan R. Van Norman; David A. Klingbeil; Adelle K. Sturgell – Grantee Submission, 2024
Single-case experimental designs (SCEDs) have been used with increasing frequency to identify evidence-based interventions in education. The purpose of this study was to explore how several procedural characteristics, including within-phase variability (i.e., measurement error), number of baseline observations, and number of intervention…
Descriptors: Research Design, Case Studies, Effect Size, Error of Measurement
Ting Dai; Yang Du; Jennifer Cromley; Tia Fechter; Frank Nelson – Journal of Experimental Education, 2024
Simple matrix sampling planned missing (SMS PD) design, introduce missing data patterns that lead to covariances between variables that are not jointly observed, and create difficulties for analyses other than mean and variance estimations. Based on prior research, we adopted a new multigroup confirmatory factor analysis (CFA) approach to handle…
Descriptors: Research Problems, Research Design, Data, Matrices
Natalja Menold; Vera Toepoel – Sociological Methods & Research, 2024
Research on mixed devices in web surveys is in its infancy. Using a randomized experiment, we investigated device effects (desktop PC, tablet and mobile phone) for six response formats and four different numbers of scale points. N = 5,077 members of an online access panel participated in the experiment. An exact test of measurement invariance and…
Descriptors: Online Surveys, Handheld Devices, Telecommunications, Test Reliability

Peer reviewed
Direct link
