ERIC - Search Results

Publication Date

In 2026	0
Since 2025	52
Since 2022 (last 5 years)	410
Since 2017 (last 10 years)	913
Since 2007 (last 20 years)	1964

Descriptor

Error of Measurement	3310
Statistical Analysis	600
Scores	510
Item Response Theory	449
Correlation	434
Comparative Analysis	424
Foreign Countries	418
Test Reliability	412
Computation	406
Simulation	370
Reliability	357
Sample Size	354
Models	353
Evaluation Methods	350
Test Items	349
Measurement Techniques	318
Factor Analysis	311
Sampling	300
Statistical Bias	300
Research Methodology	288
Goodness of Fit	260
Psychometrics	259
Monte Carlo Methods	258
Regression (Statistics)	246
Mathematical Models	241
More ▼

Author

Raykov, Tenko	23
Brennan, Robert L.	19
Kolen, Michael J.	19
Lord, Frederic M.	17
Thompson, Bruce	16
Zimmerman, Donald W.	16
Lee, Won-Chan	15
Livingston, Samuel A.	14
McCaffrey, Daniel F.	14
Yuan, Ke-Hai	14
van der Linden, Wim J.	14
Cai, Li	13
Moses, Tim	13
Beretvas, S. Natasha	12
Marsh, Herbert W.	12
Zwick, Rebecca	12
Algina, James	11
Ferron, John M.	11
Lee, Guemin	11
Lockwood, J. R.	11
Marcoulides, George A.	11
Reardon, Sean F.	11
DeMars, Christine E.	10
Henson, Robin K.	10
More ▼

Education Level

Higher Education	271
Secondary Education	201
Postsecondary Education	197
Elementary Education	194
Elementary Secondary Education	126
Middle Schools	98
High Schools	82
Junior High Schools	78
Early Childhood Education	61
Grade 4	48
Intermediate Grades	44
Primary Education	42
Grade 8	40
Grade 3	39
Grade 5	39
Grade 7	33
Kindergarten	24
Adult Education	23
Grade 6	19
Grade 2	17
Preschool Education	16
Grade 1	15
Grade 10	12
Grade 9	12
Two Year Colleges	6
More ▼

Audience

Researchers	93
Practitioners	23
Teachers	22
Policymakers	10
Administrators	5
Students	4
Counselors	2
Parents	2
Community	1

Location

United States	47
Germany	42
Australia	34
Canada	27
Turkey	27
California	22
United Kingdom (England)	20
Netherlands	18
China	17
New York	15
United Kingdom	15
North Carolina	14
Texas	14
Italy	12
South Korea	12
Florida	11
Indonesia	11
New Zealand	11
Pennsylvania	11
Spain	11
Japan	10
Taiwan	10
Iran	9
Norway	9
Portugal	9
More ▼

Laws, Policies, & Programs

No Child Left Behind Act 2001	11
Race to the Top	6
Elementary and Secondary…	4
Aid to Families with…	1
Elementary and Secondary…	1
Every Student Succeeds Act…	1
Family Educational Rights and…	1
Guaranteed Student Loan…	1
Head Start	1
Individuals with Disabilities…	1
Job Training Partnership Act…	1
Strengthening Career and…	1
More ▼

What Works Clearinghouse Rating

Does not meet standards

Showing 61 to 75 of 3,310 results Save | Export

IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Peer reviewed

Direct link

Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…

Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Direct link

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Exploring the Influence of Response Styles on Continuous Scale Assessments: Insights from a Novel Modeling Approach

Peer reviewed

Direct link

Hung-Yu Huang – Educational and Psychological Measurement, 2025

The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…

Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability

The Sensitivity of Value-Added Estimates to Test Scoring Decisions. EdWorkingPaper No. 25-1226

Download full text

Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025

Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…

Descriptors: Value Added Models, Tests, Testing, Scoring

The Impact of Measurement Model Misspecification on Coefficient Omega Estimates of Composite Reliability

Peer reviewed

Direct link

Stephanie M. Bell; R. Philip Chalmers; David B. Flora – Educational and Psychological Measurement, 2024

Coefficient omega indices are model-based composite reliability estimates that have become increasingly popular. A coefficient omega index estimates how reliably an observed composite score measures a target construct as represented by a factor in a factor-analysis model; as such, the accuracy of omega estimates is likely to depend on correct…

Descriptors: Influences, Models, Measurement Techniques, Reliability

An Examination of the Linking Error Currently Used in PISA

Peer reviewed

Direct link

Alexander Robitzsch; Oliver Lüdtke – Measurement: Interdisciplinary Research and Perspectives, 2024

Educational large-scale assessment (LSA) studies like the program for international student assessment (PISA) provide important information about trends in the performance of educational indicators in cognitive domains. The change in the country means in a cognitive domain like reading between two successive assessments is an example of a trend…

Descriptors: Secondary School Students, Foreign Countries, International Assessment, Achievement Tests

The Analysis of Marking Reliability through the Approach of Gauge Repeatability and Reproducibility (GR&R) Study: A Case of English-Speaking Test

Peer reviewed

Direct link

Pornphan Sureeyatanapas; Panitas Sureeyatanapas; Uthumporn Panitanarak; Jittima Kraisriwattana; Patchanan Sarootyanapat; Daniel O'Connell – Language Testing in Asia, 2024

Ensuring consistent and reliable scoring is paramount in education, especially in performance-based assessments. This study delves into the critical issue of marking consistency, focusing on speaking proficiency tests in English language learning, which often face greater reliability challenges. While existing literature has explored various…

Descriptors: Foreign Countries, Students, English Language Learners, Speech

Evaluating Test-Retest Reliability and Measurement Error of the Lumbar Flexion Relaxation Ratio within and between Days

Peer reviewed

Direct link

Samuel J. Howarth; Erinn McCreath Frangakis; Steven Hirsch; Diana De Carvalho – Measurement in Physical Education and Exercise Science, 2024

The flexion relaxation ratio (FRR) of the lumbar extensor muscles is often assessed in experimental and clinical studies. This study evaluated within- and between-session test--retest reliability and measurement error for different FRR formulations. Participants completed two identical data collection sessions 1-week apart. Spine flexion and…

Descriptors: Exercise Physiology, Human Body, Pretests Posttests, Error of Measurement

Minimal-Effect Testing, Equivalence Testing, and the Conventional Null Hypothesis Testing for the Analysis of Bi-Factor Models

Peer reviewed

Direct link

Shunji Wang; Katerina M. Marcoulides; Jiashan Tang; Ke-Hai Yuan – Structural Equation Modeling: A Multidisciplinary Journal, 2024

A necessary step in applying bi-factor models is to evaluate the need for domain factors with a general factor in place. The conventional null hypothesis testing (NHT) was commonly used for such a purpose. However, the conventional NHT meets challenges when the domain loadings are weak or the sample size is insufficient. This article proposes…

Descriptors: Hypothesis Testing, Error of Measurement, Comparative Analysis, Monte Carlo Methods

Investigating Structural Model Fit Evaluation

Peer reviewed

Direct link

Xijuan Zhang; Hao Wu – Structural Equation Modeling: A Multidisciplinary Journal, 2024

A full structural equation model (SEM) typically consists of both a measurement model (describing relationships between latent variables and observed scale items) and a structural model (describing relationships among latent variables). However, often researchers are primarily interested in testing hypotheses related to the structural model while…

Descriptors: Structural Equation Models, Goodness of Fit, Robustness (Statistics), Factor Structure

Rotation Local Solutions in Multidimensional Item Response Theory Models

Peer reviewed

Direct link

Hoang V. Nguyen; Niels G. Waller – Educational and Psychological Measurement, 2024

We conducted an extensive Monte Carlo study of factor-rotation local solutions (LS) in multidimensional, two-parameter logistic (M2PL) item response models. In this study, we simulated more than 19,200 data sets that were drawn from 96 model conditions and performed more than 7.6 million rotations to examine the influence of (a) slope parameter…

Descriptors: Monte Carlo Methods, Item Response Theory, Correlation, Error of Measurement

Twenty Years of Network Meta-Analysis: Continuing Controversies and Recent Developments

Peer reviewed

Direct link

A. E. Ades; Nicky J. Welton; Sofia Dias; David M. Phillippo; Deborah M. Caldwell – Research Synthesis Methods, 2024

Network meta-analysis (NMA) is an extension of pairwise meta-analysis (PMA) which combines evidence from trials on multiple treatments in connected networks. NMA delivers internally consistent estimates of relative treatment efficacy, needed for rational decision making. Over its first 20 years NMA's use has grown exponentially, with applications…

Descriptors: Network Analysis, Meta Analysis, Medicine, Clinical Experience

The Influence of Procedural Characteristics on Within-Case Effect Sizes for Academic Outcomes

Peer reviewed

Direct link

Ethan R. Van Norman; David A. Klingbeil; Adelle K. Sturgell – Grantee Submission, 2024

Single-case experimental designs (SCEDs) have been used with increasing frequency to identify evidence-based interventions in education. The purpose of this study was to explore how several procedural characteristics, including within-phase variability (i.e., measurement error), number of baseline observations, and number of intervention…

Descriptors: Research Design, Case Studies, Effect Size, Error of Measurement

Analytic Approaches to Handle Missing Data in Simple Matrix Sampling Planned Missing Designs

Peer reviewed

Direct link

Ting Dai; Yang Du; Jennifer Cromley; Tia Fechter; Frank Nelson – Journal of Experimental Education, 2024

Simple matrix sampling planned missing (SMS PD) design, introduce missing data patterns that lead to covariances between variables that are not jointly observed, and create difficulties for analyses other than mean and variance estimations. Based on prior research, we adopted a new multigroup confirmatory factor analysis (CFA) approach to handle…

Descriptors: Research Problems, Research Design, Data, Matrices

Do Different Devices Perform Equally Well with Different Numbers of Scale Points and Response Formats? A Test of Measurement Invariance and Reliability

Peer reviewed

Direct link

Natalja Menold; Vera Toepoel – Sociological Methods & Research, 2024

Research on mixed devices in web surveys is in its infancy. Using a randomized experiment, we investigated device effects (desktop PC, tablet and mobile phone) for six response formats and four different numbers of scale points. N = 5,077 members of an online access panel participated in the experiment. An exact test of measurement invariance and…

Descriptors: Online Surveys, Handheld Devices, Telecommunications, Test Reliability

« Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | ... | 221

Educational and Psychological…	259
Journal of Educational…	115
ProQuest LLC	95
Applied Psychological…	85
Journal of Educational and…	85
Psychometrika	82
Structural Equation Modeling:…	76
Grantee Submission	70
Journal of Experimental…	70
ETS Research Report Series	59
Multivariate Behavioral…	54
Applied Measurement in…	50
Sociological Methods &…	47
Journal of Psychoeducational…	37
Psychological Methods	33
Society for Research on…	33
Educational Measurement:…	32
Research Synthesis Methods	32
Online Submission	29
Practical Assessment,…	27
International Journal of…	26
Journal of Educational…	26
National Center for Education…	25
Psychology in the Schools	25
Structural Equation Modeling	23
More ▼

Journal Articles	2358
Reports - Research	1904
Reports - Evaluative	703
Reports - Descriptive	344
Speeches/Meeting Papers	329
Dissertations/Theses -…	95
Numerical/Quantitative Data	86
Opinion Papers	77
Information Analyses	72
Tests/Questionnaires	47
Guides - Non-Classroom	27
Guides - Classroom - Teacher	12
Book/Product Reviews	10
Reports - General	9
ERIC Publications	8
ERIC Digests in Full Text	7
Guides - General	7
Books	6
Guides - Classroom - Learner	4
Collected Works - General	3
Legal/Legislative/Regulatory…	3
Historical Materials	2
Collected Works - Proceedings	1
Collected Works - Serial	1
Collected Works - Serials	1
More ▼

Program for International…	45
National Assessment of…	40
SAT (College Admission Test)	24
Trends in International…	24
ACT Assessment	20
Wechsler Intelligence Scale…	20
Early Childhood Longitudinal…	19
Wechsler Adult Intelligence…	12
Iowa Tests of Basic Skills	10
Schools and Staffing Survey…	10
Test of English as a Foreign…	9
Child Behavior Checklist	7
Graduate Record Examinations	7
National Longitudinal Survey…	7
Progress in International…	7
Beck Depression Inventory	6
Advanced Placement…	5
Armed Services Vocational…	5
Cognitive Abilities Test	5
Longitudinal Surveys of…	5
National Household Education…	5
Rosenberg Self Esteem Scale	5
Dynamic Indicators of Basic…	4
Law School Admission Test	4
Motivated Strategies for…	4
More ▼