| Publication Date | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 19 |
| Since 2017 (last 10 years) | 43 |
| Since 2007 (last 20 years) | 73 |
| Author | Results |
| --- | --- |
| Mislevy, Robert J. | 4 |
| Chang, Hua-Hua | 3 |
| Huang, Hung-Yu | 3 |
| Tao, Jian | 3 |
| Wang, Chun | 3 |
| Weiss, David J. | 3 |
| Zwick, Rebecca | 3 |
| Chun Wang | 2 |
| Dodd, Barbara G. | 2 |
| Fox, Jean-Paul | 2 |
| Glas, Cees A. W. | 2 |
| Publication Type | Results |
| --- | --- |
| Reports - Research | 102 |
| Journal Articles | 81 |
| Speeches/Meeting Papers | 4 |
| Numerical/Quantitative Data | 2 |
| Information Analyses | 1 |
| Audience | Results |
| --- | --- |
| Practitioners | 1 |
| Location | Results |
| --- | --- |
| Taiwan | 3 |
| Canada | 2 |
| Germany | 2 |
| Nigeria | 2 |
| Saudi Arabia | 2 |
| Africa | 1 |
| Botswana | 1 |
| Chile | 1 |
| Georgia Republic | 1 |
| Germany (Berlin) | 1 |
| Ghana | 1 |
Jianbin Fu; TsungHan Ho; Xuan Tan – Practical Assessment, Research & Evaluation, 2025
Item parameter estimation using an item response theory (IRT) model with fixed ability estimates is useful for equating with small samples on anchor items. The current study explores the impact of three ability estimation methods (weighted likelihood estimation [WLE], maximum a posteriori [MAP], and posterior ability distribution estimation [PST])…
Descriptors: Item Response Theory, Test Items, Computation, Equated Scores
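The key idea in this abstract, that fixing examinee ability estimates reduces item calibration to a per-item optimization problem, can be sketched as follows. This is a minimal illustration under a 2PL model, not the study's actual procedure; the function name and simulated data are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

def calibrate_2pl_item(theta, responses):
    """Estimate discrimination a and difficulty b for one 2PL item,
    treating the ability estimates `theta` as fixed and known."""
    def neg_log_lik(params):
        a, b = params
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        p = np.clip(p, 1e-9, 1 - 1e-9)  # guard against log(0)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    result = minimize(neg_log_lik, x0=[1.0, 0.0], method="Nelder-Mead")
    return result.x  # (a_hat, b_hat)

# Simulated check: 1,000 examinees, one item with true a=1.2, b=0.5
rng = np.random.default_rng(42)
theta = rng.normal(0, 1, 1000)
p_true = 1 / (1 + np.exp(-1.2 * (theta - 0.5)))
responses = (rng.random(1000) < p_true).astype(int)
a_hat, b_hat = calibrate_2pl_item(theta, responses)
```

With abilities held fixed, each item's likelihood is independent of the others, which is what makes this approach workable with small samples on anchor items.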
Zhang, Susu; Li, Anqi; Wang, Shiyu – Educational Measurement: Issues and Practice, 2023
In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and…
Descriptors: Computer Assisted Testing, Test Construction, Test Wiseness, Test Items
Justin L. Kern – Journal of Educational and Behavioral Statistics, 2024
Given the frequent presence of slipping and guessing in item responses, models for the inclusion of their effects are highly important. Unfortunately, the most common model for their inclusion, the four-parameter item response theory model, potentially has severe deficiencies related to its possible unidentifiability. With this issue in mind, the…
Descriptors: Item Response Theory, Models, Bayesian Statistics, Generalization
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
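Effort-moderated (EM) scoring, the procedure this study stress-tests, treats responses faster than an item-level rapid-guessing threshold as not administered and rescores on the remaining responses. A minimal sketch, with invented response-time thresholds (the study itself does not prescribe these values):

```python
def effort_moderated_score(responses, rts, thresholds):
    """Effort-moderated (EM) scoring sketch: a response whose time
    falls below its item's rapid-guessing threshold is dropped, and
    the score is the proportion correct among effortful responses."""
    kept = [r for r, rt, th in zip(responses, rts, thresholds) if rt >= th]
    if not kept:
        return None  # no effortful responses left to score
    return sum(kept) / len(kept)

score = effort_moderated_score(
    responses=[1, 0, 1, 1, 0],
    rts=[12.3, 1.1, 8.4, 0.9, 15.0],    # seconds
    thresholds=[3.0, 3.0, 3.0, 3.0, 3.0],
)
# items 2 and 4 are flagged as rapid guesses, leaving 2 correct of 3
```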
Bayesian Logistic Regression: A New Method to Calibrate Pretest Items in Multistage Adaptive Testing
TsungHan Ho – Applied Measurement in Education, 2023
An operational multistage adaptive test (MST) requires the development of a large item bank and the effort to continuously replenish the item bank due to concerns about test security and validity over the long term. New items should be pretested and linked to the item bank before being used operationally. The linking item volume fluctuations in…
Descriptors: Bayesian Statistics, Regression (Statistics), Test Items, Pretesting
Man, Kaiwen; Harring, Jeffrey R. – Educational and Psychological Measurement, 2023
Preknowledge cheating jeopardizes the validity of inferences based on test results. Many methods have been developed to detect preknowledge cheating by jointly analyzing item responses and response times. Gaze fixations, an essential eye-tracker measure, can be utilized to help detect aberrant testing behavior with improved accuracy beyond using…
Descriptors: Cheating, Reaction Time, Test Items, Responses
Kreitchmann, Rodrigo S.; Sorrel, Miguel A.; Abad, Francisco J. – Educational and Psychological Measurement, 2023
Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, item response theory (IRT) models enable the estimation of…
Descriptors: Measurement Techniques, Questionnaires, Social Desirability, Adaptive Testing
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
Owen Henkel; Hannah Horne-Robinson; Maria Dyshel; Greg Thompson; Ralph Abboud; Nabil Al Nahin Ch; Baptiste Moreau-Pernet; Kirk Vanacore – Journal of Learning Analytics, 2025
This paper introduces AMMORE, a new dataset of 53,000 math open-response question-answer pairs from Rori, a mathematics learning platform used by middle and high school students in several African countries. Using this dataset, we conducted two experiments to evaluate the use of large language models (LLM) for grading particularly challenging…
Descriptors: Learning Analytics, Learning Management Systems, Mathematics Instruction, Middle School Students
Mead, Alan D.; Zhou, Chenxuan – Journal of Applied Testing Technology, 2022
This study fit a Naïve Bayesian classifier to the words of exam items to predict the Bloom's taxonomy level of the items. We addressed five research questions, showing that reasonably good prediction of Bloom's level was possible, but accuracy varies across levels. In our study, performance for Level 2 was poor (Level 2 items were misclassified…
Descriptors: Artificial Intelligence, Prediction, Taxonomy, Natural Language Processing
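The study's core technique, a multinomial Naïve Bayes classifier over the words of item stems, can be sketched in a few lines. The training examples, labels, and tokenization below are invented toy data, not the study's corpus or Bloom coding:

```python
import math
from collections import Counter, defaultdict

# Toy training set: (item stem, invented Bloom-style label)
train = [
    ("define the term validity", "remember"),
    ("list the steps of equating", "remember"),
    ("compare classical and IRT scoring", "analyze"),
    ("differentiate fixed and random effects", "analyze"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    """Return the class maximizing log P(c) + sum_w log P(w|c),
    with add-one (Laplace) smoothing over the vocabulary."""
    best, best_lp = None, -math.inf
    for c, n_c in class_counts.items():
        lp = math.log(n_c / len(train))
        total = sum(word_counts[c].values())
        for w in text.split():
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Calling `predict("compare scoring methods")` on this toy set favors the "analyze" class, since two of its words appear in that class's training stems.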
Sedat Sen; Allan S. Cohen – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
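Several of the indices compared in this study are standard functions of the maximized log-likelihood, the parameter count, and the sample size. A sketch using the textbook formulas (the numeric inputs are illustrative, not results from the study):

```python
import math

def info_criteria(log_lik, k, n):
    """Standard information criteria from a maximized log-likelihood
    `log_lik`, number of free parameters `k`, and sample size `n`."""
    aic = -2 * log_lik + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = -2 * log_lik + k * math.log(n)
    caic = -2 * log_lik + k * (math.log(n) + 1)   # consistent AIC
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic}

crit = info_criteria(log_lik=-512.3, k=12, n=400)
# lower is better; the candidate latent-class solution minimizing the
# chosen index is retained
```

Note the differing penalties: BIC and CAIC penalize parameters more heavily as `n` grows, which is why such comparisons are sensitive to sample size.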
Lozano, José H.; Revuelta, Javier – Applied Measurement in Education, 2021
The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework…
Descriptors: Bayesian Statistics, Computation, Learning, Testing
Huang, Hung-Yu – Educational and Psychological Measurement, 2023
The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs)…
Descriptors: Test Items, Classification, Bayesian Statistics, Decision Making
Lozano, José H.; Revuelta, Javier – Educational and Psychological Measurement, 2023
The present paper introduces a general multidimensional model to measure individual differences in learning within a single administration of a test. Learning is assumed to result from practicing the operations involved in solving the items. The model accounts for the possibility that the ability to learn may manifest differently for correct and…
Descriptors: Bayesian Statistics, Learning Processes, Test Items, Item Analysis
Feinberg, Richard A. – Educational Measurement: Issues and Practice, 2021
Unforeseen complications during the administration of large-scale testing programs are inevitable and can prevent examinees from accessing all test material. For classification tests in which the primary purpose is to yield a decision, such as a pass/fail result, the current study investigated a model-based standard error approach, Bayesian…
Descriptors: High Stakes Tests, Classification, Decision Making, Bayesian Statistics