ERIC - Search Results

Showing 1 to 15 of 80 results

Optimal Calibration of Items for Multidimensional Achievement Tests

Peer reviewed

Mahmood Ul Hassan; Frank Miller – Journal of Educational Measurement, 2024

Multidimensional achievement tests are recently gaining more importance in educational and psychological measurements. For example, multidimensional diagnostic tests can help students to determine which particular domain of knowledge they need to improve for better performance. To estimate the characteristics of candidate items (calibration) for…

Descriptors: Multidimensional Scaling, Achievement Tests, Test Items, Test Construction

A Generalized Objective Function for Computer Adaptive Item Selection

Peer reviewed

Harold Doran; Testsuhiro Yamada; Ted Diaz; Emre Gonulates; Vanessa Culver – Journal of Educational Measurement, 2025

Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms

Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches

Peer reviewed

Güler Yavuz Temel – Journal of Educational Measurement, 2024

The purpose of this study was to investigate multidimensional DIF with a simple and nonsimple structure in the context of multidimensional Graded Response Model (MGRM). This study examined and compared the performance of the IRT-LR and Wald test using MML-EM and MHRM estimation approaches with different test factors and test structures in…

Descriptors: Computation, Multidimensional Scaling, Item Response Theory, Models

Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model

Peer reviewed

Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023

Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the popular usages of multiple measures, there is little research on classification consistency and accuracy of multiple measures. Accordingly, this study introduces an…

Descriptors: Testing, Computation, Classification, Accuracy

Several Variations of Simple-Structure MIRT Equating

Peer reviewed

Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2023

The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct sets of data were used to demonstrate feasibility of proposed equating methods for two different equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated some…

Descriptors: Item Response Theory, Equated Scores, Monte Carlo Methods, Research Methodology

An Exponentially Weighted Moving Average Procedure for Detecting Back Random Responding Behavior

Peer reviewed

He, Yinhong – Journal of Educational Measurement, 2023

Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…

Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods

Pretest Item Calibration in Computerized Multistage Adaptive Testing

Peer reviewed

Ersen, Rabia Karatoprak; Lee, Won-Chan – Journal of Educational Measurement, 2023

The purpose of this study was to compare calibration and linking methods for placing pretest item parameter estimates on the item pool scale in a 1-3 computerized multistage adaptive testing design in terms of item parameter recovery. Two models were used: embedded-section, in which pretest items were administered within a separate module, and…

Descriptors: Pretesting, Test Items, Computer Assisted Testing, Adaptive Testing

Online Calibration in Multidimensional Computerized Adaptive Testing with Polytomously Scored Items

Peer reviewed

Yuan, Lu; Huang, Yingshi; Li, Shuhang; Chen, Ping – Journal of Educational Measurement, 2023

Online calibration is a key technology for item calibration in computerized adaptive testing (CAT) and has been widely used in various forms of CAT, including unidimensional CAT, multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, as multidimensional and polytomous assessment data become more…

Descriptors: Computer Assisted Testing, Adaptive Testing, Computation, Test Items

Computation and Accuracy Evaluation of Comparable Scores on Culturally Responsive Assessments

Peer reviewed

Sandip Sinharay; Matthew S. Johnson – Journal of Educational Measurement, 2024

Culturally responsive assessments have been proposed as potential tools to ensure equity and fairness for examinees from all backgrounds including those from traditionally underserved or minoritized groups. However, these assessments are relatively new and, with few exceptions, are yet to be implemented in large scale. Consequently, there is a…

Descriptors: Culturally Relevant Education, Evaluation, Equal Education, Disadvantaged

A Novel Partial Credit Extension Using Varying Thresholds to Account for Response Tendencies

Peer reviewed

Henninger, Mirka – Journal of Educational Measurement, 2021

Item Response Theory models with varying thresholds are essential tools to account for unknown types of response tendencies in rating data. However, in order to separate constructs to be measured and response tendencies, specific constraints have to be imposed on varying thresholds and their interrelations. In this article, a multidimensional…

Descriptors: Response Style (Tests), Item Response Theory, Models, Computation

Robust Estimation for Response Time Modeling

Peer reviewed

Hong, Maxwell; Rebouças, Daniella A.; Cheng, Ying – Journal of Educational Measurement, 2021

Response time has started to play an increasingly important role in educational and psychological testing, which prompts many response time models to be proposed in recent years. However, response time modeling can be adversely impacted by aberrant response behavior. For example, test speededness can cause response time to certain items to deviate…

Descriptors: Reaction Time, Models, Computation, Robustness (Statistics)

An Exploration of an Improved Aggregate Student Growth Measure Using Data from Two States

Peer reviewed

Castellano, Katherine E.; McCaffrey, Daniel F.; Lockwood, J. R. – Journal of Educational Measurement, 2023

The simple average of student growth scores is often used in accountability systems, but it can be problematic for decision making. When computed using a small/moderate number of students, it can be sensitive to the sample, resulting in inaccurate representations of growth of the students, low year-to-year stability, and inequities for…

Descriptors: Academic Achievement, Accountability, Decision Making, Computation

Likelihood-Based Estimation of Model-Derived Oral Reading Fluency

Peer reviewed

Cornelis Potgieter; Xin Qiao; Akihito Kamata; Yusuf Kara – Journal of Educational Measurement, 2024

As part of the effort to develop an improved oral reading fluency (ORF) assessment system, Kara et al. estimated the ORF scores based on a latent variable psychometric model of accuracy and speed for ORF data via a fully Bayesian approach. This study further investigates likelihood-based estimators for the model-derived ORF scores, including…

Descriptors: Oral Reading, Reading Fluency, Scores, Psychometrics

Bayesian Extension of Biweight and Huber Weight for Robust Ability Estimation

Peer reviewed

Maeda, Hotaka; Zhang, Bo – Journal of Educational Measurement, 2020

When a response pattern does not fit a selected measurement model, one may resort to robust ability estimation. Two popular robust methods are biweight and Huber weight. So far, research on these methods has been quite limited. This article proposes the maximum a posteriori biweight (BMAP) and Huber weight (HMAP) estimation methods. These methods…

Descriptors: Bayesian Statistics, Robustness (Statistics), Computation, Monte Carlo Methods

A Computationally Simple Method for Estimating Decision Consistency

Peer reviewed

Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021

Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…

Descriptors: Decision Making, Reliability, Classification, Scores
