Showing 1 to 15 of 54 results
Peer reviewed
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2022
A testlet comprises a set of items based on a common stimulus. When testlets are used in a test, the local independence assumption may be violated, and in that case it is not appropriate to apply traditional item response theory models to tests that include testlets. When the testlet is discussed, one of the most…
Descriptors: Test Items, Test Theory, Models, Sample Size
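As background to the testlet entry above, here is a minimal sketch of how a shared testlet effect induces local dependence, assuming a Rasch testlet model in which a response depends on ability, item difficulty, and a person-specific effect common to all items in the testlet; the function name and numeric values are illustrative only.

```python
import numpy as np

def p_correct_testlet(theta, b, gamma):
    """Rasch testlet model: P(X = 1 | theta, b, gamma) = logistic(theta - b + gamma).

    theta : person ability
    b     : item difficulty
    gamma : person-specific effect shared by all items in the same testlet;
            gamma = 0 reduces to the ordinary Rasch model (local independence).
    """
    return 1.0 / (1.0 + np.exp(-(theta - b + gamma)))

# Two items in the same testlet share one gamma, which is what creates
# residual dependence after conditioning on theta alone (illustrative values).
theta, gamma = 0.5, 0.8
for b in (-0.2, 0.4):
    print(p_correct_testlet(theta, b, gamma))
```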
Peer reviewed
Fellinghauer, Carolina; Debelak, Rudolf; Strobl, Carolin – Educational and Psychological Measurement, 2023
This simulation study investigated to what extent departures from construct similarity as well as differences in the difficulty and targeting of scales impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation…
Descriptors: True Scores, Equated Scores, Test Items, Sample Size
Peer reviewed
Semih Asiret; Seçil Ömür Sünbül – International Journal of Psychology and Educational Studies, 2023
This study aimed to examine the effect of missing data with different patterns and sizes on test equating methods under the NEAT design. For this purpose, factors such as sample size, the average difficulty difference between the test forms, the difference between the ability distribution,…
Descriptors: Research Problems, Data, Test Items, Equated Scores
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study uses an analysis of diminishing returns to examine the relationship between sample size and the precision of item parameter estimation under Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Battelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation
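For reference on the model named in the entry above, here is a minimal sketch of Masters' Partial Credit Model category probabilities; the step difficulties shown are hypothetical values, not taken from the Battelle data.

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Partial Credit Model category probabilities for one item.

    theta  : person ability
    deltas : step difficulties delta_1..delta_m for an item scored 0..m
    Returns P(X = 0), ..., P(X = m).
    """
    deltas = np.asarray(deltas, dtype=float)
    # Cumulative sums of (theta - delta_j); the score-0 term is defined as 0.
    cum = np.concatenate(([0.0], np.cumsum(theta - deltas)))
    num = np.exp(cum)
    return num / num.sum()

# Hypothetical step difficulties for an item scored 0-2.
print(pcm_probs(theta=0.0, deltas=[-0.5, 0.7]))
```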
Peer reviewed
Yu, Albert; Douglas, Jeffrey A. – Journal of Educational and Behavioral Statistics, 2023
We propose a new item response theory growth model with item-specific learning parameters, or ISLP, and two variations of this model. In the ISLP model, either items or blocks of items have their own learning parameters. This model may be used to improve the efficiency of learning in a formative assessment. We show ways that the ISLP model's…
Descriptors: Item Response Theory, Learning, Markov Processes, Monte Carlo Methods
Peer reviewed
Liu, Chunyan; Jurich, Daniel; Morrison, Carol; Grabovsky, Irina – Applied Measurement in Education, 2021
The existence of outliers among the anchor items can be detrimental to the estimation of examinee ability and undermine the validity of score interpretation across forms. In practice, however, anchor item performance can become distorted for various reasons. This study compares the performance of modified "INFIT" and "OUTFIT"…
Descriptors: Equated Scores, Test Items, Item Response Theory, Difficulty Level
Derek Sauder – ProQuest LLC, 2020
The Rasch model is commonly used to calibrate multiple choice items. However, the sample sizes needed to estimate the Rasch model can be difficult to attain (e.g., consider a small testing company trying to pretest new items). With small sample sizes, auxiliary information besides the item responses may improve estimation of the item parameters.…
Descriptors: Item Response Theory, Sample Size, Computation, Test Length
Peer reviewed
Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023
Purpose: Based on a general formula that depends on the length and difficulty of the test, the number of respondents, and the number of ability levels, this study aims to provide a closed formula for adaptive tests of medium difficulty (probability of solution p = 1/2) to determine the accuracy of the parameters for each item and in…
Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level
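The closed formula derived in the study above is not reproduced here; as a hedged illustration of why p = 1/2 is the precision-optimal case, the sketch below uses the standard Rasch-model result that each response contributes Fisher information p(1 - p), so the standard error of an item difficulty estimated from N responses at p = 1/2 is roughly 2 / sqrt(N). The function name is illustrative.

```python
import math

def rasch_se_at_p_half(n_respondents):
    """Approximate SE of a Rasch item difficulty when every response has p = 1/2.

    Each response contributes Fisher information p * (1 - p) = 0.25,
    so SE ~= 1 / sqrt(0.25 * N) = 2 / sqrt(N).
    """
    return 2.0 / math.sqrt(n_respondents)

for n in (100, 400, 1600):
    print(n, round(rasch_se_at_p_half(n), 3))
```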
Peer reviewed
Saatcioglu, Fatima Munevver; Atar, Hakan Yavuz – International Journal of Assessment Tools in Education, 2022
This study aims to examine the effects of mixture item response theory (IRT) models on item parameter estimation and classification accuracy under different conditions. The manipulated variables of the simulation study are set as mixture IRT models (Rasch, 2PL, 3PL); sample size (600, 1000); the number of items (10, 30); the number of latent…
Descriptors: Accuracy, Classification, Item Response Theory, Programming Languages
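As a reminder of the three response functions compared in the entry above (within-class form only; the mixture structure itself is not sketched), here is a minimal illustration with arbitrarily chosen parameter values.

```python
import numpy as np

def irf(theta, a=1.0, b=0.0, c=0.0):
    """Item response function covering the three models compared in the study.

    Rasch: a = 1, c = 0;  2PL: c = 0;  3PL: all three parameters free.
    P(X = 1) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = 0.3                       # illustrative ability value
print(irf(theta))                 # Rasch
print(irf(theta, a=1.4))          # 2PL
print(irf(theta, a=1.4, c=0.2))   # 3PL with a guessing parameter
```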
Peer reviewed
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
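Here is a minimal sketch of the observed-score Mantel-Haenszel procedure named in the entry above: it computes the common odds ratio across matching-score levels and reports it on the ETS delta scale (MH D-DIF = -2.35 ln alpha). The counts are hypothetical.

```python
import math

def mantel_haenszel_ddif(tables):
    """Observed-score MH DIF statistic on the ETS delta scale.

    tables : list of (A, B, C, D) counts per matching-score level, where
             A = reference correct, B = reference incorrect,
             C = focal correct,     D = focal incorrect.
    Returns (alpha_MH, MH D-DIF), with MH D-DIF = -2.35 * ln(alpha_MH).
    """
    num = sum(A * D / (A + B + C + D) for A, B, C, D in tables)
    den = sum(B * C / (A + B + C + D) for A, B, C, D in tables)
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

# Hypothetical 2x2 tables at three matching-score levels.
print(mantel_haenszel_ddif([(40, 10, 30, 20), (35, 15, 28, 22), (20, 30, 12, 38)]))
```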
Peer reviewed
Lenhard, Wolfgang; Lenhard, Alexandra – Educational and Psychological Measurement, 2021
The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. Subsequently, we modeled the norm scores based on random…
Descriptors: Test Norms, Scores, Regression (Statistics), Test Items
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy of a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
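Here is a minimal sketch of the small-sample equating method named above (chained linear equating under a NEAT design): form X scores are linearly linked to the anchor scale in one group, then the anchor is linearly linked to form Y in the other group. All summary statistics shown are hypothetical.

```python
def linear_link(mu_from, sd_from, mu_to, sd_to):
    """Linear equating function mapping one score scale onto another."""
    return lambda x: mu_to + (sd_to / sd_from) * (x - mu_from)

def chained_linear(x, pop1, pop2):
    """Chained linear equating under a NEAT design.

    pop1 : (mu_X, sd_X, mu_A, sd_A) from the group that took form X and the anchor
    pop2 : (mu_A, sd_A, mu_Y, sd_Y) from the group that took the anchor and form Y
    Form X score -> anchor scale (population 1) -> form Y scale (population 2).
    """
    x_to_anchor = linear_link(pop1[0], pop1[1], pop1[2], pop1[3])
    anchor_to_y = linear_link(pop2[0], pop2[1], pop2[2], pop2[3])
    return anchor_to_y(x_to_anchor(x))

# Hypothetical summary statistics for the two non-equivalent groups.
print(chained_linear(25, pop1=(24.0, 6.0, 12.0, 3.5), pop2=(11.5, 3.2, 26.0, 5.8)))
```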
Peer reviewed
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
Peer reviewed
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Peer reviewed
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnostic purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation