ERIC - Search Results

Showing 106 to 120 of 226 results

Relating Unidimensional IRT Parameters to a Multidimensional Response Space: A Review of Two Alternative Projection IRT Models for Scoring Subscales

Peer reviewed

Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011

A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…

Descriptors: Test Length, Test Items, Alignment (Education), Models

Bi-Factor Multidimensional Item Response Theory Modeling for Subscores Estimation, Reliability, and Classification

Md Desa, Zairul Nor Deana – ProQuest LLC, 2012

In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores and to obtain subscore reliability and subscore classification. Both the compensatory and partially compensatory MIRT…

Descriptors: Item Response Theory, Computation, Reliability, Classification

Treatment of Not-Administered Items on Individually Administered Intelligence Tests

Peer reviewed

He, Wei; Wolfe, Edward W. – Educational and Psychological Measurement, 2012

In the administration of individually administered intelligence tests, items are commonly presented in a sequence of increasing difficulty, and test administration is terminated after a predetermined number of incorrect answers. This practice produces stochastically censored data, a form of nonignorable missing data. By manipulating four factors…

Descriptors: Individual Testing, Intelligence Tests, Test Items, Test Length

A Comparison of Three Content Balancing Methods for Fixed and Variable Length Computerized Adaptive Tests

Shin, Chingwei David; Chien, Yuehmei; Way, Walter Denny – Pearson, 2012

Content balancing is one of the most important components of computerized adaptive testing (CAT), especially in K-12 large-scale tests, where a complex constraint structure is required to cover a broad spectrum of content. The purpose of this study is to compare the weighted penalty model (WPM) and the weighted deviation method (WDM) under…

Descriptors: Computer Assisted Testing, Elementary Secondary Education, Test Content, Models

Polytomous Adaptive Classification Testing: Effects of Item Pool Size, Test Termination Criterion, and Number of Cutscores

Peer reviewed

Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011

Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…

Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items

Formulating the Rasch Differential Item Functioning Model under the Marginal Maximum Likelihood Estimation Context and Its Comparison with Mantel-Haenszel Procedure in Short Test and Small Sample Conditions

Peer reviewed

Paek, Insu; Wilson, Mark – Educational and Psychological Measurement, 2011

This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…

Descriptors: Test Bias, Test Length, Statistical Inference, Geometric Concepts

Conditions Affecting the Accuracy of Classical Equating Methods for Small Samples under the NEAT Design: A Simulation Study

Sunnassee, Devdass – ProQuest LLC, 2011

Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…

Descriptors: Test Length, Test Format, Sample Size, Simulation

Computerized Adaptive Testing with the Zinnes and Griggs Pairwise Preference Ideal Point Model

Peer reviewed

Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011

This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…

Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis

Computerized Classification Testing under the Generalized Graded Unfolding Model

Peer reviewed

Wang, Wen-Chung; Liu, Chen-Wei – Educational and Psychological Measurement, 2011

The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut…

Descriptors: Computer Assisted Testing, Adaptive Testing, Classification, Item Response Theory

Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

Peer reviewed

Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010

This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

Descriptors: Test Bias, Item Response Theory, Test Items, Scores

Comparing Accuracy of Parameter Estimation Using IRT Models in the Presence of Guessing

Fu, Qiong – ProQuest LLC, 2010

This research investigated how the accuracy of person ability and item difficulty parameter estimation varied across five IRT models with respect to the presence of guessing, targeting, and varied combinations of sample sizes and test lengths. The data were simulated with 50 replications under each of the 18 combined conditions. Five IRT models…

Descriptors: Item Response Theory, Guessing (Tests), Accuracy, Computation

Controlling Test Overlap Rate in Automated Assembly of Multiple Equivalent Test Forms

Peer reviewed

Lin, Chuan-Ju – Journal of Technology, Learning, and Assessment, 2010

Assembling equivalent test forms with minimal test overlap across forms is important in ensuring test security. Chen and Lei (2009) suggested an exposure control technique to control test overlap-ordered item pooling on the fly based on the essence that test overlap rate--ordered item pooling for the first t examinees is a function of test overlap…

Descriptors: Test Length, Test Format, Evaluation Criteria, Psychometrics

Ongoing Issues in Test Fairness

Peer reviewed

Camilli, Gregory – Educational Research and Evaluation, 2013

In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…

Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format

Evaluating IRT- and CTT-Based Methods of Estimating Classification Consistency and Accuracy Indices from Single Administrations

Deng, Nina – ProQuest LLC, 2011

Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…

Descriptors: Item Response Theory, Test Theory, Computation, Classification

A Range-Null Hypothesis Approach for Testing DIF under the Rasch Model

Peer reviewed

Wells, Craig S.; Cohen, Allan S.; Patton, Jeffrey – International Journal of Testing, 2009

A primary concern with testing differential item functioning (DIF) using a traditional point-null hypothesis is that a statistically significant result does not imply that the magnitude of DIF is of practical interest. Similarly, for a given sample size, a non-significant result does not allow the researcher to conclude the item is free of DIF. To…

Descriptors: Test Bias, Test Items, Statistical Analysis, Hypothesis Testing
