Showing 1 to 15 of 17 results
Peer reviewed
Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025
This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple-choice (MC) and mixed-format tests within the common-item nonequivalent groups design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…
Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis
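The common-item linking idea behind this line of work can be sketched compactly. A minimal illustration, assuming a unidimensional 2PL simplification rather than the bifactor structure the study examines, with hypothetical anchor-item difficulties: the mean/sigma method finds constants A and B that map Form B's scale onto Form A's.

    import numpy as np

    # Hypothetical difficulty estimates for the same anchor items,
    # calibrated separately on each form (illustrative numbers only).
    b_form_a = np.array([-1.2, -0.4, 0.3, 0.9, 1.5])
    b_form_b = np.array([-1.0, -0.1, 0.6, 1.3, 1.9])

    # Mean/sigma linking: choose A, B so that b_A is approx. A * b_B + B.
    A = b_form_a.std(ddof=1) / b_form_b.std(ddof=1)
    B = b_form_a.mean() - A * b_form_b.mean()

    # Form B parameters move to the Form A scale via:
    #   theta* = A * theta + B,  a* = a / A,  b* = A * b + B
    print(f"A = {A:.3f}, B = {B:.3f}")

Multidimensional and bifactor linking methods generalize this rescaling step; the four approaches the paper actually compares are not reproduced here.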
Peer reviewed
Huang, Hung-Yu – Educational and Psychological Measurement, 2023
Forced-choice (FC) item formats used in noncognitive tests typically present a set of response options that measure different traits and instruct respondents to judge among these options according to their preference, a design intended to control the response biases commonly observed in normative tests. Diagnostic classification models (DCMs)…
Descriptors: Test Items, Classification, Bayesian Statistics, Decision Making
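The DCM idea itself can be stated briefly. As a representative example (not necessarily the model Huang adapts to the FC format), the DINA model gives the probability of a keyed response to item j for an examinee with attribute profile \boldsymbol{\alpha}, with slip s_j and guessing g_j parameters and Q-matrix entries q_{jk}:

    P(X_j = 1 \mid \boldsymbol{\alpha}) = (1 - s_j)^{\eta_j} \, g_j^{\,1 - \eta_j}, \qquad \eta_j = \prod_k \alpha_k^{\,q_{jk}}

Bayesian estimation, as in the abstract, typically places priors on s_j, g_j, and the attribute profiles and samples the posterior with MCMC.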
Peer reviewed
Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2018
Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model-selection strategies for choosing the smoothing parameter (C) in polynomial loglinear presmoothing, along with one model-selection procedure for cubic spline postsmoothing, for mixed-format pseudo-tests under the…
Descriptors: Comparative Analysis, Accuracy, Models, Sample Size
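Polynomial loglinear presmoothing fits a log-linear model of degree C to the raw score frequencies before equating; the strategies compared in the study are ways of picking C. A minimal sketch of one such strategy, AIC selection, using hypothetical simulated frequencies (statsmodels' Poisson GLM stands in for the usual fitting machinery):

    import numpy as np
    import statsmodels.api as sm
    from math import comb

    # Hypothetical number-correct frequencies for a 20-item form.
    scores = np.arange(21)
    p = np.array([comb(20, k) * 0.6**k * 0.4**(20 - k) for k in scores])
    freqs = np.random.default_rng(0).multinomial(2000, p)

    # Fit loglinear models of increasing degree C; keep the AIC winner.
    # Scores are rescaled to [0, 1] to keep the design well conditioned.
    xs = scores / 20.0
    best = None
    for C in range(1, 7):
        X = sm.add_constant(np.column_stack([xs**d for d in range(1, C + 1)]))
        fit = sm.GLM(freqs, X, family=sm.families.Poisson()).fit()
        if best is None or fit.aic < best[1]:
            best = (C, fit.aic)
    print(f"AIC-selected smoothing degree C = {best[0]}")

AIC is one strategy of this kind; the paper compares seven of them, plus the cubic spline postsmoothing procedure.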
Peer reviewed
Storme, Martin; Myszkowski, Nils; Baron, Simon; Bernard, David – Journal of Intelligence, 2019
Assessing job applicants' general mental ability online poses psychometric challenges due to the necessity of having brief but accurate tests. Recent research (Myszkowski & Storme, 2018) suggests that recovering distractor information through Nested Logit Models (NLM; Suh & Bolt, 2010) increases the reliability of ability estimates in…
Descriptors: Intelligence Tests, Item Response Theory, Comparative Analysis, Test Reliability
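The nested logit idea is compact enough to state. In Suh and Bolt's (2010) formulation, a correct response follows a standard 2PL and, conditional on an incorrect response, the choice among distractors follows a nominal model, so distractor choices carry extra information about \theta (notation here paraphrases the published model rather than quoting it):

    P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_j)]}, \qquad
    P(Y_{ij} = d \mid X_{ij} = 0, \theta_i) = \frac{\exp(\zeta_{jd} + \lambda_{jd}\theta_i)}{\sum_{d'} \exp(\zeta_{jd'} + \lambda_{jd'}\theta_i)}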
Peer reviewed
Bush, Martin – Assessment & Evaluation in Higher Education, 2015
The humble multiple-choice test is very widely used in education at all levels, but its susceptibility to guesswork makes it a suboptimal assessment tool. The reliability of a multiple-choice test is partly governed by the number of items it contains; however, longer tests are more time-consuming to take, and for some subject areas it can be…
Descriptors: Guessing (Tests), Multiple Choice Tests, Test Format, Test Reliability
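The length–reliability tradeoff the abstract alludes to is conventionally quantified with the Spearman–Brown prophecy formula: lengthening a test by a factor k changes reliability \rho to

    \rho_k = \frac{k\rho}{1 + (k - 1)\rho}

so, for example, doubling a test with reliability .70 predicts 2(.70)/(1 + .70) \approx .82. Blind guessing on n items with m options each adds an expected n/m correct answers, which is part of why per-item reliability is limited in the MC format.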
Peer reviewed
Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013
The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Descriptors: Test Format, Test Items, Responses, Computation
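A common way to put rater severity into an IRT model, which conveys the flavor of (though is simpler than) Yao's rater model, is to subtract a severity parameter \phi_r for rater r inside the response function:

    P(X_{ijr} = 1 \mid \theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_j - \phi_r)]}

Varying the distribution of \phi_r across raters is then exactly the kind of manipulation the simulation study describes.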
Andrews, Benjamin James – ProQuest LLC, 2011
The equity properties can be used to assess the quality of an equating. The degree to which expected scores conditional on ability are similar between test forms is referred to as first-order equity. Second-order equity is the degree to which conditional standard errors of measurement are similar between test forms after equating. The purpose of…
Descriptors: Test Format, Advanced Placement, Simulation, True Scores
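The two equity properties have standard formal statements. Writing eq(X) for the equated version of a new-form score X and Y for the old-form score, first-order equity holds when

    E[\,eq(X) \mid \theta\,] = E[\,Y \mid \theta\,] \quad \text{for all } \theta,

and second-order equity holds when the conditional standard errors of measurement agree, SEM_{eq(X)}(\theta) = SEM_Y(\theta). These definitions restate the abstract's description in symbols.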
Peer reviewed
Evers, Arne – International Journal of Testing, 2012
In this article, the characteristics of five test review models are described. The five models are the US review system at the Buros Center for Testing, the German Test Review System of the Committee on Tests, the Brazilian System for the Evaluation of Psychological Tests, the European EFPA Review Model, and the Dutch COTAN Evaluation System for…
Descriptors: Program Evaluation, Test Reviews, Trend Analysis, International Education
Peer reviewed
Lakin, Joni M.; Gambrell, James L. – Intelligence, 2012
Measures of broad fluid abilities including verbal, quantitative, and figural reasoning are commonly used in the K-12 school context for a variety of purposes. However, differentiation of these domains is difficult for young children (grades K-2) who lack basic linguistic and mathematical literacy. This study examined the latent factor structure…
Descriptors: Evidence, Validity, Item Response Theory, Numeracy
Peer reviewed
DeCarlo, Lawrence T. – ETS Research Report Series, 2008
Rater behavior in essay grading can be viewed as a signal-detection task, in that raters attempt to discriminate between latent classes of essays, with the latent classes being defined by a scoring rubric. The present report examines basic aspects of an approach to constructed-response (CR) scoring via a latent-class signal-detection model. The…
Descriptors: Scoring, Responses, Test Format, Bias
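The signal-detection view can be sketched in one line. If an essay's latent class (its rubric category) is c and the rater's perception is a noisy realization centered at d \cdot c, then with ordered criteria \gamma_1 < \gamma_2 < \dots the probability of assigning a rating of at most k is

    P(R \le k \mid C = c) = \Phi(\gamma_k - d\,c)

where d measures the rater's ability to discriminate among classes and the criteria capture severity or leniency. This is the generic latent-class SDT setup; DeCarlo's report develops the constructed-response specifics.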
Wightman, Linda F.; Wightman, Lawrence E. – 1988
Section Pre-Equating (SPE) is a method used to equate test forms that consist of multiple separately timed sections. SPE does not require examinees to take two complete forms of the test. Instead, all of the old form and one or two sections of the new form are administered to each examinee, and missing data techniques are employed to estimate the…
Descriptors: Comparative Analysis, Correlation, Equated Scores, Estimation (Mathematics)
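A toy version of the missing-data step, assuming a simple regression-based estimate rather than Wightman and Wightman's actual procedure: predict the unadministered new-form section scores from old-form scores using the examinees who took both (all names and numbers hypothetical).

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    old_total = rng.normal(50, 10, n)                # old-form score
    new_sec = 0.6 * old_total + rng.normal(0, 5, n)  # a new-form section
    took_new = rng.random(n) < 0.5                   # only half see it

    # Regress the new section on the old form among those who took it,
    # then fill in predicted scores for everyone else.
    slope, intercept = np.polyfit(old_total[took_new], new_sec[took_new], 1)
    new_sec_hat = np.where(took_new, new_sec, intercept + slope * old_total)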
Peer reviewed
Hol, A. Michiel; Vorst, Harrie C. M.; Mellenbergh, Gideon J. – Applied Psychological Measurement, 2007
In a randomized experiment (n = 515), a conventional computerized test and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although the items were carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible…
Descriptors: Student Motivation, Simulation, Adaptive Testing, Computer Assisted Testing
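Samejima's graded response model, mentioned in the abstract, specifies cumulative category probabilities for a polytomous item j with ordered thresholds b_{j1} < b_{j2} < \dots:

    P(X_{ij} \ge k \mid \theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_{jk})]}, \qquad
    P(X_{ij} = k \mid \theta_i) = P(X_{ij} \ge k \mid \theta_i) - P(X_{ij} \ge k + 1 \mid \theta_i)

Misfit of this model on the calibration data is what motivates the simulation study described.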
Peer reviewed
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
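The contrast between CAT and MST reduces to where adaptation happens: per item versus per module. A minimal two-stage MST routing sketch, with a hypothetical cut score and module names:

    import numpy as np

    def route(provisional_theta, cut=0.0):
        # After the first-stage routing module, branch to a second-stage
        # module matched to the provisional ability estimate.
        return "hard_module" if provisional_theta >= cut else "easy_module"

    thetas = np.random.default_rng(2).normal(0.0, 1.0, 5)
    print([route(t) for t in thetas])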
Peer reviewed
Helfeldt, John P.; Henk, William A. – Journal of Reading, 1985
Reports on a study that found an alternative cloze test format both valid and practical when compared to the conventional cloze format. (HOD)
Descriptors: Cloze Procedure, Comparative Analysis, Measurement Techniques, Models
Wang, Tianyou; Kolen, Michael J. – 1994
In this paper a quadratic curve equating method for different test forms under a random-group data-collection design is proposed. Procedures for implementing this method and related issues are described and discussed. The quadratic-curve method was evaluated with real test data (from two 30-item subtests for a professional licensure examination…
Descriptors: Comparative Analysis, Data Collection, Equated Scores, Goodness of Fit
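A sketch of the idea under the random-groups design, assuming the quadratic is fit to equipercentile pairs (the paper's actual estimation procedure may differ):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.binomial(30, 0.55, 2000)  # Form X scores, group 1
    y = rng.binomial(30, 0.60, 2000)  # Form Y scores, group 2

    # Match the forms at a grid of percentiles, then fit
    # e(x) = c2*x**2 + c1*x + c0 through those pairs.
    probs = np.linspace(2, 98, 25)
    qx, qy = np.percentile(x, probs), np.percentile(y, probs)
    c2, c1, c0 = np.polyfit(qx, qy, 2)
    equated = np.polyval([c2, c1, c0], np.arange(31))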