Showing 1 to 15 of 38 results
Peer reviewed
Download full text (PDF on ERIC)
Semih Asiret; Seçil Ömür Sünbül – International Journal of Psychology and Educational Studies, 2023
This study examined the effects of missing data of different patterns and sizes on test equating methods under the NEAT design. For this purpose, factors such as sample size, the average difficulty difference between the test forms, the difference between the ability distributions,…
Descriptors: Research Problems, Data, Test Items, Equated Scores
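The abstract names the NEAT (nonequivalent groups with anchor test) design; as context, the sketch below implements Tucker linear equating, one standard NEAT-design method (not necessarily among those the authors evaluate), on hypothetical score arrays.

```python
# Hedged sketch: Tucker linear equating under the NEAT design.
# Inputs are hypothetical; the study's actual conditions are not shown above.
import numpy as np

def tucker_linear_equate(x, v1, y, v2, w1=0.5):
    """Equate new-form X scores to the old-form Y scale.
    x, v1: total and anchor scores in the new-form group (population 1)
    y, v2: total and anchor scores in the old-form group (population 2)
    w1: synthetic-population weight given to population 1."""
    w2 = 1.0 - w1
    g1 = np.cov(x, v1)[0, 1] / np.var(v1, ddof=1)   # regression slope of X on V
    g2 = np.cov(y, v2)[0, 1] / np.var(v2, ddof=1)   # regression slope of Y on V
    dmu = v1.mean() - v2.mean()
    dvar = np.var(v1, ddof=1) - np.var(v2, ddof=1)
    mu_x = x.mean() - w2 * g1 * dmu                 # synthetic-population moments
    mu_y = y.mean() + w1 * g2 * dmu
    var_x = np.var(x, ddof=1) - w2 * g1**2 * dvar + w1 * w2 * (g1 * dmu)**2
    var_y = np.var(y, ddof=1) + w1 * g2**2 * dvar + w1 * w2 * (g2 * dmu)**2
    return lambda score: mu_y + np.sqrt(var_y / var_x) * (score - mu_x)
```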
Peer reviewed
Direct link
Okan Bulut; Guher Gorgun; Hacer Karamese – Journal of Educational Measurement, 2025
The use of multistage adaptive testing (MST) has gradually increased in large-scale testing programs as MST achieves a balanced compromise between linear test design and item-level adaptive testing. MST works on the premise that each examinee gives their best effort when attempting the items, and their responses truly reflect what they know or can…
Descriptors: Response Style (Tests), Testing Problems, Testing Accommodations, Measurement
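To make the MST premise concrete, here is a minimal two-stage routing sketch; the module labels and number-correct cutoffs are hypothetical illustrations, not the designs studied in the paper.

```python
# Hedged sketch of two-stage multistage adaptive testing (MST) routing.
import numpy as np

rng = np.random.default_rng(0)

def route(stage1_responses, cutoffs=(4, 8)):
    """Route an examinee to an easy/medium/hard second-stage module
    based on the number-correct score on a 10-item routing module."""
    score = int(np.sum(stage1_responses))
    if score < cutoffs[0]:
        return "easy"
    if score < cutoffs[1]:
        return "medium"
    return "hard"

responses = rng.integers(0, 2, size=10)  # simulated 0/1 item responses
print(route(responses))
```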
Peer reviewed
Direct link
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
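For orientation, the sketch below shows the unweighted Haebara characteristic-curve criterion for 2PL linking; the information-weighted variants the study extends modify this loss, but their exact form is not reproduced here. Item parameters are hypothetical.

```python
# Hedged sketch: Haebara characteristic-curve linking for 2PL anchor items.
import numpy as np
from scipy.optimize import minimize

def p2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def haebara_loss(AB, a_new, b_new, a_old, b_old, theta=np.linspace(-4, 4, 41)):
    A, B = AB
    # Place new-form parameters on the old-form scale: a* = a/A, b* = A*b + B.
    a_t, b_t = a_new / A, A * b_new + B
    return np.sum((p2pl(theta, a_t, b_t) - p2pl(theta, a_old, b_old)) ** 2)

# Hypothetical anchor-item parameter estimates on the two forms:
a_new, b_new = np.array([1.2, 0.8, 1.5]), np.array([-0.5, 0.3, 1.0])
a_old, b_old = np.array([1.1, 0.9, 1.4]), np.array([-0.3, 0.5, 1.2])
res = minimize(haebara_loss, x0=[1.0, 0.0], args=(a_new, b_new, a_old, b_old))
print("A =", res.x[0], "B =", res.x[1])
```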
Peer reviewed
Download full text (PDF on ERIC)
Lu, Ru; Kim, Sooyeon – ETS Research Report Series, 2021
This study evaluated the impact of subgroup weighting for equating through a common-item anchor. We used data from a single test form to create two research forms for which the equating relationship was known. The results showed that equating was most accurate when the new form and reference form samples were weighted to be similar to the target…
Descriptors: Equated Scores, Weighted Scores, Raw Scores, Test Items
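A minimal sketch of the weighting idea, assuming simple post-stratification to a target population; the report's actual weighting scheme is not reproduced, and all proportions below are hypothetical.

```python
# Hedged sketch: reweight a sample so its subgroup mix matches a target
# population before computing the moments used in equating.
import numpy as np

scores = np.array([12, 15, 20, 22, 18, 25])      # examinee raw scores
group  = np.array([0, 0, 0, 1, 1, 1])            # subgroup membership
target = {0: 0.7, 1: 0.3}                        # target population proportions

sample_prop = {g: np.mean(group == g) for g in target}
weights = np.array([target[g] / sample_prop[g] for g in group])

weighted_mean = np.average(scores, weights=weights)
weighted_var = np.average((scores - weighted_mean) ** 2, weights=weights)
print(weighted_mean, weighted_var)
```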
Peer reviewed
Direct link
Luo, Yong – Measurement: Interdisciplinary Research and Perspectives, 2021
To date, only frequentist model-selection methods have been studied with mixed-format data in the context of IRT model selection, and it is unknown how popular Bayesian model-selection methods such as DIC, WAIC, and LOO perform. In this study, we present the results of a comprehensive simulation study that compared the performances of eight…
Descriptors: Item Response Theory, Test Format, Selection, Methods
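WAIC, one of the Bayesian criteria named in the abstract, can be computed directly from a matrix of pointwise posterior log-likelihoods; the sketch below does exactly that (DIC and LOO are computed analogously from the same posterior output). The data are stand-in draws, not the paper's simulation.

```python
# Hedged sketch: WAIC from an (S draws x N observations) log-likelihood matrix.
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    S = log_lik.shape[0]
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))  # log pointwise density
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))       # effective parameters
    return -2 * (lppd - p_waic)

rng = np.random.default_rng(1)
log_lik = rng.normal(-0.7, 0.1, size=(2000, 500))  # stand-in posterior draws
print(waic(log_lik))
```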
Peer reviewed
Direct link
Schulte, Niklas; Holling, Heinz; Bürkner, Paul-Christian – Educational and Psychological Measurement, 2021
Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high.…
Descriptors: Questionnaires, Measurement Techniques, Test Format, Scoring
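A tiny numeric demonstration of the ipsativity problem the abstract mentions, under the assumption of classical rank-based scoring of fully ranked blocks (hypothetical data): every examinee's trait scores sum to the same constant, so only intraindividual comparisons are possible.

```python
# Hedged sketch: why classically scored forced-choice data are ipsative.
import numpy as np

n_blocks, n_traits = 10, 4
rng = np.random.default_rng(2)
# Rank assigned to each trait's statement within each block (0 = least like me).
ranks = np.array([rng.permutation(n_traits) for _ in range(n_blocks)])
trait_scores = ranks.sum(axis=0)
# The total is n_blocks * T*(T-1)/2 = 60 for every respondent.
print(trait_scores, trait_scores.sum())
```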
Peer reviewed
Download full text (PDF on ERIC)
Gurdil Ege, Hatice; Demir, Ergul – Eurasian Journal of Educational Research, 2020
Purpose: The present study aims to evaluate how the reliabilities computed using Cronbach's α, stratified α, Angoff-Feldt, and Feldt-Raju estimators may differ when sample size (500, 1000, and 2000) and the ratio of dichotomous to polytomous items (2:1, 1:1, 1:2) included in the scale are varied. Research Methods: In this study, Cronbach's α,…
Descriptors: Test Format, Simulation, Test Reliability, Sample Size
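Two of the estimators the study compares are straightforward to compute; the sketch below implements Cronbach's α and stratified α for a mixed-format scale (the dichotomous/polytomous split and all data are hypothetical; random data will yield near-zero values, since real item responses are correlated).

```python
# Hedged sketch: Cronbach's alpha and stratified alpha.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def stratified_alpha(strata):
    """strata: list of item-score matrices, one per item-type stratum."""
    total_var = np.hstack(strata).sum(axis=1).var(ddof=1)
    num = sum(s.sum(axis=1).var(ddof=1) * (1 - cronbach_alpha(s)) for s in strata)
    return 1 - num / total_var

rng = np.random.default_rng(3)
dich = rng.integers(0, 2, size=(500, 20)).astype(float)   # dichotomous items
poly = rng.integers(0, 5, size=(500, 10)).astype(float)   # polytomous items
print(cronbach_alpha(np.hstack([dich, poly])), stratified_alpha([dich, poly]))
```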
Peer reviewed
Direct link
Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023
Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents, and the number of ability levels, this study aims to provide a closed formula for adaptive tests of medium difficulty (probability of a correct response p = 1/2) that determines the accuracy of the parameters for each item and in…
Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level
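As background for why p = 1/2 is the focal case: for a Rasch item the Fisher information at ability θ is p(1-p), which is maximized at 0.25 when p = 1/2. The short sketch below uses this standard result (the paper's own closed formula is not reproduced here).

```python
# Hedged sketch: after n items answered at p = 1/2, the asymptotic standard
# error of theta is roughly 1/sqrt(0.25*n) = 2/sqrt(n).
import numpy as np

for n in (10, 25, 100, 400):
    print(n, "items -> SE ~", 2 / np.sqrt(n))
```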
Peer reviewed
Direct link
Wang, Yu; Chiu, Chia-Yi; Köhn, Hans Friedrich – Journal of Educational and Behavioral Statistics, 2023
The multiple-choice (MC) item format has been widely used in educational assessments across diverse content domains. MC items purportedly allow for collecting richer diagnostic information. The effectiveness and economy of administering MC items may have further contributed to their popularity not just in educational assessment. The MC item format…
Descriptors: Multiple Choice Tests, Nonparametric Statistics, Test Format, Educational Assessment
Peer reviewed
Download full text (PDF on ERIC)
Uysal, Ibrahim; Sahin-Kürsad, Merve; Kiliç, Abdullah Faruk – Participatory Educational Research, 2022
The aim of the study was to examine whether the common items in mixed-format tests (e.g., multiple-choice and essay items) contain parameter drift in test equating performed with the common-item nonequivalent groups design. In this study, which was carried out using Monte Carlo simulation with a fully crossed design, the factors of test…
Descriptors: Test Items, Test Format, Item Response Theory, Equated Scores
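A minimal sketch of the drift screening the abstract concerns: compare common-item difficulty estimates from two calibrations after placing them on one scale, and flag large shifts. The 0.5-logit threshold is a hypothetical choice, not the study's criterion.

```python
# Hedged sketch: flagging common items for parameter drift.
import numpy as np

b_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # old-form difficulty estimates
b_new = np.array([-1.1, -0.3, 0.7, 0.9, 1.4])   # new-form estimates, already linked
drift = b_new - b_old
flagged = np.where(np.abs(drift) > 0.5)[0]
print("drifted common items:", flagged)          # item 2 drifts by 0.6 logits
```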
Peer reviewed
Direct link
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
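The abstract assumes the random groups design with number-correct scoring; the sketch below shows a simplified equipercentile mapping under that design (without the continuization used in practice), on hypothetical subtest scores.

```python
# Hedged sketch: simplified equipercentile equating, random groups design.
import numpy as np

def equipercentile(x_scores, y_scores, x):
    """Map score x on form X to the form-Y score with the same percentile rank."""
    pr = np.mean(x_scores <= x)                  # percentile rank of x on form X
    return np.quantile(y_scores, pr)             # form-Y score at that rank

rng = np.random.default_rng(4)
x_scores = rng.binomial(20, 0.55, size=1000)     # subtest scores, form X group
y_scores = rng.binomial(20, 0.60, size=1000)     # subtest scores, form Y group
print(equipercentile(x_scores, y_scores, 12))
```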
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
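For context, the sketch below fits a CPT by smoothed conditional counts, the baseline that more elaborate CPT-tuning methods (such as whatever this dissertation proposes, which is not detailed above) improve on. The parent/child observations are hypothetical.

```python
# Hedged sketch: Laplace-smoothed P(child | parent) from discrete observations.
import numpy as np

def fit_cpt(parent, child, n_parent, n_child, alpha=1.0):
    counts = np.zeros((n_parent, n_child))
    for p, c in zip(parent, child):
        counts[p, c] += 1
    counts += alpha                              # smoothing avoids zero cells
    return counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(5)
skill = rng.integers(0, 2, size=200)             # latent skill proxy (0/1)
item = np.where(rng.random(200) < 0.8, skill, 1 - skill)  # noisy response
print(fit_cpt(skill, item, 2, 2))
```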
Peer reviewed
Direct link
Harbaugh, Allen G.; Liu, Min – AERA Online Paper Repository, 2017
This research examines the effects of single-value response style contamination on measures of model fit and on model convergence. A simulation study examines the effects of the percentage of contamination, the number of manifest variables, the number of reverse-coded items, the magnitude of standardized factor loadings, response scale granularity, and…
Descriptors: Goodness of Fit, Sample Size, Statistical Analysis, Test Format
Peer reviewed
Direct link
Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2018
Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model selection strategies for choosing the smoothing parameter (C) for polynomial loglinear presmoothing and one procedure for model selection in cubic spline postsmoothing for mixed-format pseudo tests under the…
Descriptors: Comparative Analysis, Accuracy, Models, Sample Size
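The smoothing parameter C the study's strategies select is the polynomial degree of a loglinear model fitted to the raw score distribution; the sketch below fits such models with a Poisson GLM and reports AIC (one plausible selection criterion; the paper's seven strategies are not reproduced). Frequencies are hypothetical.

```python
# Hedged sketch: polynomial loglinear presmoothing of a score distribution.
import numpy as np
import statsmodels.api as sm

scores = np.arange(0, 21)
freqs = np.array([1, 2, 4, 8, 14, 22, 33, 45, 55, 60,
                  58, 52, 41, 30, 20, 12, 7, 4, 2, 1, 1])

def presmooth(scores, freqs, C):
    z = (scores - scores.mean()) / scores.std()      # scale for stable fitting
    X = sm.add_constant(np.vander(z, C + 1, increasing=True)[:, 1:])
    fit = sm.GLM(freqs, X, family=sm.families.Poisson()).fit()
    return fit.fittedvalues, fit.aic

for C in (2, 3, 4):
    _, aic = presmooth(scores, freqs, C)
    print("C =", C, "AIC =", round(aic, 1))
```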
Peer reviewed
Direct link
Gyllstad, Henrik; McLean, Stuart; Stewart, Jeffrey – Language Testing, 2021
The last three decades have seen an increase in tests aimed at measuring an individual's vocabulary level or size. The target words used in these tests are typically sampled from word frequency lists, which are in turn based on language corpora. Conventionally, test developers sample items from frequency bands of 1000 words; different tests employ…
Descriptors: Vocabulary Development, Sample Size, Language Tests, Test Items
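The conventional sampling design the article describes is easy to illustrate; in the sketch below, the rank-ordered word list and per-band sample size are hypothetical stand-ins for a real corpus-based frequency list.

```python
# Hedged sketch: sampling vocabulary test items from 1,000-word frequency bands.
import random

random.seed(6)
frequency_list = [f"word{i:05d}" for i in range(5000)]  # rank-ordered lemmas

def sample_band_items(freq_list, band_size=1000, per_band=10):
    bands = [freq_list[i:i + band_size] for i in range(0, len(freq_list), band_size)]
    return {f"{i * band_size + 1}-{(i + 1) * band_size}": random.sample(band, per_band)
            for i, band in enumerate(bands)}

items = sample_band_items(frequency_list)
print({band: words[:3] for band, words in items.items()})
```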