Publication Date

| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 8 |
Descriptor

| Descriptor | Count |
| --- | --- |
| Difficulty Level | 16 |
| Error of Measurement | 16 |
| Item Analysis | 16 |
| Test Items | 13 |
| Comparative Analysis | 6 |
| Test Construction | 6 |
| Item Response Theory | 5 |
| Achievement Tests | 4 |
| Foreign Countries | 4 |
| Latent Trait Theory | 4 |
| Mathematical Models | 4 |
Source

| Source | Count |
| --- | --- |
| Educational and Psychological… | 2 |
| Journal of Educational… | 2 |
| Applied Measurement in… | 1 |
| ETS Research Report Series | 1 |
| Evaluation and the Health… | 1 |
| Language Testing | 1 |
| Research Papers in Education | 1 |
Author

| Author | Count |
| --- | --- |
| Abulela, Mohammed A. A. | 1 |
| Ahn, Soyeon | 1 |
| Anwyll, Steve | 1 |
| Benson, Jeri | 1 |
| Clauser, Brian E. | 1 |
| Clauser, Jerome C. | 1 |
| Colton, Dean A. | 1 |
| Curry, Allen R. | 1 |
| Córdova, Nora | 1 |
| Dartnell, Pablo | 1 |
| Diamond, James J. | 1 |
Publication Type

| Publication Type | Count |
| --- | --- |
| Reports - Research | 13 |
| Journal Articles | 9 |
| Speeches/Meeting Papers | 3 |
| Reports - Evaluative | 2 |
| Information Analyses | 1 |
| Reports - Descriptive | 1 |
| Tests/Questionnaires | 1 |
Audience

| Audience | Count |
| --- | --- |
| Researchers | 1 |
Location

| Location | Count |
| --- | --- |
| Chile | 1 |
| Japan | 1 |
| United Kingdom (England) | 1 |
Assessments and Surveys

| Assessment | Count |
| --- | --- |
| ACT Assessment | 1 |
| Medical College Admission Test | 1 |
| Program for International… | 1 |
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
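The Mantel-Haenszel procedure named in the abstract above belongs to the observed-score family: examinees are stratified by a matching score, and a common odds ratio is pooled across strata, then mapped to the ETS delta scale. A minimal sketch, assuming hypothetical examinee records (the function name and data layout are illustrative, not from any of the cited studies):

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF statistic for one studied item.

    records: iterable of (group, score, correct), where group is
    "reference" or "focal", score is the matching total score, and
    correct is 0/1 on the studied item.  Returns the MH common odds
    ratio alpha and the ETS delta-scale value, MH D-DIF = -2.35 ln(alpha).
    """
    # Stratify examinees by matching score; count right/wrong per group.
    strata = defaultdict(lambda: {"Ar": 0, "Br": 0, "Af": 0, "Bf": 0})
    for group, score, correct in records:
        cell = strata[score]
        if group == "reference":
            cell["Ar" if correct else "Br"] += 1
        else:
            cell["Af" if correct else "Bf"] += 1

    # Pool the odds ratio across score strata (MH estimator).
    num = den = 0.0
    for cell in strata.values():
        n = cell["Ar"] + cell["Br"] + cell["Af"] + cell["Bf"]
        if n == 0:
            continue
        num += cell["Ar"] * cell["Bf"] / n  # reference right, focal wrong
        den += cell["Af"] * cell["Br"] / n  # focal right, reference wrong
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

With equal right/wrong counts in both groups the odds ratio is 1 and the delta value is 0, i.e., no DIF is flagged.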
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
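In an Angoff study like the one described above, the cut score is typically the mean of judge-by-item probability ratings, and judge variability contributes error to it. A minimal sketch of that decomposition, assuming made-up ratings and ignoring the panel and item facets the paper also treats:

```python
import math
import statistics

def angoff_cut_score(ratings):
    """Cut score and judge-related standard error from Angoff ratings.

    ratings: dict mapping judge id -> list of per-item probability
    judgments.  The cut score is the mean over judges of each judge's
    mean rating; the judge-related SE is the SD of judge means divided
    by sqrt(number of judges).
    """
    judge_means = [statistics.mean(v) for v in ratings.values()]
    cut = statistics.mean(judge_means)
    se = statistics.stdev(judge_means) / math.sqrt(len(judge_means))
    return cut, se
```

This captures only the judge facet; a full generalizability analysis would also estimate variance components for items, panels, and their interactions.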
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
Lions, Séverin; Dartnell, Pablo; Toledo, Gabriela; Godoy, María Inés; Córdova, Nora; Jiménez, Daniela; Lemarié, Julie – Educational and Psychological Measurement, 2023
Even though the impact of the position of response options on answers to multiple-choice items has been investigated for decades, it remains debated. Research on this topic is inconclusive, perhaps because too few studies have obtained experimental data from large-sized samples in a real-world context and have manipulated the position of both…
Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Responses
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Zhang, Jinming; Li, Jie – Journal of Educational Measurement, 2016
An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…
Descriptors: Computer Assisted Testing, Test Items, Difficulty Level, Item Response Theory
Suzuki, Yuichi – Language Testing, 2015
Self-assessment has been used to assess second language proficiency; however, as sources of measurement errors vary, they may threaten the validity and reliability of the tools. The present paper investigated the role of experiences in using Japanese as a second language in the naturalistic acquisition context on the accuracy of the…
Descriptors: Self Evaluation (Individuals), Error of Measurement, Japanese, Second Language Learning
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Diamond, James J.; McCormick, Janet – Evaluation and the Health Professions, 1986 (peer reviewed)
Using item responses from an in-training examination in diagnostic radiology, the application of a strength of association statistic to the general problem of item analysis is illustrated. Criteria for item selection, general issues of reliability, and error of measurement are discussed. (Author/LMO)
Descriptors: Achievement Tests, Difficulty Level, Error of Measurement, Graduate Medical Education
Colton, Dean A. – 1993
Tables of specifications are used to guide test developers in sampling items and maintaining consistency from form to form. This paper is a generalizability study of the American College Testing Program (ACT) Achievement Program Mathematics Test (AAP), with the content areas of the table of specifications representing multiple dependent variables.…
Descriptors: Achievement Tests, Difficulty Level, Error of Measurement, Generalizability Theory
Wise, Lauress L. – 1986
A primary goal of this study was to determine the extent to which item difficulty was related to item position and, if a significant relationship was found, to suggest adjustments to predicted item difficulty that reflect differences in item position. Item response data from the Medical College Admission Test (MCAT) were analyzed. A data set was…
Descriptors: College Entrance Examinations, Difficulty Level, Educational Research, Error of Measurement
Curry, Allen R.; And Others – 1978
The efficacy of employing subsets of items from a calibrated item pool to estimate the Rasch model person parameters was investigated. Specifically, the degree of invariance of Rasch model ability-parameter estimates was examined across differing collections of simulated items. The ability-parameter estimates were obtained from a simulation of…
Descriptors: Career Development, Difficulty Level, Equated Scores, Error of Measurement
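The Rasch ability estimation examined in the study above conditions on calibrated item difficulties. A minimal sketch of the standard maximum-likelihood estimate via Newton-Raphson, assuming hypothetical responses and difficulties (not data from the study):

```python
import math

def rasch_ability(responses, difficulties, tol=1e-6, max_iter=50):
    """Maximum-likelihood Rasch ability estimate for one examinee.

    responses: list of 0/1 item scores; difficulties: calibrated item
    difficulties in logits.  Requires a mixed response vector, since
    the MLE is infinite for perfect or zero scores.
    """
    r = sum(responses)
    if r == 0 or r == len(responses):
        raise ValueError("MLE undefined for perfect or zero scores")
    theta = 0.0
    for _ in range(max_iter):
        # P(correct) under the Rasch model for each item.
        probs = [1 / (1 + math.exp(-(theta - b))) for b in difficulties]
        grad = r - sum(probs)                   # score residual
        info = sum(p * (1 - p) for p in probs)  # test information
        step = grad / info
        theta += step                           # Newton-Raphson update
        if abs(step) < tol:
            break
    return theta
```

Because the raw score is a sufficient statistic in the Rasch model, the estimate depends on the item subset only through its difficulties, which is what makes invariance across item collections a testable question.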
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
Benson, Jeri; Wilson, Michael – 1979
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Descriptors: Comparative Analysis, Difficulty Level, Efficiency, Error of Measurement
Ridgeway, Gretchen Freiheit – 1982
A one-parameter latent trait model was the basis of the test development procedures in the Basic Skills Assessment Program (BSAP) of the Department of Defense Dependents Schools (DoDDS). Several issues are involved in applying the Rasch model to an assessment program in a large school district. Separate sets of skills continua are arranged by…
Descriptors: Achievement Tests, Basic Skills, Dependents Schools, Difficulty Level
Roid, Gale; Haladyna, Tom – 1978
The technology of transforming sentences from prose instruction into test questions was examined by comparing two methods of selecting sentences (keyword vs. rare singleton), two types of question words (nouns vs. adjectives), and two foil construction methods (writer's choice vs. algorithmic). Four item writers created items using each…
Descriptors: Algorithms, Cloze Procedure, Comparative Analysis, Criterion Referenced Tests
