Lawrence T. DeCarlo – Educational and Psychological Measurement, 2024
A psychological framework for different types of items commonly used with mixed-format exams is proposed. A choice model based on signal detection theory (SDT) is used for multiple-choice (MC) items, whereas an item response theory (IRT) model is used for open-ended (OE) items. The SDT and IRT models are shown to share a common conceptualization…
Descriptors: Test Format, Multiple Choice Tests, Item Response Theory, Models
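The abstract is truncated, but the two model families it pairs can be put side by side. Below is a minimal sketch (not DeCarlo's exact parameterization): a 2PL IRT curve for an open-ended item next to a Gaussian signal-detection choice model for an m-option multiple-choice item; all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def p_correct_2pl(theta, a=1.2, b=0.0):
    """2PL IRT: probability of answering an open-ended item correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_correct_sdt(d_prime, m=4):
    """Gaussian SDT choice model for an m-option MC item: the correct
    option's strength ~ N(d', 1), each distractor ~ N(0, 1); the examinee
    picks the strongest option, so P = integral phi(x - d') * Phi(x)^(m-1) dx."""
    x = np.linspace(-8.0, 8.0, 2001)
    dx = x[1] - x[0]
    return np.sum(norm.pdf(x - d_prime) * norm.cdf(x) ** (m - 1)) * dx

print(p_correct_2pl(theta=1.0))         # open-ended item, theta = 1
print(p_correct_sdt(d_prime=1.0, m=4))  # 4-option MC item, d' = 1
```

Both functions map one latent quantity (theta or d') to a probability of success, which is the shared conceptualization the abstract alludes to.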
Gerring, John; Pemstein, Daniel; Skaaning, Svend-Erik – Sociological Methods & Research, 2021
A key obstacle to measurement is the aggregation problem. Where indicators tap into common latent traits in theoretically meaningful ways, the problem may be solved by applying a data-informed ("inductive") measurement model, for example, factor analysis, structural equation models, or item response theory. Where they do not, researchers…
Descriptors: Test Construction, Measures (Individuals), Concept Formation, Social Science Research
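As a hedged illustration of the "inductive" route the abstract mentions, the sketch below aggregates four simulated indicators of a single latent trait using the first principal component; a full analysis would fit a factor, SEM, or IRT model instead.

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=500)                    # one latent trait
loadings = np.array([0.8, 0.7, 0.6, 0.5])        # four indicators of it
X = latent[:, None] * loadings + rng.normal(scale=0.5, size=(500, 4))

Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
score = Xc @ vt[0]                               # first principal component
# the sign of a principal component is arbitrary, so compare |correlation|
print(abs(np.corrcoef(score, latent)[0, 1]))     # near 1: trait recovered
```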
Wolkowitz, Amanda A.; Foley, Brett; Zurn, Jared – Practical Assessment, Research & Evaluation, 2023
The purpose of this study is to introduce a method for converting scored 4-option multiple-choice (MC) items into scored 3-option MC items without re-pretesting the 3-option MC items. This study describes a six-step process for achieving this goal. Data from a professional credentialing exam were used in this study and the method was applied to 24…
Descriptors: Multiple Choice Tests, Test Items, Accuracy, Test Format
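The six-step procedure itself is not spelled out in this snippet. As a loosely related, purely illustrative adjustment (not the authors' method), one can shift a 3PL item's lower asymptote from chance on four options to chance on three when a distractor is dropped, leaving the other parameters untouched:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL IRT item response function with pseudo-guessing parameter c."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
a, b = 1.0, 0.0                          # illustrative item parameters
print(p_3pl(theta, a, b, c=0.25))        # scored as a 4-option item
print(p_3pl(theta, a, b, c=1.0 / 3.0))   # rescored as a 3-option item
```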
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
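One of the quantities the study evaluates, the conditional standard error of measurement, follows directly from IRT test information: CSEM(theta) = 1/sqrt(I(theta)). A minimal sketch with dichotomous 2PL items only; a real mixed-format test would add polytomous-item information for the free-response items.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

a = np.array([0.8, 1.0, 1.2, 1.5])    # discriminations (illustrative)
b = np.array([-1.0, 0.0, 0.5, 1.0])   # difficulties (illustrative)
for theta in (-2.0, 0.0, 2.0):
    test_info = info_2pl(theta, a, b).sum()   # item informations add
    print(theta, 1.0 / np.sqrt(test_info))    # CSEM at this theta
```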
Martin, Michael O., Ed.; von Davier, Matthias, Ed.; Mullis, Ina V. S., Ed. – International Association for the Evaluation of Educational Achievement, 2020
The chapters in this online volume comprise the TIMSS & PIRLS International Study Center's technical report of the methods and procedures used to develop, implement, and report the results of TIMSS 2019. There were various technical challenges because TIMSS 2019 was the initial phase of the transition to eTIMSS, with approximately half the…
Descriptors: Foreign Countries, Elementary Secondary Education, Achievement Tests, International Assessment
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Pae, Tae-Il – Language Testing, 2012
This study tracked gender differential item functioning (DIF) on the English subtest of the Korean College Scholastic Aptitude Test (KCSAT) over a nine-year period across three data points, using both the Mantel-Haenszel (MH) and item response theory likelihood ratio (IRT-LR) procedures. Further, the study identified two factors (i.e. reading…
Descriptors: Aptitude Tests, Academic Aptitude, Language Tests, Test Items
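The Mantel-Haenszel procedure named in the abstract reduces to a common odds ratio computed across total-score strata, often reported on the ETS delta scale. A toy sketch with fabricated counts:

```python
import numpy as np

# per score stratum: [right_ref, wrong_ref, right_focal, wrong_focal]
strata = np.array([
    [30, 20, 25, 25],
    [45, 15, 38, 22],
    [60, 10, 55, 15],
], dtype=float)

num = den = 0.0
for r_ref, w_ref, r_foc, w_foc in strata:
    n = r_ref + w_ref + r_foc + w_foc
    num += r_ref * w_foc / n
    den += w_ref * r_foc / n

alpha_mh = num / den                  # MH common odds ratio
delta_mh = -2.35 * np.log(alpha_mh)   # ETS delta scale; |delta| around
print(alpha_mh, delta_mh)             # 1.5+ is commonly flagged as DIF
```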
Filipi, Anna – Language Testing, 2012
The Assessment of Language Competence (ALC) certificates comprise an annual, international testing program developed by the Australian Council for Educational Research to test the listening and reading comprehension skills of students in the lower to middle year levels of secondary school. The tests are developed for three levels in French, German, Italian and…
Descriptors: Listening Comprehension Tests, Item Response Theory, Statistical Analysis, Foreign Countries
Webb, Mi-young Lee; Cohen, Allan S.; Schwanenflugel, Paula J. – Educational and Psychological Measurement, 2008
This study investigated the use of latent class analysis for the detection of differences in item functioning on the Peabody Picture Vocabulary Test-Third Edition (PPVT-III). A two-class solution for a latent class model appeared to be defined in part by ability because Class 1 was lower in ability than Class 2 on both the PPVT-III and the…
Descriptors: Item Response Theory, Test Items, Test Format, Cognitive Ability
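A toy version of the latent class analysis applied here: a two-class EM algorithm for binary item responses, with simulated data in which one class is lower in ability by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
true_p = np.array([[0.3, 0.4, 0.2, 0.35],   # class 0: low ability
                   [0.8, 0.9, 0.7, 0.85]])  # class 1: high ability
z = rng.integers(0, 2, size=400)                     # true memberships
X = (rng.random((400, 4)) < true_p[z]).astype(float) # 0/1 responses

pi = np.array([0.5, 0.5])                  # class proportions (start)
p = rng.uniform(0.25, 0.75, size=(2, 4))   # item probabilities per class
for _ in range(200):
    # E-step: posterior class membership given the response pattern
    logl = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
    post = np.exp(logl - logl.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: update class proportions and item probabilities
    pi = post.mean(axis=0)
    p = np.clip((post.T @ X) / post.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)

print(np.round(p, 2))  # compare with true_p (class labels may be swapped)
```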
Einarsdottir, Sif; Rounds, James – Journal of Vocational Behavior, 2009
Item response theory was used to address gender bias in interest measurement. A differential item functioning (DIF) technique, SIBTEST, together with DIMTEST for dimensionality, was applied to the items of the six General Occupational Theme (GOT) and 25 Basic Interest (BI) scales in the Strong Interest Inventory. A sample of 1860 women and 1105 men was used.…
Descriptors: Test Format, Females, Vocational Interests, Construct Validity
Judd, Wallace – Practical Assessment, Research & Evaluation, 2009
Over the past twenty years in performance testing, a specific item type with distinguishing characteristics has arisen time and again. It has been invented independently by dozens of test development teams, and yet this item type is not recognized in the research literature. This article is an invitation to investigate the item type, evaluate…
Descriptors: Test Items, Test Format, Evaluation, Item Analysis
Keng, Leslie; McClarty, Katie Larsen; Davis, Laurie Laughlin – Applied Measurement in Education, 2008
This article describes a comparative study conducted at the item level for paper and online administrations of a statewide high stakes assessment. The goal was to identify characteristics of items that may have contributed to mode effects. Item-level analyses compared two modes of the Texas Assessment of Knowledge and Skills (TAKS) for up to four…
Descriptors: Computer Assisted Testing, Geometric Concepts, Grade 8, Comparative Analysis
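A minimal sketch of one item-level mode-effect check of the kind such studies run (not necessarily the authors' analysis): a two-proportion z-test comparing an item's proportion correct on paper versus online, with fabricated counts.

```python
import math

def mode_effect_z(correct_paper, n_paper, correct_online, n_online):
    """z statistic for the difference in proportion correct across modes."""
    p1, p2 = correct_paper / n_paper, correct_online / n_online
    pooled = (correct_paper + correct_online) / (n_paper + n_online)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_paper + 1 / n_online))
    return (p1 - p2) / se

print(mode_effect_z(720, 1000, 655, 1000))  # positive z: easier on paper
```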
Dorans, Neil J.; Liu, Jinghua; Hammond, Shelby – Applied Psychological Measurement, 2008
This exploratory study was built on research spanning three decades. Petersen, Marco, and Stewart (1982) conducted a major empirical investigation of the efficacy of different equating methods. The studies reported in Dorans (1990) examined how different equating methods performed across samples selected in different ways. Recent population…
Descriptors: Test Format, Equated Scores, Sampling, Evaluation Methods
Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008
In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem: the time-consuming selection of items so that the resulting tests have approximately the same test information functions (TIFs) while satisfying the same constraints. This article proposes a novel method, a genetic algorithm (GA), to construct parallel…
Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory
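A toy sketch of the idea (not the authors' implementation): a genetic algorithm searches an item pool for a fixed-length form whose TIF matches a target form's TIF on a theta grid; all item parameters are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
N_POOL, FORM_LEN = 60, 20
A = rng.uniform(0.5, 2.0, N_POOL)    # 2PL discriminations (simulated)
B = rng.uniform(-2.0, 2.0, N_POOL)   # 2PL difficulties (simulated)
GRID = np.linspace(-3.0, 3.0, 13)    # theta points where TIFs must match

def tif(form):
    """Test information of the selected items at each grid point."""
    p = 1.0 / (1.0 + np.exp(-A[form, None] * (GRID - B[form, None])))
    return (A[form, None] ** 2 * p * (1.0 - p)).sum(axis=0)

target = tif(rng.choice(N_POOL, FORM_LEN, replace=False))  # reference form

def fitness(form):
    return -np.abs(tif(form) - target).sum()  # smaller TIF gap = fitter

def crossover(f1, f2):
    union = np.unique(np.concatenate([f1, f2]))
    return rng.choice(union, FORM_LEN, replace=False)

def mutate(form):
    out = form.copy()
    unused = np.setdiff1d(np.arange(N_POOL), out)
    out[rng.integers(FORM_LEN)] = rng.choice(unused)  # swap in one item
    return out

pop = [rng.choice(N_POOL, FORM_LEN, replace=False) for _ in range(40)]
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                               # elitist selection
    children = [mutate(crossover(parents[rng.integers(10)],
                                 parents[rng.integers(10)]))
                for _ in range(30)]
    pop = parents + children

best = max(pop, key=fitness)
print(-fitness(best))  # remaining total |TIF difference| of the best form
```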
von Davier, Alina A.; Wilson, Christine – Applied Psychological Measurement, 2008
Dorans and Holland (2000) and von Davier, Holland, and Thayer (2003) introduced measures of the degree to which an observed-score equating function is sensitive to the population on which it is computed. This article extends the findings of Dorans and Holland and of von Davier et al. to item response theory (IRT) true-score equating methods that…
Descriptors: Advanced Placement, Advanced Placement Programs, Equated Scores, Calculus
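IRT true-score equating, the family of methods this article extends, maps a raw score on Form X to the theta at which Form X's expected score equals it, then reads off Form Y's expected score at that theta. A minimal 2PL sketch with illustrative parameters assumed to already be on a common scale:

```python
import numpy as np

theta = np.linspace(-5, 5, 1001)

def true_score(a, b):
    """Expected number-correct on a 2PL form as a function of theta."""
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta - b[:, None])))
    return p.sum(axis=0)

a_x, b_x = np.array([0.9, 1.1, 1.3]), np.array([-0.5, 0.0, 0.8])
a_y, b_y = np.array([1.0, 1.0, 1.4]), np.array([-0.2, 0.3, 0.6])

tau_x, tau_y = true_score(a_x, b_x), true_score(a_y, b_y)
x = 2.0                              # a raw score on Form X
th = np.interp(x, tau_x, theta)      # invert Form X's true-score curve
print(np.interp(th, theta, tau_y))   # equated score on Form Y
```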