Publication Date
In 2025: 0
Since 2024: 17
Since 2021 (last 5 years): 40
Since 2016 (last 10 years): 80
Since 2006 (last 20 years): 134
Descriptor
Item Response Theory: 169
Test Items: 103
Equated Scores: 34
Comparative Analysis: 33
Simulation: 33
Responses: 31
Scores: 30
Statistical Analysis: 27
Test Construction: 27
Evaluation Methods: 26
Error of Measurement: 25
Source
Applied Measurement in Education: 207
Author
Lee, Won-Chan: 8
Wise, Steven L.: 7
Kolen, Michael J.: 6
Wells, Craig S.: 5
Finch, Holmes: 4
Lane, Suzanne: 4
DeMars, Christine E.: 3
Engelhard, George, Jr.: 3
Hambleton, Ronald K.: 3
Meijer, Rob R.: 3
Rios, Joseph A.: 3
Publication Type
Journal Articles: 207
Reports - Research: 143
Reports - Evaluative: 58
Speeches/Meeting Papers: 9
Information Analyses: 5
Reports - Descriptive: 4
Tests/Questionnaires: 4
Opinion Papers: 1
Education Level
Secondary Education: 22
Elementary Education: 19
Higher Education: 18
Postsecondary Education: 15
Middle Schools: 14
Grade 8: 13
Junior High Schools: 13
Grade 7: 10
Elementary Secondary Education: 9
Grade 3: 8
Grade 4: 8
Audience
Practitioners: 2
Researchers: 2
Location
Australia: 2
Belgium: 2
California: 2
Florida: 2
Netherlands: 2
Texas: 2
Canada: 1
Colorado: 1
Finland: 1
Germany: 1
Iowa: 1
Stefanie A. Wind; Beyza Aksu-Dunya – Applied Measurement in Education, 2024
Careless responding is a pervasive concern in research using affective surveys. Although researchers have considered various methods for identifying careless responses, few studies have examined the utility of these methods in the context of computer adaptive testing (CAT) for affective scales. Using a simulation study informed by recent…
Descriptors: Response Style (Tests), Computer Assisted Testing, Adaptive Testing, Affective Measures
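For orientation, one of the most common screening indices for careless responding is the longstring statistic, the length of the longest run of identical consecutive responses. A minimal Python sketch of that index (illustrative only; the abstract does not specify which detection methods the study examined):

    import numpy as np

    def longstring(responses):
        """Length of the longest run of identical consecutive responses."""
        responses = np.asarray(responses)
        longest = current = 1
        for prev, cur in zip(responses[:-1], responses[1:]):
            current = current + 1 if cur == prev else 1
            longest = max(longest, current)
        return longest

    # A respondent who straight-lines the middle category of a 5-point scale
    print(longstring([3, 3, 3, 3, 3, 2, 4, 3]))  # -> 5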
Yue Liu; Zhen Li; Hongyun Liu; Xiaofeng You – Applied Measurement in Education, 2024
Low test-taking effort among examinees has been considered a source of construct-irrelevant variance in item response modeling, with serious consequences for parameter estimation. This study aims to investigate how non-effortful response (NER) influences the estimation of item and person parameters in item-pool scale linking (IPSL) and whether…
Descriptors: Item Response Theory, Computation, Simulation, Responses
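As background on item-pool scale linking, parameter estimates from a new calibration are typically placed on the base scale through a linear transformation; the mean-sigma method is the simplest variant, built on common-item difficulty estimates. A hedged sketch of that general idea (the article's IPSL procedure is not reproduced in this excerpt):

    import numpy as np

    def mean_sigma_link(b_new, b_base):
        """Mean-sigma linking constants so that b_base is approx A * b_new + B."""
        b_new, b_base = np.asarray(b_new, float), np.asarray(b_base, float)
        A = b_base.std(ddof=1) / b_new.std(ddof=1)
        B = b_base.mean() - A * b_new.mean()
        return A, B

    # Difficulties map as b -> A * b + B; discriminations as a -> a / A
    A, B = mean_sigma_link([-0.5, 0.2, 1.1], [-0.4, 0.3, 1.4])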
Chunyan Liu; Raja Subhiyah; Richard A. Feinberg – Applied Measurement in Education, 2024
Mixed-format tests that include both multiple-choice (MC) and constructed-response (CR) items have become widely used in many large-scale assessments. When an item response theory (IRT) model is used to score a mixed-format test, the unidimensionality assumption may be violated if the CR items measure a different construct from that measured by MC…
Descriptors: Test Format, Response Style (Tests), Multiple Choice Tests, Item Response Theory
Jianbin Fu; Xuan Tan; Patrick C. Kyllonen – Applied Measurement in Education, 2024
A process is proposed to create the one-dimensional expected item characteristic curve (ICC) and test characteristic curve (TCC) for each trait in multidimensional forced-choice questionnaires based on the Rank-2PL (two-parameter logistic) item response theory models for forced-choice items with two or three statements. Some examples of ICC and…
Descriptors: Item Response Theory, Questionnaires, Measurement Techniques, Statistics
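The building blocks named here are standard: under a two-parameter logistic (2PL) model the ICC is P(theta) = 1 / (1 + exp(-a(theta - b))), and a TCC is the sum of ICCs across items. A minimal sketch of those two curves (the Rank-2PL machinery for forced-choice statement blocks is beyond this excerpt):

    import numpy as np

    def icc_2pl(theta, a, b):
        """2PL item characteristic curve: P(endorse | theta)."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def tcc(theta, a_params, b_params):
        """Test characteristic curve: expected score at each theta."""
        return sum(icc_2pl(theta, a, b) for a, b in zip(a_params, b_params))

    theta = np.linspace(-4, 4, 81)
    expected = tcc(theta, a_params=[1.2, 0.8, 1.5], b_params=[-1.0, 0.0, 0.7])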
Séverin Lions; María Paz Blanco; Pablo Dartnell; Carlos Monsalve; Gabriel Ortega; Julie Lemarié – Applied Measurement in Education, 2024
Multiple-choice items are universally used in formal education. Since they should assess learning, not test-wiseness or guesswork, they must be constructed following the highest possible standards. Hundreds of item-writing guides have provided guidelines to help test developers adopt appropriate strategies to define the distribution and sequence…
Descriptors: Test Construction, Multiple Choice Tests, Guidelines, Test Items
Stefanie A. Wind; Benjamin Lugu – Applied Measurement in Education, 2024
Researchers who use measurement models for evaluation purposes often select models with stringent requirements, such as Rasch models, which are parametric. Mokken Scale Analysis (MSA) offers a theory-driven nonparametric modeling approach that may be more appropriate for some measurement applications. Researchers have discussed using MSA as a…
Descriptors: Item Response Theory, Data Analysis, Simulation, Nonparametric Statistics
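For context, MSA's central quantity is Loevinger's scalability coefficient: for a pair of dichotomous items, the observed covariance divided by the maximum covariance attainable given the item margins. A sketch of the pairwise version (scale-level analysis, as in the R package mokken, aggregates over all pairs):

    import numpy as np

    def h_pair(x_i, x_j):
        """Loevinger's H for two dichotomous (0/1) item-score vectors."""
        x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)
        p_i, p_j, p_ij = x_i.mean(), x_j.mean(), (x_i * x_j).mean()
        cov = p_ij - p_i * p_j
        cov_max = min(p_i, p_j) - p_i * p_j  # max covariance given margins
        return cov / cov_max

    # H = 1 indicates a perfect Guttman pattern for the pair
    print(h_pair([0, 0, 1, 1, 1], [0, 1, 1, 1, 1]))  # -> 1.0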
Brian E. Clauser; Victoria Yaneva; Peter Baldwin; Le An Ha; Janet Mee – Applied Measurement in Education, 2024
Multiple-choice questions have become ubiquitous in educational measurement because the format allows for efficient and accurate scoring. Nonetheless, interest in constructed-response formats has persisted. This interest has driven efforts to develop computer-based scoring procedures that can accurately and efficiently score these items.…
Descriptors: Computer Uses in Education, Artificial Intelligence, Scoring, Responses
Mingfeng Xue; Mark Wilson – Applied Measurement in Education, 2024
Multidimensionality is common in psychological and educational measurements. This study focuses on dimensions that converge at the upper anchor (i.e. the highest acquisition status defined in a learning progression) and compares different ways of dealing with them using the multidimensional random coefficients multinomial logit model and scale…
Descriptors: Learning Trajectories, Educational Assessment, Item Response Theory, Evolution
Marcelo Andrade da Silva; A. Corinne Huggins-Manley; Jorge Luis Bazán; Amber Benedict – Applied Measurement in Education, 2024
A Q-matrix is a binary matrix that defines the relationship between items and latent variables. It is widely used in diagnostic classification models (DCMs) and can also be adopted in multidimensional item response theory (MIRT) models. The construction process of the Q-matrix is typically carried out by experts in the subject area of the items…
Descriptors: Q Methodology, Matrices, Item Response Theory, Educational Assessment
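As a concrete picture, a Q-matrix simply records which latent attributes each item is assumed to measure, and in a MIRT model the same matrix can fix non-indicated loadings to zero. A minimal illustration (the attribute labels are invented for the example):

    import numpy as np

    # Rows = items, columns = attributes (say: fractions, decimals, ratios)
    Q = np.array([
        [1, 0, 0],  # item 1 measures attribute 1 only
        [1, 1, 0],  # item 2 measures attributes 1 and 2
        [0, 0, 1],  # item 3 measures attribute 3 only
    ])

    # In a MIRT model, discriminations are estimated only where Q == 1
    free_loadings = Q.astype(bool)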
Sarah Alahmadi; Christine E. DeMars – Applied Measurement in Education, 2024
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods…
Descriptors: Multiple Choice Tests, Item Response Theory, College Students, Scores
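One widely used family of motivation filters relies on response times: a response faster than an item-level threshold is flagged as a rapid guess, and examinees whose share of solution behavior falls below a cutoff are excluded before scoring. A hedged sketch of that idea (the thresholds and the 0.90 cutoff are illustrative, not the article's values):

    import numpy as np

    def filter_unmotivated(resp_times, thresholds, min_rte=0.90):
        """Boolean mask of examinees retained after motivation filtering.

        resp_times: (n_examinees, n_items) response times in seconds
        thresholds: (n_items,) rapid-guessing threshold per item
        min_rte:    minimum response-time effort to be retained
        """
        solution = np.asarray(resp_times, float) >= np.asarray(thresholds, float)
        rte = solution.mean(axis=1)  # share of non-rapid responses
        return rte >= min_rte

    times = [[12.0, 30.5, 4.2], [0.8, 1.1, 0.9]]
    keep = filter_unmotivated(times, thresholds=[3.0, 3.0, 3.0])  # [True, False]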
Abbakumov, Dmitry; Desmet, Piet; Van den Noortgate, Wim – Applied Measurement in Education, 2020
Formative assessments are an important component of massive open online courses (MOOCs), online courses with open access and unlimited student participation. Drawing accurate conclusions about students' proficiency from formative assessments, however, faces several challenges: (a) students are typically allowed to make several attempts; and (b) student performance might…
Descriptors: Item Response Theory, Formative Evaluation, Online Courses, Response Style (Tests)
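One simple way to picture an attempt effect inside an IRT framework is a fixed logit shift per additional attempt, a toy stand-in for the kind of extension the abstract describes (an illustration only, not the authors' model):

    import numpy as np

    def p_correct(theta, b, attempt, delta=0.3):
        """Rasch-style success probability with a per-attempt logit shift.

        attempt: 1 for the first attempt; each retry adds delta logits.
        delta is a made-up illustrative value.
        """
        return 1.0 / (1.0 + np.exp(-(theta - b + delta * (attempt - 1))))

    # Same student and item, success probability rising across attempts
    print([round(p_correct(0.0, 0.5, k), 3) for k in (1, 2, 3)])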
Bayesian Logistic Regression: A New Method to Calibrate Pretest Items in Multistage Adaptive Testing
TsungHan Ho – Applied Measurement in Education, 2023
An operational multistage adaptive test (MST) requires developing a large item bank and continuously replenishing it, given long-term concerns about test security and validity. New items should be pretested and linked to the item bank before being used operationally. The linking item volume fluctuations in…
Descriptors: Bayesian Statistics, Regression (Statistics), Test Items, Pretesting
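The core idea, calibrating a pretest item while treating examinee abilities from the operational sections as known, can be sketched as MAP (penalized) logistic regression: maximize the 2PL likelihood for the new item under priors on its parameters. A minimal sketch of that general idea (the priors and starting values are assumptions, not the article's specification):

    import numpy as np
    from scipy.optimize import minimize

    def calibrate_pretest_item(theta, y, prior_sd=(1.0, 2.0)):
        """MAP estimates of 2PL (a, b) for one pretest item, theta fixed."""
        theta, y = np.asarray(theta, float), np.asarray(y, float)

        def neg_log_post(params):
            a, b = params
            p = np.clip(1.0 / (1.0 + np.exp(-a * (theta - b))), 1e-9, 1 - 1e-9)
            loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
            # Assumed normal priors centered at a = 1, b = 0
            logprior = (-0.5 * ((a - 1.0) / prior_sd[0]) ** 2
                        - 0.5 * (b / prior_sd[1]) ** 2)
            return -(loglik + logprior)

        return minimize(neg_log_post, x0=np.array([1.0, 0.0])).x

    rng = np.random.default_rng(1)
    theta = rng.normal(size=500)
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-1.3 * (theta - 0.4)))).astype(int)
    a_hat, b_hat = calibrate_pretest_item(theta, y)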
Perkins, Beth A.; Pastor, Dena A.; Finney, Sara J. – Applied Measurement in Education, 2021
When tests are low stakes for examinees, meaning test results carry few or no personal consequences, some examinees put little effort into their performance. To understand the causes and consequences of diminished effort, researchers correlate test-taking effort with other variables, such as test-taking emotions and test…
Descriptors: Response Style (Tests), Psychological Patterns, Testing, Differences
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
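To make the data structure concrete: trend scoring produces a score-by-score table in which the Time A row totals are fixed by the rescoring design, so each row is its own multinomial (product multinomial) rather than the whole table being one multinomial. A small sketch of such a table and the usual exact-agreement rate (the article's corrected statistics are not reproduced here):

    import numpy as np

    # Rows: Time A scores 0-2 (totals fixed by design); columns: Time B rescores
    table = np.array([
        [40,  8,  2],
        [ 6, 55,  9],
        [ 1,  7, 32],
    ])

    exact_agreement = np.trace(table) / table.sum()        # diagonal share
    row_rates = table / table.sum(axis=1, keepdims=True)   # per-row multinomials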
Chalmers, R. Philip; Zheng, Guoguo – Applied Measurement in Education, 2023
This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF are presented. To…
Descriptors: Test Bias, Test Items, Item Response Theory, Error of Measurement
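For contrast with the two-group case, the simplest conditional DIF summary matches examinees on a total (or rest) score and averages focal-minus-reference differences in item performance across score strata; SIBTEST refines this with a regression correction for measurement error in the matching score. A sketch of the uncorrected standardization-style index only (plainly not SIBTEST itself, nor the multigroup generalization):

    import numpy as np

    def standardized_dif(item, total, group):
        """Uncorrected standardization DIF index for a dichotomous item.

        item:  (n,) 0/1 item scores
        total: (n,) matching scores (e.g., rest score on remaining items)
        group: (n,) 0 = reference, 1 = focal
        """
        item, total, group = map(np.asarray, (item, total, group))
        diffs, weights = [], []
        for s in np.unique(total):
            ref = item[(total == s) & (group == 0)]
            foc = item[(total == s) & (group == 1)]
            if len(ref) and len(foc):
                diffs.append(foc.mean() - ref.mean())
                weights.append(len(foc))  # weight by focal stratum size
        return np.average(diffs, weights=weights)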