Publication Date
In 2025: 17
Since 2024: 73
Since 2021 (last 5 years): 278
Since 2016 (last 10 years): 509
Since 2006 (last 20 years): 827

Descriptor
Item Analysis: 1478
Test Items: 1478
Test Construction: 477
Foreign Countries: 378
Difficulty Level: 369
Test Validity: 295
Item Response Theory: 264
Test Reliability: 243
Multiple Choice Tests: 236
Comparative Analysis: 227
Scores: 202

Author
Reckase, Mark D.: 16
Tindal, Gerald: 13
Hambleton, Ronald K.: 12
Alonzo, Julie: 10
Plake, Barbara S.: 9
Dorans, Neil J.: 8
Weiss, David J.: 8
Gierl, Mark J.: 7
Lai, Cheng Fei: 7
Lord, Frederic M.: 6
McKinley, Robert L.: 6

Location
Turkey: 42
Canada: 24
Australia: 21
Iran: 20
Japan: 18
Germany: 16
China: 15
United States: 14
Taiwan: 11
Indonesia: 10
Oregon: 10

Laws, Policies, & Programs
Individuals with Disabilities…: 9
No Child Left Behind Act 2001: 6
Elementary and Secondary…: 2
Every Student Succeeds Act…: 2
Education Consolidation…: 1
National Defense Education Act: 1

What Works Clearinghouse Rating
Does not meet standards: 1
Miguel A. García-Pérez – Educational and Psychological Measurement, 2024
A recurring question regarding Likert items is whether the discrete steps that this response format allows represent constant increments along the underlying continuum. This question appears unsolvable because Likert responses carry no direct information to this effect. Yet, any item administered in Likert format can identically be administered…
Descriptors: Likert Scales, Test Construction, Test Items, Item Analysis
Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025
This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple choice (MC) and mixed format tests within the common item nonequivalent group design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…
Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis
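The multidimensional bifactor linking methods compared in that study go beyond a snippet, but the underlying task, placing item parameters estimated on two forms onto a common scale via common items, can be illustrated with classical unidimensional mean/sigma linking. All parameter values below are hypothetical:

```python
# Mean/sigma linking for unidimensional IRT item parameters.
# Given difficulty estimates for the common items on the base form (b_base)
# and on the new form (b_new), find A, B such that b_base ≈ A*b_new + B.
from statistics import mean, stdev

b_base = [-1.2, -0.4, 0.3, 1.1, 1.8]   # hypothetical common-item difficulties, base scale
b_new  = [-1.0, -0.2, 0.5, 1.3, 2.0]   # same items as estimated on the new form's scale

A = stdev(b_base) / stdev(b_new)        # slope of the scale transformation
B = mean(b_base) - A * mean(b_new)      # intercept

# Transform every new-form parameter onto the base scale:
# difficulties become A*b + B, discriminations become a / A.
b_transformed = [A * b + B for b in b_new]
```

With these made-up values the new form is a pure shift of the base form, so the transformation recovers the base-scale difficulties exactly; real calibrations differ by sampling error, which is why more robust criteria such as Stocking-Lord are often preferred.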
Chan Zhang; Shuaiying Cao; Minglei Wang; Jiangyan Wang; Lirui He – Field Methods, 2025
Previous research on grid questions has mostly focused on their comparability with the item-by-item method and the use of shading to help respondents navigate through a grid. This study extends prior work by examining whether lexical similarity among grid items affects how respondents answer the questions in an experiment where we manipulated…
Descriptors: Foreign Countries, Surveys, Test Construction, Design
Xiaowen Liu – International Journal of Testing, 2024
Differential item functioning (DIF) often arises from multiple sources. Within the context of multidimensional item response theory, this study examined DIF items with varying secondary dimensions using the three DIF methods: SIBTEST, Mantel-Haenszel, and logistic regression. The effect of the number of secondary dimensions on DIF detection rates…
Descriptors: Item Analysis, Test Items, Item Response Theory, Correlation
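Of the three DIF methods the abstract names, Mantel-Haenszel is the simplest to sketch: examinees are stratified by matched total score, and a common odds ratio compares the reference and focal groups' odds of answering the studied item correctly. The counts below are hypothetical:

```python
# Mantel-Haenszel DIF sketch: common odds ratio across score strata.
# Each stratum: (ref_correct, ref_total, focal_correct, focal_total),
# with examinees matched on total test score.
import math

strata = [
    (30, 40, 20, 35),   # low scorers
    (60, 70, 45, 60),   # middle scorers
    (45, 48, 38, 44),   # high scorers
]

num = den = 0.0
for rc, rn, fc, fn in strata:
    ri, fi = rn - rc, fn - fc        # incorrect counts per group
    n = rn + fn                      # stratum size N_k
    num += rc * fi / n               # A_k * D_k / N_k
    den += ri * fc / n               # B_k * C_k / N_k

alpha_mh = num / den                 # common odds ratio (1.0 = no DIF)
delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta scale; |delta| >= 1.5 suggests at least moderate DIF
```

Here `alpha_mh` comes out above 1, meaning the reference group has higher odds of success at matched ability, the pattern MH flags as potential DIF.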
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
Martijn Schoenmakers; Jesper Tijmstra; Jeroen Vermunt; Maria Bolsinova – Educational and Psychological Measurement, 2024
Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these…
Descriptors: Item Response Theory, Response Style (Tests), Models, Likert Scales
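The model-based IRT corrections compared in that paper are too involved for a snippet, but the construct itself, extreme response style, can be described with a raw indicator: the share of a respondent's answers that fall in the endpoint categories of the scale. A minimal sketch for a 5-point scale:

```python
# Descriptive extreme-response-style (ERS) indicator: the fraction of a
# respondent's answers in the endpoint categories of a Likert scale.
# This is only a raw index, not the model-based correction the study examines.

def ers_index(responses, low=1, high=5):
    """Fraction of responses equal to the lowest or highest category."""
    extreme = sum(1 for r in responses if r in (low, high))
    return extreme / len(responses)

moderate_respondent = [2, 3, 3, 4, 2, 3, 4, 3]
extreme_respondent  = [1, 5, 5, 1, 5, 1, 5, 5]
```

Two respondents with the same underlying trait level can thus get very different observed scores if one of them habitually picks the endpoints, which is exactly the validity threat ERS models try to correct.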
Hauke Hermann; Annemieke Witte; Gloria Kempelmann; Brian F. Barrett; Sandra Zaal; Jolanda Vonk; Filip Morisse; Anna Pöhlmann; Paula S. Sterkenburg; Tanja Sappok – Journal of Applied Research in Intellectual Disabilities, 2024
Background: Valid and reliable instruments for measuring emotional development are critical for a proper diagnostic assignment in individuals with intellectual disabilities. This exploratory study examined the psychometric properties of the items on the Scale of Emotional Development--Short (SED-S). Method: The sample included 612 adults with…
Descriptors: Measures (Individuals), Emotional Development, Intellectual Disability, Psychometrics
Hongwen Guo; Matthew S. Johnson; Daniel F. McCaffrey; Lixong Gu – ETS Research Report Series, 2024
The multistage testing (MST) design has been gaining attention and popularity in educational assessments. For testing programs that have small test-taker samples, it is challenging to calibrate new items to replenish the item pool. In the current research, we used the item pools from an operational MST program to illustrate how research studies…
Descriptors: Test Items, Test Construction, Sample Size, Scaling
Gyamfi, Abraham; Acquaye, Rosemary – Acta Educationis Generalis, 2023
Introduction: Item response theory (IRT) has received much attention in the validation of assessment instruments because it allows students' ability to be estimated from any set of items. IRT also allows the difficulty and discrimination levels of each item on the test to be estimated. In the framework of IRT, item characteristics are…
Descriptors: Item Response Theory, Models, Test Items, Difficulty Level
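The two item characteristics the abstract mentions, difficulty and discrimination, enter the standard two-parameter logistic (2PL) response function as follows; this is the generic textbook form, not necessarily the exact model used in the study:

```python
# Two-parameter logistic (2PL) IRT model: the probability of a correct
# response depends on ability theta, item difficulty b, and discrimination a.
import math

def p_correct(theta, a, b):
    """P(X=1 | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b every item is answered correctly with probability 0.5;
# a larger discrimination `a` makes the curve steeper around b, so the
# item separates nearby ability levels more sharply.
```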
Zhang, Susu; Li, Anqi; Wang, Shiyu – Educational Measurement: Issues and Practice, 2023
In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and…
Descriptors: Computer Assisted Testing, Test Construction, Test Wiseness, Test Items
Stephanie M. Bell; R. Philip Chalmers; David B. Flora – Educational and Psychological Measurement, 2024
Coefficient omega indices are model-based composite reliability estimates that have become increasingly popular. A coefficient omega index estimates how reliably an observed composite score measures a target construct as represented by a factor in a factor-analysis model; as such, the accuracy of omega estimates is likely to depend on correct…
Descriptors: Influences, Models, Measurement Techniques, Reliability
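For a single-factor model, a coefficient omega has a closed form in the factor loadings and residual variances, which makes the abstract's point concrete: the estimate is only as good as the fitted model behind it. A sketch with hypothetical standardized estimates:

```python
# Coefficient omega for a unidimensional factor model:
# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances).
# The loadings below are hypothetical standardized estimates.

loadings  = [0.7, 0.6, 0.8, 0.5]           # factor loadings lambda_i
residuals = [1 - l**2 for l in loadings]   # residual variances under standardization

s = sum(loadings)
omega = s**2 / (s**2 + sum(residuals))     # reliability of the composite score
```

If the one-factor model is misspecified (say, the items are really bifactor), the loadings are biased and so is omega, which is the dependence on correct model specification that the article investigates.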
Hwanggyu Lim; Kyung T. Han – Educational Measurement: Issues and Practice, 2024
Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Banks
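The pool-maintenance problem the abstract describes sits on top of the basic CAT loop, in which each next item is typically the unused one with maximum Fisher information at the current ability estimate. A minimal sketch for a 2PL pool with made-up item parameters:

```python
# Maximum-information item selection, the classic CAT rule.
# For the 2PL model, Fisher information is I(theta) = a^2 * P * (1 - P).
import math

def info_2pl(theta, a, b):
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical pool: item -> (discrimination a, difficulty b)
pool = {"item1": (1.8, -1.0), "item2": (1.2, 0.0),
        "item3": (2.0, 0.1), "item4": (0.9, 1.5)}
theta_hat = 0.0            # current ability estimate
administered = {"item1"}   # items already given

best = max((i for i in pool if i not in administered),
           key=lambda i: info_2pl(theta_hat, *pool[i]))
```

Because this rule greedily favors highly discriminating items near the current estimate, a handful of items get overexposed, which is one reason programs need the multiple, statistically parallel pools the article discusses.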
Zachary K. Collier; Minji Kong; Olushola Soyoye; Kamal Chawla; Ann M. Aviles; Yasser Payne – Journal of Educational and Behavioral Statistics, 2024
Asymmetric Likert-type items in research studies can present several challenges in data analysis, particularly concerning missing data. These items are often characterized by a skewed scaling, where either there is no neutral response option or an unequal number of possible positive and negative responses. The use of conventional techniques, such…
Descriptors: Likert Scales, Test Items, Item Analysis, Evaluation Methods
Youmi Suk; Kyung T. Han – Journal of Educational and Behavioral Statistics, 2024
As algorithmic decision making is increasingly deployed in every walk of life, many researchers have raised concerns about fairness-related bias from such algorithms. But there is little research on harnessing psychometric methods to uncover potential discriminatory bias inside decision-making algorithms. The main goal of this article is to…
Descriptors: Psychometrics, Ethics, Decision Making, Algorithms
Justin L. Kern – Journal of Educational and Behavioral Statistics, 2024
Given the frequent presence of slipping and guessing in item responses, models for the inclusion of their effects are highly important. Unfortunately, the most common model for their inclusion, the four-parameter item response theory model, potentially has severe deficiencies related to its possible unidentifiability. With this issue in mind, the…
Descriptors: Item Response Theory, Models, Bayesian Statistics, Generalization
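The four-parameter model the abstract refers to extends the 2PL curve with a guessing floor c and a slipping ceiling d; its standard form (generic textbook parameterization, not the authors' specific variant) is:

```python
# Four-parameter logistic (4PL) IRT model: guessing (c) lifts the lower
# asymptote and slipping (1 - d) lowers the upper asymptote.
import math

def p_4pl(theta, a, b, c, d):
    """P(X=1 | theta) = c + (d - c) / (1 + exp(-a * (theta - b)))."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# With c = 0.2 and d = 0.95, probabilities stay inside (0.2, 0.95):
# very low-ability examinees can still guess correctly, and very
# high-ability examinees can still slip.
```

Because several combinations of (a, b, c, d) can produce nearly the same curve, the parameters are weakly identified from response data alone, the deficiency that motivates the Bayesian treatment the abstract mentions.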