Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 13 |
| Since 2022 (last 5 years) | 69 |
| Since 2017 (last 10 years) | 225 |
| Since 2007 (last 20 years) | 463 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Difficulty Level | 584 |
| Item Response Theory | 584 |
| Test Items | 460 |
| Foreign Countries | 162 |
| Test Construction | 110 |
| Psychometrics | 98 |
| Models | 97 |
| Item Analysis | 91 |
| Comparative Analysis | 89 |
| Test Reliability | 80 |
| Multiple Choice Tests | 79 |
Author
| Author | Count |
| --- | --- |
| Tindal, Gerald | 16 |
| Alonzo, Julie | 12 |
| Anderson, Daniel | 9 |
| Park, Bitnara Jasmine | 8 |
| Paek, Insu | 7 |
| Irvin, P. Shawn | 6 |
| Petscher, Yaacov | 6 |
| Saven, Jessica L. | 6 |
| Schoen, Robert C. | 6 |
| Bulut, Okan | 5 |
| DeBoer, George E. | 5 |
Audience
| Audience | Count |
| --- | --- |
| Practitioners | 1 |
Location
| Location | Count |
| --- | --- |
| Turkey | 18 |
| Germany | 14 |
| Indonesia | 14 |
| Taiwan | 9 |
| United States | 9 |
| Australia | 8 |
| Nigeria | 8 |
| Canada | 7 |
| Florida | 7 |
| Japan | 6 |
| South Africa | 6 |
Christina Glasauer; Martin K. Yeh; Lois Anne DeLong; Yu Yan; Yanyan Zhuang – Computer Science Education, 2025
Background and Context: Feedback on one's progress is essential to new programming language learners, particularly in out-of-classroom settings. Though many study materials offer assessment mechanisms, most neither examine the accuracy of the feedback they deliver nor provide evidence of its validity. Objective: We investigate the potential use of a…
Descriptors: Novices, Computer Science Education, Programming, Accuracy
Yun-Kyung Kim; Li Cai – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2025
This paper introduces an application of cross-classified item response theory (IRT) modeling to an assessment utilizing the embedded standard setting (ESS) method (Lewis & Cook). The cross-classified IRT model is used to treat both item and person effects as random, where the item effects are regressed on the target performance levels (target…
Descriptors: Standard Setting (Scoring), Item Response Theory, Test Items, Difficulty Level
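As context for the entry above: a cross-classified IRT model of the kind described treats both person and item effects as random. A minimal sketch, with notation assumed here rather than taken from the paper, regresses the random item difficulties on each item's target performance level:

```latex
\[
  \Pr(X_{pi} = 1 \mid \theta_p, b_i) = \operatorname{logit}^{-1}(\theta_p - b_i),
  \qquad
  \theta_p \sim N(0, \sigma_\theta^2),
  \qquad
  b_i \sim N(\beta_0 + \beta_1\,\mathrm{level}_i,\; \sigma_b^2),
\]
```

where level_i encodes the target performance level at which item i was written, linking item calibration to the embedded standard-setting judgments.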
Aiman Mohammad Freihat; Omar Saleh Bani Yassin – Educational Process: International Journal, 2025
Background/purpose: This study aimed to assess the accuracy with which multiple-choice test item parameters are estimated under item response theory (IRT) models. Materials/methods: The researchers relied on measurement accuracy indicators, which express the absolute difference between the estimated and actual values of the…
Descriptors: Accuracy, Computation, Multiple Choice Tests, Test Items
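A rough illustration of the accuracy indicator the abstract describes (the absolute difference between estimated and actual parameter values) can be produced by simulation. This sketch uses a Rasch model and a crude logit-of-proportion-correct difficulty estimate; the seed, sample sizes, and estimator are assumptions for illustration, not the study's method:

```python
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_items = 5000, 20
theta = rng.normal(0, 1, n_persons)      # latent abilities
b_true = rng.normal(0, 1, n_items)       # true item difficulties

# Rasch model: P(correct) = logistic(theta - b)
p = 1 / (1 + np.exp(-(theta[:, None] - b_true[None, :])))
responses = rng.binomial(1, p)

# Crude difficulty estimate: logit of the proportion incorrect.
# This ignores the shrinkage from marginalizing over ability,
# so treat it as illustrative only.
p_correct = responses.mean(axis=0)
b_hat = np.log((1 - p_correct) / p_correct)

# Accuracy indicator: mean absolute difference between
# estimated and actual parameter values.
print("mean |b_hat - b_true|:", np.abs(b_hat - b_true).mean())
```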
Zenger, Tim; Bitzenbauer, Philipp – Science Education International, 2022
This article reports on the development and piloting of a German version of a concept test to assess students' conceptual knowledge of density. The concept test was administered in paper-and-pencil format to 222 German secondary school students as a post-test after instruction in all relevant concepts of density. We provide a psychometric…
Descriptors: Foreign Countries, Secondary School Students, Concept Formation, Psychometrics
Saatcioglu, Fatima Munevver; Atar, Hakan Yavuz – International Journal of Assessment Tools in Education, 2022
This study aims to examine the effects of mixture item response theory (IRT) models on item parameter estimation and classification accuracy under different conditions. The simulation study manipulated the mixture IRT model (Rasch, 2PL, 3PL), sample size (600, 1,000), number of items (10, 30), and number of latent…
Descriptors: Accuracy, Classification, Item Response Theory, Programming Languages
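For reference, the three models the entry above manipulates are nested. The 3PL item characteristic curve is

```latex
\[
  P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}},
\]
```

where a_i is the discrimination, b_i the difficulty, and c_i the pseudo-guessing lower asymptote; fixing c_i = 0 gives the 2PL, and additionally fixing a_i = 1 gives the Rasch model.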
Kuan-Yu Jin; Thomas Eckes – Educational and Psychological Measurement, 2024
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable…
Descriptors: Item Response Theory, Test Items, Test Wiseness, Surveys
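The assumption that successive categories reflect increasing levels of the latent variable is commonly formalized with an ordered-category IRT model such as Samejima's graded response model; as a sketch (notation mine, not the authors'):

```latex
\[
  \Pr(X_i \ge k \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_{ik})}},
  \qquad b_{i1} < b_{i2} < \cdots < b_{i,K-1},
\]
```

with category probabilities obtained by differencing, \(\Pr(X_i = k) = \Pr(X_i \ge k) - \Pr(X_i \ge k+1)\). Insufficient effort responding undermines exactly the ordering this model assumes.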
Gyamfi, Abraham; Acquaye, Rosemary – Acta Educationis Generalis, 2023
Introduction: Item response theory (IRT) has received much attention in the validation of assessment instruments because it allows students' ability to be estimated from any set of items. IRT also allows the difficulty and discrimination of each item on the test to be estimated. In the framework of IRT, item characteristics are…
Descriptors: Item Response Theory, Models, Test Items, Difficulty Level
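The claim that IRT lets ability be estimated from any set of items can be made concrete with a small maximum-likelihood sketch under the 2PL model; all item parameters and responses below are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical item parameters (a = discrimination, b = difficulty)
a = np.array([1.2, 0.8, 1.5, 1.0])
b = np.array([-0.5, 0.0, 0.7, 1.2])
x = np.array([1, 1, 0, 1])  # one student's scored responses

def neg_log_lik(theta):
    # 2PL response probabilities for a candidate ability value
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

theta_hat = minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x
print(f"ML ability estimate: {theta_hat:.2f}")
```

Any subset of calibrated items yields an estimate on the same latent scale, which is what distinguishes IRT from total-score approaches.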
Hauenstein, Clifford E.; Embretson, Susan E. – Journal of Cognitive Education and Psychology, 2020
The Concept Formation subtest of the Woodcock Johnson Tests of Cognitive Abilities represents a dynamic test due to continual provision of feedback from examiner to examinee. Yet, the original scoring protocol for the test largely ignores this dynamic structure. The current analysis applies a dynamic adaptation of an explanatory item response…
Descriptors: Test Items, Difficulty Level, Cognitive Tests, Cognitive Ability
Sweeney, Sandra M.; Sinharay, Sandip; Johnson, Matthew S.; Steinhauer, Eric W. – Educational Measurement: Issues and Practice, 2022
The focus of this paper is on the empirical relationship between item difficulty and item discrimination. Two studies--an empirical investigation and a simulation study--were conducted to examine the association between item difficulty and item discrimination under classical test theory and item response theory (IRT), and the effects of the…
Descriptors: Correlation, Item Response Theory, Item Analysis, Difficulty Level
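A quick empirical version of the question the entry above studies (how item difficulty relates to item discrimination under classical test theory) can be simulated; everything here is an assumed toy setup, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(11)
n, k = 2000, 30
theta = rng.normal(size=n)
a = rng.lognormal(0, 0.3, k)   # generating discriminations
b = rng.normal(0, 1, k)        # generating difficulties
resp = rng.binomial(1, 1 / (1 + np.exp(-a * (theta[:, None] - b))))

total = resp.sum(axis=1)
p_vals = resp.mean(axis=0)     # CTT difficulty (p-values)
# CTT discrimination: corrected item-total correlation
r_pb = np.array([np.corrcoef(resp[:, j], total - resp[:, j])[0, 1]
                 for j in range(k)])
print("corr(p-value, item-total r):", np.corrcoef(p_vals, r_pb)[0, 1])
```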
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
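Of the classical reliability measures the entry above compares, Cronbach's alpha is the most common and is simple to compute from a persons-by-items score matrix; the data below are made up for illustration:

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: persons x items matrix of item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(3)
ability = rng.normal(size=(200, 1))
# items share ability variance, so alpha comes out well above 0
scores = (ability + rng.normal(size=(200, 10)) > 0).astype(int)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```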
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Xue, Kang; Huggins-Manley, Anne Corinne; Leite, Walter – Educational and Psychological Measurement, 2022
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of…
Descriptors: Virtual Classrooms, Artificial Intelligence, Item Response Theory, Item Analysis
Pentecost, Thomas C.; Raker, Jeffery R.; Murphy, Kristen L. – Practical Assessment, Research & Evaluation, 2023
Using multiple versions of an assessment has the potential to introduce item environment effects. These effects result in version-dependent item characteristics (i.e., difficulty and discrimination). Methods to detect such effects and the resulting implications are important for all levels of assessment where multiple forms of an assessment…
Descriptors: Item Response Theory, Test Items, Test Format, Science Tests
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
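Effort-moderated scoring, the approach whose robustness the entry above examines, removes responses flagged as rapid guesses (response time below an item's threshold) from the scoring model. A minimal sketch under a 2PL likelihood, with all thresholds and parameters assumed for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def em_ability(x, rt, a, b, rt_threshold):
    """Effort-moderated scoring sketch: responses faster than the
    response-time threshold are treated as rapid guesses and excluded
    from the 2PL likelihood. Assumes at least one effortful response."""
    keep = rt >= rt_threshold
    def nll(theta):
        p = 1 / (1 + np.exp(-a[keep] * (theta - b[keep])))
        return -np.sum(x[keep] * np.log(p) + (1 - x[keep]) * np.log(1 - p))
    return minimize_scalar(nll, bounds=(-4, 4), method="bounded").x

a = np.array([1.0, 1.2, 0.9, 1.4])
b = np.array([-0.3, 0.2, 0.8, 1.1])
x = np.array([1, 1, 1, 0])
rt = np.array([14.0, 2.1, 11.5, 9.8])  # response times in seconds
print(em_ability(x, rt, a, b, rt_threshold=3.0))
```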
Spratto, Elisabeth M.; Leventhal, Brian C.; Bandalos, Deborah L. – Educational and Psychological Measurement, 2021
In this study, we examined the results and interpretations produced from two different IRTree models--one using paths consisting of only dichotomous decisions, and one using paths consisting of both dichotomous and polytomous decisions. We used data from two versions of an impulsivity measure. In the first version, all the response options had…
Descriptors: Comparative Analysis, Item Response Theory, Decision Making, Data Analysis
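An IRTree model decomposes each observed rating into a path of node decisions, each scored as a pseudo-item. One common decomposition of a 5-point Likert response into dichotomous nodes (this particular tree is an illustrative assumption, not the authors' specification) is sketched below:

```python
import numpy as np

def irtree_pseudo_items(resp):
    """Map a 5-point response to (midpoint, direction, extremity)
    pseudo-items; np.nan marks nodes not reached on this path."""
    if resp == 3:
        return (1, np.nan, np.nan)        # midpoint chosen; path ends
    direction = 1 if resp > 3 else 0      # agree vs. disagree side
    extreme = 1 if resp in (1, 5) else 0  # endpoint vs. adjacent category
    return (0, direction, extreme)

print([irtree_pseudo_items(r) for r in (1, 2, 3, 4, 5)])
```

Each pseudo-item column is then fit with its own IRT model; in a mixed tree, a polytomous node would replace a pair of dichotomous decisions with a single graded one.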

