Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 11 |
| Since 2022 (last 5 years) | 40 |
| Since 2017 (last 10 years) | 83 |
| Since 2007 (last 20 years) | 269 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Evaluation Methods | 320 |
| Models | 320 |
| Item Response Theory | 148 |
| Feedback (Response) | 112 |
| Simulation | 56 |
| Foreign Countries | 55 |
| Comparative Analysis | 45 |
| Student Evaluation | 45 |
| Computation | 41 |
| Data Analysis | 39 |
| Psychometrics | 37 |
Author
| Author | Count |
| --- | --- |
| Cai, Li | 4 |
| Wilson, Mark | 4 |
| Bauer, Daniel J. | 3 |
| Falk, Carl F. | 3 |
| Maris, Gunter | 3 |
| Sijtsma, Klaas | 3 |
| Woods, Carol M. | 3 |
| Andrich, David | 2 |
| Barnes, Tiffany, Ed. | 2 |
| Chun Wang | 2 |
| Cohen, Allan S. | 2 |
Audience
| Audience | Count |
| --- | --- |
| Researchers | 7 |
| Administrators | 3 |
| Support Staff | 3 |
| Counselors | 2 |
| Media Staff | 2 |
| Teachers | 2 |
| Practitioners | 1 |
| Students | 1 |
Location
| Location | Count |
| --- | --- |
| Australia | 11 |
| United Kingdom (England) | 8 |
| China | 6 |
| Germany | 6 |
| United Kingdom | 6 |
| California | 5 |
| Florida | 5 |
| Italy | 5 |
| Rhode Island | 5 |
| North Carolina | 4 |
| Brazil | 3 |
Laws, Policies, & Programs
| Law / Program | Count |
| --- | --- |
| No Child Left Behind Act 2001 | 2 |
| Education Consolidation… | 1 |
| Education for All Handicapped… | 1 |
| Individuals with Disabilities… | 1 |
Christina Glasauer; Martin K. Yeh; Lois Anne DeLong; Yu Yan; Yanyan Zhuang – Computer Science Education, 2025
Background and Context: Feedback on one's progress is essential to new programming language learners, particularly in out-of-classroom settings. Though many study materials offer assessment mechanisms, most do not examine the accuracy of the feedback they deliver, nor provide evidence of its validity. Objective: We investigate the potential use of a…
Descriptors: Novices, Computer Science Education, Programming, Accuracy
Markus T. Jansen; Ralf Schulze – Educational and Psychological Measurement, 2024
Thurstonian forced-choice modeling is considered to be a powerful new tool to estimate item and person parameters while simultaneously testing the model fit. This assessment approach is associated with the aim of reducing faking and other response tendencies that plague traditional self-report trait assessments. As a result of major recent…
Descriptors: Factor Analysis, Models, Item Analysis, Evaluation Methods
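For orientation, a minimal sketch of the Thurstonian comparative-judgment idea behind forced-choice modeling (notation is illustrative, not the authors'): each statement i carries a latent utility t_i driven by the trait η, and a forced choice between statements i and k is modeled as the sign of the utility difference,

$$t_i = \mu_i + \lambda_i \eta + \varepsilon_i, \qquad y_{\{i,k\}} = \mathbb{1}\{t_i - t_k > 0\},$$

which is what allows item and person parameters to be estimated from rankings while dampening the response tendencies that affect direct self-report.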
Paul A. Jewsbury; J. R. Lockwood; Matthew S. Johnson – Large-scale Assessments in Education, 2025
Many large-scale assessments model proficiency with a latent regression on contextual variables. Item-response data are used to estimate the parameters of the latent variable model and, in conjunction with the contextual data, to generate plausible values of individuals' proficiency attributes. These models typically incorporate numerous…
Descriptors: Item Response Theory, Data Use, Models, Evaluation Methods
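A compact sketch of the latent-regression setup described above (symbols are illustrative): proficiency θ_i is regressed on contextual covariates x_i, and plausible values are draws from the posterior that combines the IRT likelihood with that regression prior,

$$\theta_i \mid \mathbf{x}_i \sim N(\boldsymbol{\Gamma}'\mathbf{x}_i,\, \sigma^2), \qquad p(\theta_i \mid \mathbf{y}_i, \mathbf{x}_i) \propto \Big[\prod_j P(y_{ij} \mid \theta_i)\Big]\, \phi(\theta_i;\, \boldsymbol{\Gamma}'\mathbf{x}_i,\, \sigma^2).$$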
Jean-Paul Fox – Journal of Educational and Behavioral Statistics, 2025
Popular item response theory (IRT) models are considered complex, mainly due to the inclusion of a random factor variable (latent variable). The random factor variable gives rise to the incidental parameter problem, since the number of person parameters grows as data from new persons are included. Therefore, IRT models require a specific estimation method…
Descriptors: Sample Size, Item Response Theory, Accuracy, Bayesian Statistics
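To make the incidental-parameter point concrete (a standard textbook sketch, not specific to this article): the person parameters θ_1, …, θ_N grow in number with the sample, so IRT estimation typically integrates them out and maximizes the marginal likelihood of the item parameters β alone,

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{N} \int \Big[\prod_{j=1}^{J} P(y_{ij} \mid \theta, \boldsymbol{\beta})\Big]\, \phi(\theta)\, d\theta.$$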
Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025
This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple choice (MC) and mixed format tests within the common item nonequivalent group design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…
Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis
Hung, Su-Pin; Huang, Hung-Yu – Journal of Educational and Behavioral Statistics, 2022
To address response style or bias in rating scales, forced-choice items are often used to request that respondents rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees' performance also contribute to rater bias or errors; consequently, forced-choice items have recently…
Descriptors: Evaluation Methods, Rating Scales, Item Analysis, Preferences
Anthony R. Reibel – Journal of School Administration Research and Development, 2025
Traditional assessment design focuses on outcomes and often disregards how students perceive their abilities, process emotions, or self-express. This indifference can undermine assessment outcomes and evaluation reliability (Hattie, 2023; Nilson, 2023; Reibel, 2022). This paper introduces "empathetic assessment design" (EAD), a framework…
Descriptors: Empathy, Evaluation Methods, Student Evaluation, Models
Austin M. Shin; Ayaan M. Kazerouni – ACM Transactions on Computing Education, 2024
Background and Context: Students' programming projects are often assessed on the basis of their tests as well as their implementations, most commonly using test adequacy criteria like branch coverage, or, in some cases, mutation analysis. As a result, students are implicitly encouraged to use these tools during their development process (i.e., so…
Descriptors: Feedback (Response), Programming, Student Projects, Computer Software
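To illustrate the adequacy criteria named in this abstract (a hypothetical toy example, not the study's materials): branch coverage only requires that every branch be executed, whereas mutation analysis asks whether the tests can detect small seeded faults.

```python
# Hypothetical toy example: a test suite can reach 100% branch coverage
# yet fail to detect ("kill") a simple mutant.

def grade(score):
    """Return 'pass' for scores of 50 or above, otherwise 'fail'."""
    if score >= 50:          # a mutation tool might change '>=' to '>'
        return "pass"
    return "fail"

def test_grade():
    assert grade(80) == "pass"  # exercises the true branch
    assert grade(20) == "fail"  # exercises the false branch: 100% branch coverage
    # Neither input above distinguishes '>=' from '>', so the mutant
    # 'score > 50' would survive. The boundary case below kills it:
    assert grade(50) == "pass"

test_grade()
```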
Kim, Stella Y. – Educational Measurement: Issues and Practice, 2022
In this digital ITEMS module, Dr. Stella Kim provides an overview of multidimensional item response theory (MIRT) equating. Traditional unidimensional item response theory (IRT) equating methods impose the sometimes untenable restriction on data that only a single ability is assessed. This module discusses potential sources of multidimensionality…
Descriptors: Item Response Theory, Models, Equated Scores, Evaluation Methods
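For context, one common compensatory multidimensional form that relaxes the single-ability restriction (illustrative notation, not necessarily the module's parameterization):

$$P(y_{ij} = 1 \mid \boldsymbol{\theta}_i) = \frac{1}{1 + \exp\{-(\mathbf{a}_j'\boldsymbol{\theta}_i + d_j)\}},$$

where the response depends on a weighted combination of several latent abilities θ_i, whereas traditional unidimensional IRT equating treats θ_i as a scalar.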
Jiaying Xiao; Chun Wang; Gongjun Xu – Grantee Submission, 2024
Accurate item parameters and standard errors (SEs) are crucial for many multidimensional item response theory (MIRT) applications. A recent study proposed the Gaussian Variational Expectation Maximization (GVEM) algorithm to improve computational efficiency and estimation accuracy (Cho et al., 2021). However, the SE estimation procedure has yet to…
Descriptors: Error of Measurement, Models, Evaluation Methods, Item Analysis
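As background on what SE estimation targets here (a generic statement, not the paper's correction): item-parameter standard errors are conventionally taken from the inverse of the information matrix evaluated at the estimates,

$$\widehat{SE}(\hat{\beta}_k) = \sqrt{\big[\mathbf{I}(\hat{\boldsymbol{\beta}})^{-1}\big]_{kk}},$$

and the truncated abstract indicates that obtaining a sound analogue of this within the GVEM framework is the open issue.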
Hussain, Zawar; Cheema, Salman Arif; Hussain, Ishtiaq – Sociological Methods & Research, 2022
This article corrects the Tarray, Singh, and Zaizai model and further improves it for settings where stratified random sampling is necessary. This is done with an optional randomized response technique in stratified sampling that combines the Mangat and Singh, Mangat, and Greenberg et al. models. The suggested model has been studied…
Descriptors: Comparative Analysis, Models, Surveys, Questionnaires
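For readers new to randomized response (the classic Warner design, shown for orientation; the models named above are refinements of this idea): each respondent answers the sensitive question with probability p and its complement otherwise, so the observed "yes" rate λ identifies the sensitive proportion π,

$$\lambda = p\pi + (1-p)(1-\pi) \;\Rightarrow\; \hat{\pi} = \frac{\hat{\lambda} - (1-p)}{2p - 1}, \qquad p \neq \tfrac{1}{2}.$$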
Causes of Nonlinear Metrics in Item Response Theory Models and Implications for Educational Research
Xiangyi Liao – ProQuest LLC, 2024
Educational research outcomes frequently rely on an assumption that measurement metrics have interval-level properties. While most investigators know enough to be suspicious of interval-level claims, and in some cases even question their findings given such doubts, there is a lack of understanding regarding the measurement conditions that create…
Descriptors: Item Response Theory, Educational Research, Measurement, Evaluation Methods
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
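One common way to formalize rater error (the many-facet Rasch model, shown for orientation; this snippet does not specify the study's own model): the log-odds of receiving rating category k rather than k-1 depend on examinee proficiency, item difficulty, and rater severity,

$$\log \frac{P_{ijrk}}{P_{ijr(k-1)}} = \theta_i - b_j - c_r - d_k,$$

where c_r is the severity of rater r and d_k a category step difficulty; equating rater-mediated scores has to account for the c_r terms.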
Madeline A. Schellman; Matthew J. Madison – Grantee Submission, 2024
Diagnostic classification models (DCMs) have grown in popularity as stakeholders increasingly desire actionable information related to students' skill competencies. Longitudinal DCMs offer a psychometric framework for providing estimates of students' proficiency status transitions over time. For both cross-sectional and longitudinal DCMs, it is…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
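A minimal DCM sketch for orientation (the DINA model, one common choice; the entry does not say which DCM the authors use): whether examinee i has mastered all attributes that item j requires (per the Q-matrix entries q_jk) determines the response probability up to slip and guess parameters,

$$\eta_{ij} = \prod_{k} \alpha_{ik}^{\,q_{jk}}, \qquad P(Y_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\eta_{ij}}\, g_j^{\,1 - \eta_{ij}},$$

and a longitudinal DCM additionally models transition probabilities for each attribute α_ik across time points.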
Joakim Wallmark; James O. Ramsay; Juan Li; Marie Wiberg – Journal of Educational and Behavioral Statistics, 2024
Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker's level on the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of…
Descriptors: Item Response Theory, Test Items, Models, Scoring
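As one standard parametric point of comparison for polytomously scored items (illustrative; the truncated abstract does not name the parametric model compared): the generalized partial credit model gives the probability of score k on item j as

$$P(Y_{ij} = k \mid \theta_i) = \frac{\exp \sum_{v=1}^{k} a_j(\theta_i - b_{jv})}{\sum_{c=0}^{m_j} \exp \sum_{v=1}^{c} a_j(\theta_i - b_{jv})},$$

with the empty sum for c = 0 defined as zero; the OS model, per the abstract, estimates the item-score relationship nonparametrically instead.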
