Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 5 |
Descriptor
| Hierarchical Linear Modeling | 5 |
| Item Response Theory | 3 |
| Evaluation Methods | 2 |
| Models | 2 |
| Test Bias | 2 |
| Test Items | 2 |
| Achievement Tests | 1 |
| Bias | 1 |
| Computer Assisted Testing | 1 |
| Correlation | 1 |
| Difficulty Level | 1 |
Source
| Journal of Educational… | 5 |
Author
| Albano, Anthony D. | 1 |
| Artur Pokropek | 1 |
| Bunch, Michael B. | 1 |
| Cai, Liuhan | 1 |
| Carl Westine | 1 |
| Carmen Köhler | 1 |
| Johannes Hartig | 1 |
| Lale Khorramdel | 1 |
| Lease, Erin M. | 1 |
| McConnell, Scott R. | 1 |
| Michelle Boyer | 1 |
Publication Type
| Journal Articles | 5 |
| Reports - Research | 5 |
Education Level
| Elementary Education | 1 |
| Secondary Education | 1 |
Assessments and Surveys
| Program for International… | 1 |
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024
For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…
Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory
Shear, Benjamin R. – Journal of Educational Measurement, 2018
When contextual features of test-taking environments differentially affect item responding for different test takers and these features vary across test administrations, they may cause differential item functioning (DIF) that varies across test administrations. Because many common DIF detection methods ignore potential DIF variance, this article…
Descriptors: Test Bias, Regression (Statistics), Hierarchical Linear Modeling
Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019
Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…
Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation
Albano, Anthony D.; Cai, Liuhan; Lease, Erin M.; McConnell, Scott R. – Journal of Educational Measurement, 2019
Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in…
Descriptors: Test Items, Computer Assisted Testing, Item Analysis, Difficulty Level

