ERIC - Search Results

Source: Journal of Educational Measurement

Showing 1 to 15 of 215 results

Argument-Based Approach to Validity: Developing a Living Document and Incorporating Preregistration

Peer reviewed

Daria Gerasimova – Journal of Educational Measurement, 2024

I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up-to-date summary of the validity argument. As the validation process may span across multiple studies, the living document…

Descriptors: Validity, Documentation, Methods, Research Reports

A Note on Latent Traits Estimates under IRT Models with Missingness

Peer reviewed

Guo, Jinxin; Xu, Xin; Xin, Tao – Journal of Educational Measurement, 2023

Missingness due to not-reached items and omitted items has received much attention in the recent psychometric literature. Such missingness, if not handled properly, would lead to biased parameter estimation, as well as inaccurate inference of examinees, and further erode the validity of the test. This paper reviews some commonly used IRT based…

Descriptors: Psychometrics, Bias, Error of Measurement, Test Validity

A Bayesian Moderated Nonlinear Factor Analysis Approach for DIF Detection under Violation of the Equal Variance Assumption

Peer reviewed

Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024

Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…

Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory

Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework

Peer reviewed

Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024

This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…

Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction

Psychometric Methods to Evaluate Measurement and Algorithmic Bias in Automated Scoring

Peer reviewed

Johnson, Matthew S.; Liu, Xiang; McCaffrey, Daniel F. – Journal of Educational Measurement, 2022

With the increasing use of automated scores in operational testing settings comes the need to understand the ways in which they can yield biased and unfair results. In this paper, we provide a brief survey of some of the ways in which the predictive methods used in automated scoring can lead to biased, and thus unfair automated scores. After…

Descriptors: Psychometrics, Measurement Techniques, Bias, Automation

A Unified Comparison of IRT-Based Effect Sizes for DIF Investigations

Peer reviewed

Chalmers, R. Philip – Journal of Educational Measurement, 2023

Several marginal effect size (ES) statistics suitable for quantifying the magnitude of differential item functioning (DIF) have been proposed in the area of item response theory; for instance, the Differential Functioning of Items and Tests (DFIT) statistics, signed and unsigned item difference in the sample statistics (SIDS, UIDS, NSIDS, and…

Descriptors: Test Bias, Item Response Theory, Definitions, Monte Carlo Methods

A Latent Class Signal Detection Model for Rater Scoring with Ordered Perceptual Distributions

Peer reviewed

DeCarlo, Lawrence T.; Zhou, Xiaoliang – Journal of Educational Measurement, 2021

In signal detection rater models for constructed response (CR) scoring, it is assumed that raters discriminate equally well between different latent classes defined by the scoring rubric. An extended model that relaxes this assumption is introduced; the model recognizes that a rater may not discriminate equally well between some of the scoring…

Descriptors: Scoring, Models, Bias, Perception

IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Peer reviewed

Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…

Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

A Nonparametric Composite Group DIF Index for Focal Groups Stemming from Multicategorical Variables

Peer reviewed

Corinne Huggins-Manley; Anthony W. Raborn; Peggy K. Jones; Ted Myers – Journal of Educational Measurement, 2024

The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore for DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the…

Descriptors: Nonparametric Statistics, Test Bias, Scores, Statistical Significance

On the Positive Correlation between DIF and Difficulty: A New Theory on the Correlation as Methodological Artifact

Peer reviewed

Bolt, Daniel M.; Liao, Xiangyi – Journal of Educational Measurement, 2021

We revisit the empirically observed positive correlation between DIF and difficulty studied by Freedle and commonly seen in tests of verbal proficiency when comparing populations of different mean latent proficiency levels. It is shown that a positive correlation between DIF and difficulty estimates is actually an expected result (absent any true…

Descriptors: Test Bias, Difficulty Level, Correlation, Verbal Tests

Computation and Accuracy Evaluation of Comparable Scores on Culturally Responsive Assessments

Peer reviewed

Sandip Sinharay; Matthew S. Johnson – Journal of Educational Measurement, 2024

Culturally responsive assessments have been proposed as potential tools to ensure equity and fairness for examinees from all backgrounds including those from traditionally underserved or minoritized groups. However, these assessments are relatively new and, with few exceptions, are yet to be implemented in large scale. Consequently, there is a…

Descriptors: Culturally Relevant Education, Evaluation, Equal Education, Disadvantaged

Assessing Differential Bundle Functioning Using Meta-Analysis

Peer reviewed

Lanrong Li; Betsy Jane Becker – Journal of Educational Measurement, 2021

Differential bundle functioning (DBF) has been proposed to quantify the accumulated amount of differential item functioning (DIF) in an item cluster/bundle (Douglas, Roussos, and Stout). The simultaneous item bias test (SIBTEST, Shealy and Stout) has been used to test for DBF (e.g., Walker, Zhang, and Surber). Research on DBF may have the…

Descriptors: Test Bias, Test Items, Meta Analysis, Effect Size

A Comparison of Aggregation Rules for Selecting Anchor Items in Multigroup DIF Analysis

Peer reviewed

Huelmann, Thorben; Debelak, Rudolf; Strobl, Carolin – Journal of Educational Measurement, 2020

This study addresses the topic of how anchoring methods for differential item functioning (DIF) analysis can be used in multigroup scenarios. The direct approach would be to combine anchoring methods developed for two-group scenarios with multigroup DIF-detection methods. Alternatively, multiple tests could be carried out. The results of these…

Descriptors: Test Items, Test Bias, Equated Scores, Item Analysis

DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models

Peer reviewed

Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024

For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…

Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory

Using Keystroke Behavior Patterns to Detect Nonauthentic Texts in Writing Assessments: Evaluating the Fairness of Predictive Models

Peer reviewed

Yang Jiang; Mo Zhang; Jiangang Hao; Paul Deane; Chen Li – Journal of Educational Measurement, 2024

The emergence of sophisticated AI tools such as ChatGPT, coupled with the transition to remote delivery of educational assessments in the COVID-19 era, has led to increasing concerns about academic integrity and test security. Using AI tools, test takers can produce high-quality texts effortlessly and use them to game assessments. It is thus…

Descriptors: Integrity, Artificial Intelligence, Technology Uses in Education, Ethics
