Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2024
Rapid guessing (RG) is a form of non-effortful responding that is characterized by short response latencies. This construct-irrelevant behavior has been shown in previous research to bias inferences concerning measurement properties and scores. To mitigate these deleterious effects, a number of response time threshold scoring procedures have been…
Descriptors: Reaction Time, Scores, Item Response Theory, Guessing (Tests)
Weese, James D.; Turner, Ronna C.; Ames, Allison; Crawford, Brandon; Liang, Xinya – Educational and Psychological Measurement, 2022
A simulation study was conducted to investigate the heuristics of the SIBTEST procedure and how it compares with ETS classification guidelines used with the Mantel-Haenszel procedure. Prior heuristics have been used for nearly 25 years, but they are based on a simulation study that was restricted due to computer limitations and that modeled item…
Descriptors: Test Bias, Heuristics, Classification, Statistical Analysis
Dimitrov, Dimiter M.; Atanasov, Dimitar V. – Educational and Psychological Measurement, 2022
This study offers an approach to testing for differential item functioning (DIF) in a recently developed measurement framework, referred to as the D-scoring method (DSM). Under the proposed approach, called the P-Z method of testing for DIF, the item response functions of two groups (reference and focal) are compared by…
Descriptors: Test Bias, Methods, Test Items, Scoring
Nazari, Sanaz; Leite, Walter L.; Huggins-Manley, A. Corinne – Educational and Psychological Measurement, 2023
Social desirability bias (SDB) has been a major concern in educational and psychological assessments when measuring latent variables because it has the potential to introduce measurement error and bias in assessments. Person-fit indices can detect bias in the form of misfitted response vectors. The objective of this study was to compare the…
Descriptors: Social Desirability, Bias, Indexes, Goodness of Fit
Suppanut Sriutaisuk; Yu Liu; Seungwon Chung; Hanjoe Kim; Fei Gu – Educational and Psychological Measurement, 2025
The multiple imputation two-stage (MI2S) approach holds promise for evaluating the model fit of structural equation models for ordinal variables with multiply imputed data. However, previous studies only examined the performance of MI2S-based residual-based test statistics. This study extends previous research by examining the performance of two…
Descriptors: Structural Equation Models, Error of Measurement, Programming Languages, Goodness of Fit
André Beauducel; Norbert Hilger; Tobias Kuhl – Educational and Psychological Measurement, 2024
Regression factor score predictors have the maximum factor score determinacy, that is, the maximum correlation with the corresponding factor, but they do not have the same inter-correlations as the factors. As it might be useful to compute factor score predictors that have the same inter-correlations as the factors, correlation-preserving factor…
Descriptors: Scores, Factor Analysis, Correlation, Predictor Variables
Sanaz Nazari; Walter L. Leite; A. Corinne Huggins-Manley – Educational and Psychological Measurement, 2024
Social desirability bias (SDB) is a common threat to the validity of conclusions from responses to a scale or survey. There is a wide range of person-fit statistics in the literature that can be employed to detect SDB. In addition, machine learning classifiers, such as logistic regression and random forest, have the potential to distinguish…
Descriptors: Social Desirability, Bias, Artificial Intelligence, Identification
Finch, W. Holmes – Educational and Psychological Measurement, 2023
Psychometricians have devoted much research and attention to categorical item responses, leading to the development and widespread use of item response theory for the estimation of model parameters and identification of items that do not perform in the same way for examinees from different population subgroups (e.g., differential item functioning…
Descriptors: Test Bias, Item Response Theory, Computation, Methods
Martijn Schoenmakers; Jesper Tijmstra; Jeroen Vermunt; Maria Bolsinova – Educational and Psychological Measurement, 2024
Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these…
Descriptors: Item Response Theory, Response Style (Tests), Models, Likert Scales
Man, Kaiwen; Schumacker, Randall; Morell, Monica; Wang, Yurou – Educational and Psychological Measurement, 2022
While hierarchical linear modeling is often used in social science research, the assumption of normally distributed residuals at the individual and cluster levels can be violated in empirical data. Previous studies have focused on the effects of nonnormality at either lower or higher level(s) separately. However, the violation of the normality…
Descriptors: Hierarchical Linear Modeling, Statistical Distributions, Statistical Bias, Computation
Hung-Yu Huang – Educational and Psychological Measurement, 2025
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…
Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability
Weese, James D.; Turner, Ronna C.; Liang, Xinya; Ames, Allison; Crawford, Brandon – Educational and Psychological Measurement, 2023
A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and…
Descriptors: Effect Size, Classification, Guidelines, Statistical Analysis
Mostafa Hosseinzadeh; Ki Lynn Matlock Cole – Educational and Psychological Measurement, 2024
In real-world situations, multidimensional data may appear on large-scale tests or psychological surveys. The purpose of this study was to investigate the effects of the quantity and magnitude of cross-loadings and model specification on item parameter recovery in multidimensional Item Response Theory (MIRT) models, especially when the model was…
Descriptors: Item Response Theory, Models, Maximum Likelihood Statistics, Algorithms
Man, Kaiwen; Harring, Jeffrey R. – Educational and Psychological Measurement, 2023
Preknowledge cheating jeopardizes the validity of inferences based on test results. Many methods have been developed to detect preknowledge cheating by jointly analyzing item responses and response times. Gaze fixations, an essential eye-tracker measure, can be utilized to help detect aberrant testing behavior with improved accuracy beyond using…
Descriptors: Cheating, Reaction Time, Test Items, Responses
Kush, Joseph M.; Konold, Timothy R.; Bradshaw, Catherine P. – Educational and Psychological Measurement, 2022
Multilevel structural equation modeling (MSEM) allows researchers to model latent factor structures at multiple levels simultaneously by decomposing within- and between-group variation. Yet the extent to which the sampling ratio (i.e., proportion of cases sampled from each group) influences the results of MSEM models remains unknown. This article…
Descriptors: Structural Equation Models, Factor Structure, Statistical Bias, Error of Measurement