| Publication Date | Results |
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 9 |
| Descriptor | Results |
| Models | 20 |
| Validity | 12 |
| Item Response Theory | 7 |
| Psychometrics | 7 |
| Test Validity | 6 |
| Comparative Analysis | 4 |
| Evaluation Methods | 4 |
| Simulation | 4 |
| Test Bias | 4 |
| Achievement Tests | 3 |
| Bias | 3 |
| Source | Results |
| Journal of Educational Measurement | 20 |
| Author | Results |
| Roussos, Louis A. | 2 |
| de la Torre, Jimmy | 2 |
| Airasian, Peter W. | 1 |
| Bart, William M. | 1 |
| Bejar, Isaac I. | 1 |
| Carl Westine | 1 |
| Carstensen, Claus H. | 1 |
| Chen, Jinsong | 1 |
| Choi, In-Hee | 1 |
| Cronbach, Lee J. | 1 |
| DeMars, Christine E. | 1 |
| Publication Type | Results |
| Journal Articles | 16 |
| Reports - Research | 10 |
| Reports - Evaluative | 3 |
| Reports - Descriptive | 2 |
| Information Analyses | 1 |
| Speeches/Meeting Papers | 1 |
| Education Level | Results |
| Middle Schools | 2 |
| Junior High Schools | 1 |
| Secondary Education | 1 |
| Audience | Results |
| Researchers | 1 |
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
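As a general point of reference (a common MIMIC-with-interaction formulation, not necessarily the parameterization used in this article), a MIMIC model for DIF in item i can be written as

$y_i^* = \lambda_i \theta + \beta_i g + \omega_i (\theta \times g) + \varepsilon_i$

where $g$ is the group indicator, $\beta_i$ captures uniform DIF, and $\omega_i$ captures nonuniform DIF; the standard MIMIC setup additionally assumes equal latent variance across groups, $\mathrm{Var}(\theta \mid g = 0) = \mathrm{Var}(\theta \mid g = 1)$, which is the assumption whose violation the study examines.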
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Langenfeld, Thomas; Thomas, Jay; Zhu, Rongchun; Morris, Carrie A. – Journal of Educational Measurement, 2020
An assessment of graphic literacy was developed by articulating and subsequently validating a skills-based cognitive model intended to substantiate the plausibility of score interpretations. Model validation involved the use of multiple sources of evidence derived from large-scale field testing and cognitive lab studies. Data from large-scale field…
Descriptors: Evidence, Scores, Eye Movements, Psychometrics
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Journal of Educational Measurement, 2017
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…
Descriptors: Test Items, Cognitive Measurement, Testing Problems, Regression (Statistics)
Shin, Hyo Jeong; Wilson, Mark; Choi, In-Hee – Journal of Educational Measurement, 2017
This study proposes a structured constructs model (SCM) to examine measurement in the context of a multidimensional learning progression (LP). The LP is assumed to have features that go beyond a typical multidimensional IRT model, in that there are hypothesized to be certain cross-dimensional linkages that correspond to requirements between the…
Descriptors: Middle School Students, Student Evaluation, Measurement Techniques, Learning Processes
Chen, Jinsong; de la Torre, Jimmy; Zhang, Zao – Journal of Educational Measurement, 2013
As with any psychometric model, the validity of inferences from cognitive diagnosis models (CDMs) determines the extent to which these models can be useful. For inferences from CDMs to be valid, it is crucial that the fit of the model to the data is ascertained. Based on a simulation study, this study investigated the sensitivity of various fit…
Descriptors: Models, Psychometrics, Goodness of Fit, Statistical Analysis
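For orientation, one widely used CDM of the kind whose fit such studies evaluate is the DINA model (an illustration only, not necessarily one of the models examined in this article):

$P(X_{ij} = 1 \mid \boldsymbol{\alpha}_j) = (1 - s_i)^{\eta_{ij}} \, g_i^{\,1 - \eta_{ij}}, \qquad \eta_{ij} = \prod_{k} \alpha_{jk}^{\,q_{ik}}$

where $s_i$ and $g_i$ are the slip and guessing parameters of item $i$, $\alpha_{jk}$ indicates examinee $j$'s mastery of attribute $k$, and $q_{ik}$ is the item's Q-matrix entry; fit statistics ask how well such a model reproduces the observed response patterns.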
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
de la Torre, Jimmy – Journal of Educational Measurement, 2008
Most model fit analyses in cognitive diagnosis assume that a Q matrix is correct after it has been constructed, without verifying its appropriateness. Consequently, any model misfit attributable to the Q matrix cannot be addressed and remedied. To address this concern, this paper proposes an empirically based method of validating a Q matrix used…
Descriptors: Matrices, Validity, Models, Evaluation Methods
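For readers unfamiliar with the term, a Q matrix is a binary items-by-attributes specification; a hypothetical example for three items and two attributes is

$Q = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}$

where entry $q_{ik} = 1$ means item $i$ is assumed to require attribute $k$. Empirical validation of the kind proposed here asks whether these a priori entries are supported by the response data.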
Roussos, Louis A.; Templin, Jonathan L.; Henson, Robert A. – Journal of Educational Measurement, 2007
This article describes a latent trait approach to skills diagnosis based on a particular variety of latent class models that employ item response functions (IRFs) as in typical item response theory (IRT) models. To enable and encourage comparisons with other approaches, this description is provided in terms of the main components of any…
Descriptors: Validity, Identification, Psychometrics, Item Response Theory
Cronbach, Lee J. – Journal of Educational Measurement, 1976 (Peer reviewed)
The Petersen-Novick paper dealing with culture fair selection (TM 502 259) is the basis for this article. The author proposes a perspective in which ideas can be lined up for comparison and suggests solutions to the problems of selection in employment. (DEP)
Descriptors: Bias, Employment Opportunities, Matrices, Models
Airasian, Peter W.; Bart, William M. – Journal of Educational Measurement, 1975 (Peer reviewed)
Validation studies of learning hierarchies usually examine whether task relationships posited a priori are confirmed by student learning data. This method was compared with a non-posited task relationship where all possible task relationships were generated and investigated. A learning hierarchy in a seventh grade mathematics study reported by…
Descriptors: Difficulty Level, Intellectual Development, Junior High Schools, Learning Theories
Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999 (Peer reviewed)
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges
Linn, Robert L. – Journal of Educational Measurement, 1984 (Peer reviewed)
The common approach to studies of predictive bias is analyzed within the context of a conceptual model in which predictors and criterion measures are viewed as fallible indicators of idealized qualifications. (Author/PN)
Descriptors: Certification, Models, Predictive Measurement, Predictive Validity
Wardrop, James L.; And Others – Journal of Educational Measurement, 1982 (Peer reviewed)
A structure for describing different approaches to testing is generated by identifying five dimensions along which tests differ: test uses, item generation, item revision, assessment of precision, and validation. These dimensions are used to profile tests of reading comprehension. Only norm-referenced achievement tests had an inference system…
Descriptors: Achievement Tests, Comparative Analysis, Educational Testing, Models
Hanna, Gila – Journal of Educational Measurement, 1984 (Peer reviewed)
The validity of a comparison of mean test scores for two groups and of a longitudinal comparison of means within each group is assessed. Using LISREL, factor analyses are used to test the hypotheses of similar factor patterns, equal units of measurement, and equal measurement accuracy between groups and across time. (Author/DWH)
Descriptors: Achievement Tests, Comparative Analysis, Data Analysis, Factor Analysis
