Dandan Tang; Steven M. Boker; Xin Tong – Structural Equation Modeling: A Multidisciplinary Journal, 2025
The replication crisis in social and behavioral sciences has raised concerns about the reliability and validity of empirical studies. While research in the literature has explored contributing factors to this crisis, the issues related to analytical tools have received less attention. This study focuses on a widely used analytical tool -…
Descriptors: Test Validity, Factor Analysis, Replication (Evaluation), Social Science Research
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Feldt, Leonard S. – Psychometrika, 1980 (peer reviewed)
Procedures are developed for testing the hypothesis that Cronbach's alpha reliability coefficient is equal for two tests given to the same subjects. (Author/JKS)
Descriptors: Error of Measurement, Hypothesis Testing, Measurement, Statistical Significance
Lam, Tony C. M. – 1981
The objective of this paper is to examine the relationship between the unreliability of difference scores and the power of tests of significance in an attempt to determine the validity of the paradox for the measurement of change presented by Overall and Woodward: that the power of tests of significance is maximum when the reliability of the…
Descriptors: Achievement Gains, Correlation, Error of Measurement, Hypothesis Testing
Zimmerman, Donald W.; And Others – Applied Psychological Measurement, 1993 (peer reviewed)
Some of the methods originally used to find relationships between reliability and power associated with a single measurement are extended to difference scores. Results, based on explicit power calculations, show that augmenting the reliability of measurement by reducing error score variance can make significance tests of difference more powerful.…
Descriptors: Equations (Mathematics), Error of Measurement, Individual Differences, Mathematical Models
Humphreys, Lloyd G.; And Others – Applied Psychological Measurement, 1993 (peer reviewed)
Two articles discuss the controversy about the relationship between reliability and the power of significance tests in response to the discussion of Donald W. Zimmerman, Richard H. Williams, and Bruno D. Zumbo. Lloyd G. Humphreys emphasizes the differences between what statisticians can do and constraints on researchers. Zimmerman, Williams, and…
Descriptors: Error of Measurement, Individual Differences, Power (Statistics), Research Methodology
Harris, Chester W. – 1971
Livingston's work is a careful analysis of what occurs when one pools two populations with different means, but similar variances and reliability coefficients. However, his work fails to advance reliability theory for the special case of criterion-referenced testing. See ED 042 802 for Livingston's paper. (MS)
Descriptors: Analysis of Variance, Criterion Referenced Tests, Error of Measurement, Reliability
Harris, Chester W.; And Others – 1977
The implications of a mathematical model of test scores are explored where the data are limited to a random sample of items without replacement from an indefinitely large population or item domain in which items are scored either zero or one. The purpose is to obtain an unbiased estimate of a student's proportion of items correct in the item…
Descriptors: Academic Achievement, Achievement Tests, Annotated Bibliographies, Bibliographies
Dunivant, Noel – 1979
Eight different methods are reviewed for determining whether two or more tests are equivalent measures. These methods vary in restrictiveness from the Wilks-Votaw test of compound symmetry (which requires that all means, variances, and covariances are equal), to Jöreskog's theory of congeneric tests (which requires only that the tests are measures…
Descriptors: Analysis of Variance, Comparative Analysis, Error of Measurement, Evaluation Methods
Atkinson, Leslie – 1990
Three tables are provided to aid in the clinical interpretation of factor scores for the Wechsler Adult Intelligence Scale-Revised (WAIS-R; 1981). The factor structure of the WAIS-R has proven to be robust across samples, tests, time, statistical analyses, measurement scales, and distortions of the distribution. Information necessary to make…
Descriptors: Adults, Clinical Diagnosis, Diagnostic Tests, Error of Measurement
Schumacker, Randall E. – 1992
The regression-discontinuity approach to evaluating educational programs is reviewed, and regression-discontinuity post-program mean differences under various conditions are discussed. The regression-discontinuity design is used to determine whether post-program differences exist between an experimental program and a control group. The difference…
Descriptors: Comparative Analysis, Computer Simulation, Control Groups, Cutting Scores
Marston, Paul T.; Borich, Gary D. – 1977
The four main approaches to measuring treatment effects in schools (raw gain, residual gain, covariance, and true scores) were compared. A simulation study showed that true score analysis produced a large number of Type I errors. When corrected for this error, this method showed the least power of the four. This outcome was clearly the result of the…
Descriptors: Achievement Gains, Analysis of Covariance, Comparative Analysis, Error of Measurement
Olejnik, Stephen F.; Porter, Andrew C. – 1978
The statistical properties of two methods of estimating gain scores for groups in quasi-experiments are compared: (1) gains in scores standardized separately for each group; and (2) analysis of covariance with estimated true pretest scores. The fan spread hypothesis is assumed for groups but not necessarily assumed for members of the groups.…
Descriptors: Academic Achievement, Achievement Gains, Analysis of Covariance, Analysis of Variance

