Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
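The decision-consistency (DC) concept in this abstract can be illustrated with a minimal simulation sketch: two parallel forms measure the same true score with independent error, and DC is the proportion of examinees who receive the same pass/fail decision on both. All parameter values below are illustrative assumptions, not the paper's method (which is behind the snippet's cut-off).

```python
import random

def simulate_decision_consistency(n_examinees=20000, reliability=0.85,
                                  cut=0.0, seed=1):
    """Estimate decision consistency (DC) by simulating two parallel
    test forms.  DC is the proportion of examinees classified the same
    way (pass/fail at `cut`) on both forms.

    Hypothetical setup: true scores ~ N(0, 1); each observed score adds
    independent normal error sized so that score reliability equals
    `reliability` (error variance = (1 - rel) / rel times true variance).
    """
    rng = random.Random(seed)
    err_sd = ((1 - reliability) / reliability) ** 0.5
    consistent = 0
    for _ in range(n_examinees):
        true = rng.gauss(0, 1)
        form_a = true + rng.gauss(0, err_sd)
        form_b = true + rng.gauss(0, err_sd)
        consistent += (form_a >= cut) == (form_b >= cut)
    return consistent / n_examinees

dc = simulate_decision_consistency()
```

With a cut at the true-score median and reliability 0.85, the simulated DC lands near the bivariate-normal value 0.5 + arcsin(0.85)/π ≈ 0.82, which is a useful sanity check on the simulation.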
Ferrando, Pere J.; Lorenzo-Seva, Urbano – Educational and Psychological Measurement, 2019
Measures initially designed to be single-trait often yield data that are compatible with both an essentially unidimensional factor-analysis (FA) solution and a correlated-factors solution. For these cases, this article proposes an approach aimed at providing information for deciding which of the two solutions is the most appropriate and useful.…
Descriptors: Factor Analysis, Computation, Reliability, Goodness of Fit
Diao, Hongyu; Sireci, Stephen G. – Journal of Applied Testing Technology, 2018
Whenever classification decisions are made on educational tests, such as pass/fail, or basic, proficient, or advanced, the consistency and accuracy of those decisions should be estimated and reported. Methods for estimating the reliability of classification decisions made on the basis of educational tests are well-established (e.g., Rudner, 2001;…
Descriptors: Classification, Item Response Theory, Accuracy, Reliability
Komperda, Regis; Pentecost, Thomas C.; Barbera, Jack – Journal of Chemical Education, 2018
This methodological paper examines current conceptions of reliability in chemistry education research (CER) and provides recommendations for moving beyond the current reliance on reporting coefficient alpha (α) as reliability evidence without regard to its appropriateness for the research context. To help foster a better understanding of…
Descriptors: Chemistry, Science Instruction, Teaching Methods, Reliability
Lathrop, Quinn N. – Practical Assessment, Research & Evaluation, 2015
There are two main lines of research in estimating classification accuracy (CA) and classification consistency (CC) under Item Response Theory (IRT). The R package cacIRT provides computer implementations of both approaches in an accessible and unified framework. Even with available implementations, there remain decisions a researcher faces when…
Descriptors: Classification, Accuracy, Item Response Theory, Reliability
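The IRT-based classification consistency that cacIRT computes can be sketched from first principles: under a Rasch model, the Lord-Wingersky recursion gives each examinee's raw-score distribution, and CC is the chance two independent replications land on the same side of the cut, averaged over a N(0, 1) ability distribution. This is a rough Lee-style sketch in Python, not the cacIRT implementation; the item difficulties and cut score are made up for illustration.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def score_distribution(theta, bs):
    """Lord-Wingersky recursion: distribution of the raw score at theta."""
    dist = [1.0]
    for b in bs:
        p = rasch_p(theta, b)
        new = [0.0] * (len(dist) + 1)
        for s, pr in enumerate(dist):
            new[s] += pr * (1 - p)      # item answered incorrectly
            new[s + 1] += pr * p        # item answered correctly
        dist = new
    return dist

def classification_consistency(bs, cut_score, n_quad=41):
    """Marginal CC: at each quadrature theta, the probability that two
    independent replications fall on the same side of the raw-score cut,
    weighted by a N(0, 1) ability density."""
    total_w, cc = 0.0, 0.0
    for i in range(n_quad):
        theta = -4 + 8 * i / (n_quad - 1)
        w = math.exp(-theta * theta / 2)
        p_pass = sum(score_distribution(theta, bs)[cut_score:])
        cc += w * (p_pass ** 2 + (1 - p_pass) ** 2)
        total_w += w
    return cc / total_w

item_difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]   # hypothetical Rasch b's
cc = classification_consistency(item_difficulties, cut_score=3)
```

Because CC is a sum of squared category probabilities, it is bounded below by 0.5 for a two-category (pass/fail) decision; values near 1 indicate a cut placed where few examinees sit.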
Raykov, Tenko; Marcoulides, George A. – Educational and Psychological Measurement, 2015
A direct approach to point and interval estimation of Cronbach's coefficient alpha for multiple component measuring instruments is outlined. The procedure is based on a latent variable modeling application with widely circulated software. As a by-product, using sample data the method permits ascertaining whether the population discrepancy…
Descriptors: Computation, Statistical Analysis, Reliability, Models
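Raykov and Marcoulides work through latent variable modeling software, but the underlying target, a point and interval estimate for coefficient alpha, can be illustrated with the classical formula plus a percentile bootstrap over persons. This is a stand-in sketch, not the paper's latent-variable procedure; the simulated five-item data are invented for the example.

```python
import random
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = len(items)
    n = len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

def bootstrap_ci(items, n_boot=500, level=0.95, seed=7):
    """Percentile bootstrap interval for alpha, resampling persons."""
    rng = random.Random(seed)
    n = len(items[0])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cronbach_alpha([[col[i] for i in idx] for col in items]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Hypothetical data: 300 persons, 5 congeneric-looking items
rng = random.Random(0)
true_scores = [rng.gauss(0, 1) for _ in range(300)]
items = [[t + rng.gauss(0, 0.8) for t in true_scores] for _ in range(5)]
alpha = cronbach_alpha(items)
ci_lo, ci_hi = bootstrap_ci(items)
```

For these generating values the population alpha is about 0.89, so the point estimate and interval should bracket that neighborhood; the latent variable approach in the article additionally tests whether alpha's assumptions hold.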
Leckie, George – Journal of Educational and Behavioral Statistics, 2018
The traditional approach to estimating the consistency of school effects across subject areas and the stability of school effects across time is to fit separate value-added multilevel models to each subject or cohort and to correlate the resulting empirical Bayes predictions. We show that this gives biased correlations and these biases cannot be…
Descriptors: Value Added Models, Reliability, Statistical Bias, Computation
Gorard, Stephen; Gorard, Jonathan – International Journal of Social Research Methodology, 2016
This brief paper introduces a new approach to assessing the trustworthiness of research comparisons when expressed numerically. The 'number needed to disturb' a research finding would be the number of counterfactual values that can be added to the smallest arm of any comparison before the difference or 'effect' size disappears, minus the number of…
Descriptors: Statistical Significance, Testing, Sampling, Attrition (Research Studies)
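The counting step of the 'number needed to disturb' idea can be sketched directly from the abstract's description: append counterfactual values to the smaller arm until the effect size no longer holds up, and count how many were needed. Both the counterfactual value used here (the opposing arm's mean) and the "disappears" threshold are illustrative assumptions, and the full measure subtracts a further count described after the snippet's cut-off, which is left out.

```python
import statistics

def cohens_d(a, b):
    """Pooled-SD standardized mean difference."""
    na, nb = len(a), len(b)
    pooled = (((na - 1) * statistics.variance(a)
               + (nb - 1) * statistics.variance(b)) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

def nntd_count(treatment, control, threshold=0.1, cap=50000):
    """Counting step only: append counterfactual values (here, the
    opposing arm's mean -- an illustrative choice) to the smaller arm,
    one at a time, until the effect size drops below `threshold`."""
    t, c = list(treatment), list(control)
    small = t if len(t) <= len(c) else c
    counterfactual = statistics.mean(c if small is t else t)
    added = 0
    while cohens_d(t, c) > threshold and added < cap:
        small.append(counterfactual)
        added += 1
    return added

# Made-up scores: a small positive difference favoring "treatment"
treatment = [0.3, 0.4, 0.45, 0.5, 0.5, 0.55, 0.6, 0.6, 0.7, 0.8]
control = [-0.2, -0.1, -0.1, 0.0, 0.0, 0.0, 0.05, 0.1, 0.1, 0.2, 0.25, 0.3]
nntd = nntd_count(treatment, control)
```

A large count means many contrary cases would be needed to overturn the comparison, which is the trustworthiness reading the paper proposes.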
Capuano, Nicola; Loia, Vincenzo; Orciuoli, Francesco – IEEE Transactions on Learning Technologies, 2017
Massive Open Online Courses (MOOCs) are becoming an increasingly popular choice for education but, to reach their full potential, they require solutions to new challenges such as assessing students at scale. A feasible approach to this problem is peer assessment, in which students also play the role of assessor for assignments submitted by…
Descriptors: Participative Decision Making, Models, Peer Evaluation, Online Courses
Beaujean, A. Alexander – Practical Assessment, Research & Evaluation, 2014
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
Descriptors: Regression (Statistics), Sample Size, Sampling, Monte Carlo Methods
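The simulation-based approach to sample size determination that Beaujean describes can be sketched as a Monte Carlo power study: posit the model and effect of interest, simulate data at candidate sample sizes, refit, and take the smallest n whose simulated power meets the target. The slope size, error variance, and candidate grid below are assumptions for illustration; the normal critical value is used in place of the t value, which is reasonable at these sample sizes.

```python
import random
import statistics
from statistics import NormalDist

def slope_power(n, beta=0.3, n_sims=400, alpha=0.05, seed=3):
    """Monte Carlo power for detecting a slope of size `beta` in simple
    linear regression with x ~ N(0, 1) and unit-variance errors."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta * xi + rng.gauss(0, 1) for xi in x]
        mx, my = statistics.mean(x), statistics.mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx                                  # OLS slope
        resid = [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]
        s2 = sum(r * r for r in resid) / (n - 2)       # residual variance
        se = (s2 / sxx) ** 0.5                         # slope standard error
        hits += abs(b / se) > z_crit
    return hits / n_sims

def required_n(target_power=0.8, candidates=(50, 100, 150, 200)):
    """Smallest candidate n whose simulated power meets the target."""
    for n in candidates:
        if slope_power(n) >= target_power:
            return n
    return None

n_needed = required_n()
```

For a standardized slope of 0.3, the analytic approximation puts the requirement near n ≈ 90, so the grid search should settle on 100; refining the grid trades computation for precision.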
Caudle, Kyle A.; Ruth, David M. – Journal of Computers in Mathematics and Science Teaching, 2013
Teaching undergraduates the basic properties of an estimator can be difficult. Most definitions are easy enough to comprehend, but difficulties often lie in gaining a "good feel" for these properties and why one property might be more desired as compared to another property. Simulations which involve visualization of these properties can…
Descriptors: Computation, Statistics, College Mathematics, Mathematics Instruction
Culpepper, Steven Andrew – Applied Psychological Measurement, 2013
A classic topic in the fields of psychometrics and measurement has been the impact of the number of scale categories on test score reliability. This study builds on previous research by further articulating the relationship between item response theory (IRT) and classical test theory (CTT). Equations are presented for comparing the reliability and…
Descriptors: Item Response Theory, Reliability, Scores, Error of Measurement
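The core phenomenon Culpepper analyzes, that coarser response scales attenuate score reliability, can be shown with a small simulation: coarsen each item's continuous response into k equal-width bins and compare the test-retest correlation of the total score across k. This discretization scheme is an illustrative stand-in, not the paper's IRT-based derivation.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def categorized_reliability(n_categories, n_persons=4000, n_items=10, seed=5):
    """Test-retest correlation of a total score when each item's
    continuous response (true score + unit-variance noise) is coarsened
    into `n_categories` equal-width bins over [-3, 3]."""
    rng = random.Random(seed)
    width = 6.0 / n_categories
    def categorize(v):
        v = max(-3.0, min(3.0 - 1e-9, v))
        return int((v + 3.0) // width)
    totals_1, totals_2 = [], []
    for _ in range(n_persons):
        true = rng.gauss(0, 1)
        totals_1.append(sum(categorize(true + rng.gauss(0, 1))
                            for _ in range(n_items)))
        totals_2.append(sum(categorize(true + rng.gauss(0, 1))
                            for _ in range(n_items)))
    return pearson(totals_1, totals_2)

rel_2 = categorized_reliability(2)   # dichotomous items
rel_7 = categorized_reliability(7)   # 7-point items
```

Dichotomizing should cost several points of reliability relative to a 7-point scale here, consistent with the classical result that most of the loss occurs below about five categories.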
Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D. – Practical Assessment, Research & Evaluation, 2012
This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…
Descriptors: Likert Scales, Rating Scales, Reliability, Computation
Raykov, Tenko; Marcoulides, George A. – Structural Equation Modeling: A Multidisciplinary Journal, 2012
A latent variable modeling method is outlined, which accomplishes estimation of criterion validity and reliability for a multicomponent measuring instrument with hierarchical structure. The approach provides point and interval estimates for the scale criterion validity and reliability coefficients, and can also be used for testing composite or…
Descriptors: Predictive Validity, Reliability, Structural Equation Models, Measures (Individuals)
Crawford, John R.; Garthwaite, Paul H.; Morrice, Nicola; Duff, Kevin – Psychological Assessment, 2012
Supplementary methods for the analysis of the Repeatable Battery for the Assessment of Neuropsychological Status are made available, including (a) quantifying the number of abnormally low Index scores and abnormally large differences exhibited by a case and accompanying this with estimates of the percentages of the normative population expected to…
Descriptors: Neurological Impairments, Cognitive Tests, Psychological Testing, Adults