Showing 1 to 15 of 21 results
Peer reviewed
Edwards, Ashley A.; Joyner, Keanan J.; Schatschneider, Christopher – Educational and Psychological Measurement, 2021
The accuracy of certain internal consistency estimators has been questioned in recent years. The present study tests the accuracy of six reliability estimators (Cronbach's alpha, omega, omega hierarchical, Revelle's omega, and greatest lower bound) in 140 simulated conditions of unidimensional continuous data with uncorrelated errors with varying…
Descriptors: Reliability, Computation, Accuracy, Sample Size
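Two of the estimators named above have simple closed forms under the kind of model the study simulates (unidimensional data with uncorrelated errors). As a sketch, assuming a single factor with unit variance, item loadings l_i, and error variances e_i, the variance of the sum score is (sum of l_i)^2 + sum of e_i; population alpha is k/(k-1) * (1 - sum of item variances / sum-score variance), and omega is (sum of l_i)^2 / sum-score variance. The function name and example loadings below are illustrative assumptions, not taken from the study.

```python
def congeneric_alpha_omega(loadings, error_vars):
    """Population coefficient alpha and omega for a one-factor model.

    Assumes a single factor with variance 1 and uncorrelated errors, so
    cov(i, j) = l_i * l_j for i != j and var(i) = l_i**2 + e_i.
    """
    k = len(loadings)
    if k < 2 or k != len(error_vars):
        raise ValueError("need at least two items and one error variance per item")
    common = sum(loadings) ** 2             # sum-score variance due to the factor
    total_var = common + sum(error_vars)    # variance of the unit-weighted sum score
    item_vars = sum(l * l + e for l, e in zip(loadings, error_vars))
    alpha = k / (k - 1) * (1 - item_vars / total_var)
    omega = common / total_var
    return alpha, omega


if __name__ == "__main__":
    # Equal loadings (essentially tau-equivalent items): alpha equals omega.
    print(congeneric_alpha_omega([0.7] * 4, [0.51] * 4))
    # Unequal loadings (congeneric items): alpha falls below omega.
    print(congeneric_alpha_omega([0.9, 0.5, 0.3, 0.7], [0.19, 0.75, 0.91, 0.51]))
```

Under this model alpha equals omega only when all loadings are equal and is smaller otherwise, one commonly cited reason its accuracy has been questioned.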
Peer reviewed
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of α, λ_2, λ_4, λ_2, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
Peer reviewed
Bao, Yu; Bradshaw, Laine – Measurement: Interdisciplinary Research and Perspectives, 2018
Diagnostic classification models (DCMs) can provide multidimensional diagnostic feedback about students' mastery levels of knowledge components or attributes. One advantage of using DCMs is the ability to accurately and reliably classify students into mastery levels with a relatively small number of items per attribute. Combining DCMs with…
Descriptors: Test Items, Selection, Adaptive Testing, Computer Assisted Testing
Peer reviewed
Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017
There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to the demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…
Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests
Peer reviewed
Lee, Jihyun; Paek, Insu – Journal of Psychoeducational Assessment, 2014
Likert-type rating scales are still the most widely used method when measuring psychoeducational constructs. The present study investigates a long-standing issue of identifying the optimal number of response categories. A special emphasis is given to categorical data, which were generated by the Item Response Theory (IRT) Graded-Response Modeling…
Descriptors: Likert Scales, Responses, Item Response Theory, Classification
Peer reviewed
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
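Angoff ratings like those described above are commonly aggregated by summing each panelist's item probabilities into a recommended raw cut score and then averaging those sums across the panel. The sketch below assumes that aggregation and a hypothetical ratings layout (one row per rater, one column per item). `subset_cut_score` is a hypothetical helper that rescales each rater's mean rating on a subset of items to full test length; it only illustrates the subset idea the study examines and does not reproduce its G-theory analysis.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Panel cut score: mean over raters of each rater's summed item ratings.

    ratings[r][i] is rater r's judged probability that a minimally
    competent examinee answers item i correctly (hypothetical layout).
    """
    return mean(sum(rater) for rater in ratings)


def subset_cut_score(ratings, item_indices):
    """Hypothetical shortcut: scale each rater's mean rating on a subset
    of items up to the full test length, then average across raters."""
    n_items = len(ratings[0])
    return mean(mean(rater[i] for i in item_indices) * n_items for rater in ratings)


if __name__ == "__main__":
    ratings = [  # 3 hypothetical raters x 6 items
        [0.6, 0.7, 0.5, 0.8, 0.4, 0.6],
        [0.5, 0.8, 0.6, 0.7, 0.5, 0.5],
        [0.7, 0.6, 0.5, 0.9, 0.3, 0.7],
    ]
    print(round(angoff_cut_score(ratings), 2))
    print(round(subset_cut_score(ratings, [1, 3, 5]), 2))
```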
Peer reviewed
Wang, Chun – Journal of Educational and Behavioral Statistics, 2014
Many latent traits in social sciences display a hierarchical structure, such as intelligence, cognitive ability, or personality. Usually a second-order factor is linearly related to a group of first-order factors (also called domain abilities in cognitive ability measures), and the first-order factors directly govern the actual item responses.…
Descriptors: Measurement, Accuracy, Item Response Theory, Adaptive Testing
Peer reviewed
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
Peer reviewed
Wyse, Adam E.; Hao, Shiqi – Applied Psychological Measurement, 2012
This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…
Descriptors: Item Response Theory, Classification, Accuracy, Reliability
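Rudner's classification accuracy index is commonly described as follows: treat each examinee's ability estimate θ̂ as the true value, assume estimates are normally distributed around it with the reported standard error, and compute the probability that an estimate would fall in the same performance category as θ̂; the index is the average of those probabilities over examinees. The sketch below follows that description with hypothetical estimates, standard errors, and cut scores. It does not implement Guo's index or the new consistency indices introduced in the article.

```python
from math import erf, inf, sqrt


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def rudner_accuracy(theta_hats, std_errors, cuts):
    """Average probability that an estimate lands in the observed category."""
    bounds = [-inf] + sorted(cuts) + [inf]
    total = 0.0
    for theta, se in zip(theta_hats, std_errors):
        # Find the category [lower, upper) that contains the estimate.
        for lower, upper in zip(bounds, bounds[1:]):
            if lower <= theta < upper:
                total += normal_cdf((upper - theta) / se) - normal_cdf((lower - theta) / se)
                break
    return total / len(theta_hats)


if __name__ == "__main__":
    # Hypothetical estimates, standard errors, and two cut scores (three categories).
    print(round(rudner_accuracy([-0.2, 0.4, 1.1], [0.3, 0.3, 0.3], [0.0, 1.0]), 3))
```

Examinees whose estimates sit near a cut score contribute probabilities close to 0.5, which is why accuracy drops as more of the score distribution falls near the cuts.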
Peer reviewed
Kolen, Michael J.; Tong, Ye – Educational Measurement: Issues and Practice, 2010
Psychometric properties of item response theory proficiency estimates are considered in this paper. Proficiency estimators based on summed scores and pattern scores include non-Bayes maximum likelihood and test characteristic curve estimators and Bayesian estimators. The psychometric properties investigated include reliability, conditional…
Descriptors: Test Length, Psychometrics, Item Response Theory, Scores
Peer reviewed
Blasingame, Gerry D.; Abel, Gene G.; Jordan, Alan; Wiegel, Markus – Journal of Mental Health Research in Intellectual Disabilities, 2011
This article describes the development and utility of the Abel-Blasingame Assessment System for "individuals with intellectual disabilities" (ABID) for assessment of sexual interest and problematic sexual behaviors. The study examined the preliminary psychometric properties and evaluated the clinical utility of the ABID based on a sample…
Descriptors: Mental Retardation, Developmental Delays, Measures (Individuals), Questionnaires
Peer reviewed
Glas, Cees A. W.; Pimentel, Jonald L. – Educational and Psychological Measurement, 2008
In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and…
Descriptors: Intelligence Tests, Statistical Inference, Item Response Theory, Modeling (Psychology)
Pommerich, Mary – Journal of Technology, Learning, and Assessment, 2007
Computer-administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Scores
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)
Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format
Peer reviewed
Stern, Paul C.; Guagnano, Gregory A.; Dietz, Thomas – Educational and Psychological Measurement, 1998
A brief version of the instrument developed by S. Schwartz (1992, 1994) to measure the structure and content of human values was developed. Studies with 199 adults and 420 adults support the reliability of scores produced by the brief inventory's four three-item scales. Uses of the brief form are discussed. (SLD)
Descriptors: Adults, Reliability, Scores, Test Construction