ERIC - Search Results

Showing 1 to 15 of 21 results

A Simulation Study on the Performance of Different Reliability Estimation Methods

Peer reviewed

Edwards, Ashley A.; Joyner, Keanan J.; Schatschneider, Christopher – Educational and Psychological Measurement, 2021

The accuracy of certain internal consistency estimators has been questioned in recent years. The present study tests the accuracy of six reliability estimators (Cronbach's alpha, omega, omega hierarchical, Revelle's omega, and greatest lower bound) in 140 simulated conditions of unidimensional continuous data with uncorrelated errors with varying…

Descriptors: Reliability, Computation, Accuracy, Sample Size

There Are Many Greater Lower Bounds than Cronbach's α: A Monte Carlo Simulation Study

Peer reviewed

Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023

A Monte Carlo simulation study was conducted to examine the performance of α, λ2, λ_4, λ_2, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…

Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation

Attribute-Level Item Selection Method for DCM-CAT

Peer reviewed

Bao, Yu; Bradshaw, Laine – Measurement: Interdisciplinary Research and Perspectives, 2018

Diagnostic classification models (DCMs) can provide multidimensional diagnostic feedback about students' mastery levels of knowledge components or attributes. One advantage of using DCMs is the ability to accurately and reliably classify students into mastery levels with a relatively small number of items per attribute. Combining DCMs with…

Descriptors: Test Items, Selection, Adaptive Testing, Computer Assisted Testing

The Impact of Q-Matrix Designs on Diagnostic Classification Accuracy in the Presence of Attribute Hierarchies

Peer reviewed

Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017

There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to the demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…

Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests

In Search of the Optimal Number of Response Categories in a Rating Scale

Peer reviewed

Lee, Jihyun; Paek, Insu – Journal of Psychoeducational Assessment, 2014

Likert-type rating scales are still the most widely used method when measuring psychoeducational constructs. The present study investigates a long-standing issue of identifying the optimal number of response categories. A special emphasis is given to categorical data, which were generated by the Item Response Theory (IRT) Graded-Response Modeling…

Descriptors: Likert Scales, Responses, Item Response Theory, Classification

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

Improving Measurement Precision of Hierarchical Latent Traits Using Adaptive Testing

Peer reviewed

Wang, Chun – Journal of Educational and Behavioral Statistics, 2014

Many latent traits in social sciences display a hierarchical structure, such as intelligence, cognitive ability, or personality. Usually a second-order factor is linearly related to a group of first-order factors (also called domain abilities in cognitive ability measures), and the first-order factors directly govern the actual item responses.…

Descriptors: Measurement, Accuracy, Item Response Theory, Adaptive Testing

Test Length and Decision Quality in Personnel Selection: When Is Short Too Short?

Peer reviewed

Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012

Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…

Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement

An Evaluation of Item Response Theory Classification Accuracy and Consistency Indices

Peer reviewed

Wyse, Adam E.; Hao, Shiqi – Applied Psychological Measurement, 2012

This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…

Descriptors: Item Response Theory, Classification, Accuracy, Reliability

Psychometric Properties of IRT Proficiency Estimates

Peer reviewed

Kolen, Michael J.; Tong, Ye – Educational Measurement: Issues and Practice, 2010

Psychometric properties of item response theory proficiency estimates are considered in this paper. Proficiency estimators based on summed scores and pattern scores include non-Bayes maximum likelihood and test characteristic curve estimators and Bayesian estimators. The psychometric properties investigated include reliability, conditional…

Descriptors: Test Length, Psychometrics, Item Response Theory, Scores

The Utility and Psychometric Properties of the Abel-Blasingame Assessment System for "Individuals with Intellectual Disabilities"

Peer reviewed

Blasingame, Gerry D.; Abel, Gene G.; Jordan, Alan; Wiegel, Markus – Journal of Mental Health Research in Intellectual Disabilities, 2011

This article describes the development and utility of the Abel-Blasingame Assessment System for "individuals with intellectual disabilities" (ABID) for assessment of sexual interest and problematic sexual behaviors. The study examined the preliminary psychometric properties and evaluated the clinical utility of the ABID based on a sample…

Descriptors: Mental Retardation, Developmental Delays, Measures (Individuals), Questionnaires

Modeling Nonignorable Missing Data in Speeded Tests

Peer reviewed

Glas, Cees A. W.; Pimentel, Jonald L. – Educational and Psychological Measurement, 2008

In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and…

Descriptors: Intelligence Tests, Statistical Inference, Item Response Theory, Modeling (Psychology)

The Effect of Using Item Parameters Calibrated from Paper Administrations in Computer Adaptive Test Administrations

Peer reviewed
PDF on ERIC

Pommerich, Mary – Journal of Technology, Learning, and Assessment, 2007

Computer administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Scores

Estimating the Internal Consistency Reliability of Tests Composed of Testlets Varying in Length.

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 2002

Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)

Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format

A Brief Inventory of Values.

Peer reviewed

Stern, Paul C.; Guagnano, Gregory A.; Dietz, Thomas – Educational and Psychological Measurement, 1998

A brief version of the instrument developed by S. Schwartz (1992, 1994) to measure the structure and content of human values was developed. Studies with 199 adults and 420 adults support the reliability of scores produced by the brief inventory's four three-item scales. Uses of the brief form are discussed. (SLD)

Descriptors: Adults, Reliability, Scores, Test Construction
