ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	1
Since 2017 (last 10 years)	3
Since 2007 (last 20 years)	8

Descriptor

Sampling	15
Test Length	15
Sample Size	7
Error of Measurement	6
Test Construction	6
Test Items	6
Statistical Analysis	4
Achievement Tests	3
Computation	3
Cutting Scores	3
Data Analysis	3
Item Response Theory	3
Criterion Referenced Tests	2
Difficulty Level	2
Item Analysis	2
Item Banks	2
Mastery Tests	2
Mathematical Formulas	2
Mathematical Models	2
Mathematics	2
Models	2
Nonparametric Statistics	2
Probability	2
Reliability	2
Research Design	2

Source

Educational and Psychological…	3
ETS Research Report Series	2
Applied Measurement in…	1
International Journal of…	1
Journal of Educational and…	1
Journal of Experimental…	1

Publication Type

Reports - Research	15
Journal Articles	9
Speeches/Meeting Papers	2

Education Level

Secondary Education

Audience

Researchers

Assessments and Surveys

National Assessment of…	1
National Longitudinal Study…	1
Program for International…	1

Showing all 15 results

Investigation of a Multistage Adaptive Test Based on Test Assembly Methods

Peer reviewed

Dogruöz, Ebru; Kelecioglu, Hülya – International Journal of Assessment Tools in Education, 2024

In this research, multistage adaptive tests (MST) were compared according to sample size, panel pattern and module length for top-down and bottom-up test assembly methods. Within the scope of the research, data from PISA 2015 were used and simulation studies were conducted according to the parameters estimated from these data. Analysis results for…

Descriptors: Adaptive Testing, Test Construction, Foreign Countries, Achievement Tests

Robustness of Weighted Differential Item Functioning (DIF) Analysis: The Case of Mantel-Haenszel DIF Statistics. Research Report. ETS RR-21-12

Peer reviewed

Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021

Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…

Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
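
The Mantel-Haenszel procedure named in this abstract pools 2×2 tables (group by right/wrong) across matched total-score levels. As a minimal, unweighted Python sketch (not the weighted variant the report studies), the common odds ratio alpha_MH and its ETS delta-scale transform, MH D-DIF = -2.35 ln(alpha_MH), can be computed as below; the function name and the tuple layout of each stratum are illustrative assumptions.

```python
import math
from typing import Iterable, Tuple

# One matched score level: (ref_right, ref_wrong, focal_right, focal_wrong).
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_dif(strata: Iterable[Stratum]) -> Tuple[float, float]:
    """Return (alpha_MH, MH D-DIF) pooled over matched score levels."""
    num = 0.0  # sum over levels of A_k * D_k / N_k
    den = 0.0  # sum over levels of B_k * C_k / N_k
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these tables")
    alpha_mh = num / den
    # Delta metric: negative values indicate DIF favoring the reference group.
    return alpha_mh, -2.35 * math.log(alpha_mh)


if __name__ == "__main__":
    tables = [(40, 10, 30, 20), (25, 25, 20, 30)]
    alpha, d_dif = mantel_haenszel_dif(tables)
    print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {d_dif:.3f}")
```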

Detection and Treatment of Careless Responses to Improve Item Parameter Estimation

Peer reviewed

Patton, Jeffrey M.; Cheng, Ying; Hong, Maxwell; Diao, Qi – Journal of Educational and Behavioral Statistics, 2019

In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation…

Descriptors: Test Items, Response Style (Tests), Identification, Computation
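
The abstract does not say which person-fit statistic is used. One widely used choice for dichotomous items is the standardized log-likelihood statistic l_z, where large negative values signal a response pattern unlikely under the model; the Python sketch below assumes it, and assumes each person's model-implied success probabilities are already available (for example, from a fitted IRT model). The -1.645 cutoff and the helper names are hypothetical, and the iterative re-estimation step the study describes is not shown.

```python
import math
from typing import List, Sequence


def lz_statistic(responses: Sequence[int], probs: Sequence[float]) -> float:
    """Standardized log-likelihood person-fit statistic for 0/1 responses."""
    # Observed log-likelihood of the response pattern.
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    # Expectation and variance of l0 under the model.
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    if variance == 0:
        raise ValueError("l_z is undefined when every probability equals 0.5")
    return (l0 - expected) / math.sqrt(variance)


def flag_misfitting(matrix: Sequence[Sequence[int]],
                    prob_matrix: Sequence[Sequence[float]],
                    cutoff: float = -1.645) -> List[int]:
    """Indices of persons whose l_z falls below a (hypothetical) cutoff."""
    return [i for i, (u, p) in enumerate(zip(matrix, prob_matrix))
            if lz_statistic(u, p) < cutoff]
```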

Evaluating the Impact of Guessing and Its Interactions with Other Test Characteristics on Confidence Interval Procedures for Coefficient Alpha

Peer reviewed

Paek, Insu – Educational and Psychological Measurement, 2016

The effect of guessing on the point estimate of coefficient alpha has been studied in the literature, but the impact of guessing and its interactions with other test characteristics on the interval estimators for coefficient alpha has not been fully investigated. This study examined the impact of guessing and its interactions with other test…

Descriptors: Guessing (Tests), Computation, Statistical Analysis, Test Length
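
For reference, the point estimate whose interval estimators the study examines is coefficient alpha, alpha = k/(k - 1) * (1 - sum of item variances / total-score variance), for k items. A minimal Python sketch from a persons-by-items score matrix follows; it uses population variances throughout (the ratio is unchanged if sample variances are used consistently) and does not implement any of the confidence-interval procedures compared in the article.

```python
from statistics import pvariance
from typing import Sequence


def coefficient_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a persons x items score matrix."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across persons.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of persons' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```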

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time-consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
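
In the Angoff procedure described above, each panelist's cut score is the sum of that panelist's item probability judgments, and the recommended cut is typically the mean across panelists. The Python sketch below shows that aggregation plus one naive way to project a full-test cut from a subset of items (scaling the mean per-item rating by test length); the projection rule and the function names are illustrative assumptions, and the generalizability-theory analysis the authors used to judge whether a subset suffices is not reproduced.

```python
from statistics import mean
from typing import Sequence


def angoff_cut(ratings: Sequence[Sequence[float]]) -> float:
    """Mean over panelists of each panelist's summed item probabilities.

    ratings[j][i] is panelist j's judged probability that a borderline
    candidate answers item i correctly.
    """
    return mean(sum(row) for row in ratings)


def projected_cut_from_subset(ratings: Sequence[Sequence[float]],
                              subset: Sequence[int],
                              n_items: int) -> float:
    """Hypothetical projection: mean rating on the subset times test length."""
    per_item = mean(mean(row[i] for i in subset) for row in ratings)
    return per_item * n_items
```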

Minimum Sample Size Requirements for Mokken Scale Analysis

Peer reviewed

Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2014

An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…

Descriptors: Sampling, Test Items, Effect Size, Scaling

Assessing Goodness of Fit in Item Response Theory with Nonparametric Models: A Comparison of Posterior Probabilities and Kernel-Smoothing Approaches

Peer reviewed

Sueiro, Manuel J.; Abad, Francisco J. – Educational and Psychological Measurement, 2011

The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…

Descriptors: Goodness of Fit, Item Response Theory, Nonparametric Statistics, Probability
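
The root integrated squared error (RISE) index described above compares a nonparametric item characteristic curve with a parametric one, weighting squared differences by the latent-trait density. The Python sketch below uses a Gaussian-kernel (Nadaraya-Watson) estimate of the nonparametric curve, which corresponds to the kernel-smoothing approach named in the title rather than the posterior-probability approach the article proposes; the evaluation grid, the standard normal weights, the bandwidth, and the two-parameter logistic comparison curve are illustrative assumptions.

```python
import math
from typing import List, Sequence


def kernel_icc(theta_hat: Sequence[float], responses: Sequence[int],
               grid: Sequence[float], bandwidth: float) -> List[float]:
    """Nadaraya-Watson estimate of P(correct | theta) at each grid point."""
    curve = []
    for t in grid:
        # Gaussian kernel weights; assumes some ability estimates lie near t.
        w = [math.exp(-0.5 * ((t - th) / bandwidth) ** 2) for th in theta_hat]
        curve.append(sum(wi * u for wi, u in zip(w, responses)) / sum(w))
    return curve


def two_pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rise(p_nonpar: Sequence[float], p_par: Sequence[float],
         grid: Sequence[float]) -> float:
    """Root of squared ICC differences weighted by normalized N(0, 1) density."""
    dens = [math.exp(-0.5 * t * t) for t in grid]
    total = sum(dens)
    return math.sqrt(sum(d * (x - y) ** 2
                         for d, x, y in zip(dens, p_nonpar, p_par)) / total)
```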

Evaluation of Methods to Compute Complex Sample Standard Errors in Latent Regression Models. Research Report. ETS RR-09-49

Peer reviewed

Oranje, Andreas; Li, Deping; Kandathil, Mathew – ETS Research Report Series, 2009

Several complex sample standard error estimators based on linearization and resampling for the latent regression model of the National Assessment of Educational Progress (NAEP) are studied with respect to design choices such as number of items, number of regressors, and the efficiency of the sample. This paper provides an evaluation of the extent…

Descriptors: Error of Measurement, Computation, Regression (Statistics), National Competency Tests

Estimation of Test Length for Domain-Referenced Reading Comprehension Tests.

Peer reviewed

Berk, Ronald A. – Journal of Experimental Education, 1980

A sampling methodology is proposed for determining lengths of tests designed to assess the comprehension of written discourse. It is based on Bormuth's transformational analysis, within a domain-referenced framework. Guidelines are provided for computing sample size and selecting sentences to which the transformational rules can be applied.…

Descriptors: Reading Comprehension, Reading Tests, Sampling, Test Construction

Dependent Variable Reliability and Determination of Sample Size.

Maxwell, Scott E. – 1979

Arguments have recently been put forth that standard textbook procedures for determining the sample size necessary to achieve a certain level of power in a completely randomized design are incorrect when the dependent variable is fallible because they ignore measurement error. In fact, however, there are several correct procedures, one of which is…

Descriptors: Hypothesis Testing, Mathematical Formulas, Power (Statistics), Predictor Variables
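
The abstract is cut off before naming the procedures it endorses. As one standard illustration of how unreliability enters a power analysis (an assumption here, not necessarily the procedure the paper recommends): if the effect size d is specified on the true-score scale, the observed standardized difference shrinks to d * sqrt(rho_yy), where rho_yy is the dependent variable's reliability, and the required per-group sample size grows accordingly. The Python sketch uses the normal approximation for a two-sided, two-group comparison, so it slightly understates what an exact t-test calculation would require.

```python
import math
from statistics import NormalDist


def n_per_group(d_true: float, reliability: float = 1.0,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-group mean comparison (normal approximation).

    The true-score effect size is attenuated by sqrt(reliability) before
    the usual formula n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2 is applied.
    """
    d_obs = d_true * math.sqrt(reliability)
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d_obs ** 2)
```

With d = 0.5, this gives 63 per group for a perfectly reliable measure and 99 per group when the reliability is 0.64.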

Item Pool Construction for Use With Latent Trait Models.

Reckase, Mark D. – 1979

Because latent trait models require that large numbers of items be calibrated or that testing of the same large group be repeated, item parameter estimates are often obtained by administering separate tests to different groups and "linking" the results to construct an adequate item pool. Four issues were studied, based upon the analysis…

Descriptors: Achievement Tests, High Schools, Item Banks, Mathematical Models
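
Linking separately calibrated tests requires placing their item parameters on a common scale. One common technique, assumed here and not necessarily one of the approaches the report studied, is mean-sigma linking through anchor items: the slope A and intercept B are chosen so that the anchor items' difficulty estimates from one calibration match the mean and standard deviation of their estimates from the other. A minimal Python sketch for two-parameter items (a, b), with hypothetical function names:

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def mean_sigma_constants(b_source: Sequence[float],
                         b_target: Sequence[float]) -> Tuple[float, float]:
    """Slope A and intercept B mapping the source scale onto the target scale.

    Both arguments are difficulty estimates of the same anchor items,
    calibrated separately in the source and target groups.
    """
    slope = pstdev(b_target) / pstdev(b_source)
    intercept = mean(b_target) - slope * mean(b_source)
    return slope, intercept


def rescale_items(items: Sequence[Tuple[float, float]],
                  slope: float, intercept: float) -> List[Tuple[float, float]]:
    """Apply theta* = A * theta + B to (a, b) pairs: a* = a / A, b* = A * b + B."""
    return [(a / slope, slope * b + intercept) for a, b in items]
```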

A Comparison of Simple Random Sampling Versus Stratification for Allocating Items to Subtests in Multiple Matrix Sampling.

Scheetz, James P.; Forsyth, Robert A. – 1977

Empirical evidence is presented related to the effects of using a stratified sampling of items in multiple matrix sampling on the accuracy of estimates of the population mean. Data were obtained from a sample of 600 high school students for a 36-item mathematics test and a 40-item vocabulary test, both subtests of the Iowa Tests of Educational…

Descriptors: Achievement Tests, Difficulty Level, Item Analysis, Item Sampling
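
The two allocation schemes being compared can be sketched as follows: simple random sampling deals a shuffled item pool into equal-sized subtests, while stratification first sorts items by difficulty and deals one item from each block of adjacent-difficulty items to every subtest. The Python below is an illustrative sketch with hypothetical function names; it assumes the number of items divides evenly by the number of subtests (36 items into 4 subtests of 9, for instance) and does not include the population-mean estimation step.

```python
import random
from typing import List, Sequence


def allocate_random(n_items: int, n_subtests: int,
                    rng: random.Random) -> List[List[int]]:
    """Simple random sampling: shuffle item indices, then split evenly."""
    if n_items % n_subtests:
        raise ValueError("items must divide evenly into subtests")
    items = list(range(n_items))
    rng.shuffle(items)
    size = n_items // n_subtests
    return [items[s * size:(s + 1) * size] for s in range(n_subtests)]


def allocate_stratified(difficulty: Sequence[float], n_subtests: int,
                        rng: random.Random) -> List[List[int]]:
    """Stratified allocation: each block of n_subtests items with adjacent
    difficulties contributes one randomly chosen item to every subtest."""
    n_items = len(difficulty)
    if n_items % n_subtests:
        raise ValueError("items must divide evenly into subtests")
    order = sorted(range(n_items), key=lambda i: difficulty[i])
    subtests: List[List[int]] = [[] for _ in range(n_subtests)]
    for start in range(0, n_items, n_subtests):
        block = order[start:start + n_subtests]
        rng.shuffle(block)  # random assignment within the difficulty stratum
        for s, item in enumerate(block):
            subtests[s].append(item)
    return subtests
```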

An Investigation of Methods for Reducing Sampling Error in Certain IRT Procedures.

Wingersky, Marilyn S.; Lord, Frederic M. – 1983

The sampling errors of maximum likelihood estimates of item-response theory parameters are studied in the case where both people and item parameters are estimated simultaneously. A check on the validity of the standard error formulas is carried out. The effect of varying sample size, test length, and the shape of the ability distribution is…

Descriptors: Error of Measurement, Estimation (Mathematics), Item Banks, Latent Trait Theory
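
The standard-error formulas being checked are asymptotic. A common textbook form, assumed here, gives the standard error of a maximum likelihood ability estimate as 1/sqrt(I(theta)), where under the three-parameter logistic model each item contributes I_i(theta) = D^2 a_i^2 (Q_i/P_i) ((P_i - c_i)/(1 - c_i))^2. The Python sketch treats item parameters as known, which is exactly the simplification the report's simultaneous-estimation setting does not make, but it shows how test length drives sampling error: doubling the number of equivalent items shrinks the standard error by a factor of sqrt(2).

```python
import math
from typing import Sequence, Tuple

D = 1.7  # conventional scaling constant for the logistic model

# (a, b, c): discrimination, difficulty, lower asymptote
Item = Tuple[float, float, float]


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of one 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def se_theta(theta: float, items: Sequence[Item]) -> float:
    """Asymptotic SE of the ML ability estimate with known item parameters."""
    info = sum(item_information(theta, *it) for it in items)
    return 1 / math.sqrt(info)
```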

Evaluation of Criterion-Referenced Reliability Coefficients. Final Report.

Subkoviak, Michael J. – 1977

Four different procedures were used for estimating the proportion of persons who would be classified consistently as either passing both of two parallel tests or failing both. These four methods were applied at each of four different mastery level scores for each of three different length tests. Data were based on 50 replications of each procedure…

Descriptors: Criterion Referenced Tests, Cutting Scores, Data Analysis, Data Collection
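
The quantity the four procedures estimate is the proportion of examinees classified the same way (passing both or failing both) on two parallel tests, often reported alongside Cohen's kappa to correct for chance agreement. When two administrations are actually available, it can be computed directly, as in the Python sketch below; the single-administration estimation procedures compared in the report are not implemented, and counting a score equal to the cut as passing is an assumption.

```python
from typing import Sequence, Tuple


def decision_consistency(form_a: Sequence[float], form_b: Sequence[float],
                         cut: float) -> Tuple[float, float]:
    """Return (p0, kappa) for pass/fail agreement across two parallel tests."""
    n = len(form_a)
    pass_a = [s >= cut for s in form_a]  # scores at the cut count as passing
    pass_b = [s >= cut for s in form_b]
    # Observed proportion classified consistently.
    p0 = sum(x == y for x, y in zip(pass_a, pass_b)) / n
    rate_a = sum(pass_a) / n
    rate_b = sum(pass_b) / n
    # Agreement expected by chance from the marginal pass rates.
    pc = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else 1.0
    return p0, kappa
```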

Prescribing Test Length for Criterion-Referenced Measurement. I. Posttests. ACT Technical Bulletin No. 18.

Novick, Melvin R.; Lewis, Charles – 1974

In a program of Individually Prescribed Instruction (IPI), where a student's progress through each level of a program of study is governed by his performance on a test dealing with individual behavioral objectives, there is considerable value in keeping the number of items on each test at a minimum. The specified test length for each objective…

Descriptors: Behavioral Objectives, Criterion Referenced Tests, Cutting Scores, Elementary Secondary Education

Author

Abad, Francisco J.	1
Berk, Ronald A.	1
Cheng, Ying	1
Diao, Qi	1
Dogruöz, Ebru	1
Dorans, Neil J.	1
Forsyth, Robert A.	1
Guo, Hongwen	1
Hong, Maxwell	1
Kandathil, Mathew	1
Kannan, Priya	1
Katz, Irvin R.	1
Kelecioglu, Hülya	1
Lewis, Charles	1
Li, Deping	1
Lord, Frederic M.	1
Lu, Ru	1
Maxwell, Scott E.	1
Novick, Melvin R.	1
Oranje, Andreas	1
Paek, Insu	1
Patton, Jeffrey M.	1
Reckase, Mark D.	1
Scheetz, James P.	1
Sgammato, Adrienne	1