Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 19 |
Since 2006 (last 20 years) | 64 |
Descriptor
Computation | 78 |
Item Response Theory | 37 |
Statistical Analysis | 24 |
Error of Measurement | 21 |
Comparative Analysis | 20 |
Test Items | 20 |
Scores | 18 |
Sampling | 17 |
Accuracy | 16 |
Models | 15 |
National Competency Tests | 15 |
Source
ETS Research Report Series | 78 |
Author
Haberman, Shelby J. | 16 |
Moses, Tim | 7 |
Oranje, Andreas | 7 |
Qian, Jiahe | 7 |
Zhang, Jinming | 6 |
Kim, Sooyeon | 5 |
Antal, Tamás | 4 |
Dorans, Neil J. | 4 |
Guo, Hongwen | 4 |
Braun, Henry | 3 |
von Davier, Matthias | 3 |
Publication Type
Journal Articles | 78 |
Reports - Research | 74 |
Reports - Descriptive | 4 |
Numerical/Quantitative Data | 2 |
Speeches/Meeting Papers | 2 |
Tests/Questionnaires | 2 |
Information Analyses | 1 |
Location
California | 1 |
Nevada | 1 |
New Jersey | 1 |
New York | 1 |
United States | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Yanxuan Qu; Sandip Sinharay – ETS Research Report Series, 2023
Though a substantial amount of research exists on imputing missing scores in educational assessments, there is little research on cases where responses or scores to an item are missing for all test takers. In this paper, we tackled the problem of imputing missing scores for tests for which the responses to an item are missing for all test takers.…
Descriptors: Scores, Test Items, Accuracy, Psychometrics
Hongwen Guo; Matthew S. Johnson; Daniel F. McCaffrey; Lixiong Gu – ETS Research Report Series, 2024
The multistage testing (MST) design has been gaining attention and popularity in educational assessments. For testing programs that have small test-taker samples, it is challenging to calibrate new items to replenish the item pool. In the current research, we used the item pools from an operational MST program to illustrate how research studies…
Descriptors: Test Items, Test Construction, Sample Size, Scaling
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Haberman, Shelby J. – ETS Research Report Series, 2019
Cross-validation is a common statistical procedure applied to problems that are otherwise computationally intractable. It is often employed to assess the effectiveness of prediction procedures. In this report, cross-validation is discussed in terms of U-statistics. This approach permits consideration of the statistical properties of…
Descriptors: Statistical Analysis, Generalization, Prediction, Computation
Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2019
We derive formulas for the differential item functioning (DIF) measures that two routinely used DIF statistics are designed to estimate. The DIF measures that match on observed scores are compared to DIF measures based on an unobserved ability (theta or true score) for items that are described by either the one-parameter logistic (1PL) or…
Descriptors: Scores, Test Bias, Statistical Analysis, Item Response Theory
Fu, Jianbin – ETS Research Report Series, 2019
A maximum marginal likelihood estimation with an expectation-maximization algorithm has been developed for estimating multigroup or mixture multidimensional item response theory models using the generalized partial credit function, graded response function, and 3-parameter logistic function. The procedure includes the estimation of item…
Descriptors: Maximum Likelihood Statistics, Mathematics, Item Response Theory, Expectation
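For readers unfamiliar with the 3-parameter logistic function named in the abstract above, a minimal sketch of its standard textbook form follows. The function name `p_3pl` and the example parameter values are illustrative assumptions, and the optional scaling constant D = 1.7 used in some parameterizations is omitted; this is not code from the report.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the standard 3PL form.

    theta: ability, a: discrimination, b: difficulty, c: lower asymptote
    (pseudo-guessing). Parameter names follow common usage, not the report.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: when ability equals difficulty, the probability sits
# halfway between the lower asymptote c and 1.
prob_at_difficulty = p_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
```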
Qian, Jiahe – ETS Research Report Series, 2020
The finite population correction (FPC) factor is often used to adjust variance estimators for survey data sampled from a finite population without replacement. As a replicated resampling approach, the jackknife approach is usually implemented without the FPC factor incorporated in its variance estimates. A paradigm is proposed to compare the…
Descriptors: Computation, Sampling, Data, Statistical Analysis
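To illustrate the point in the abstract above, the sketch below computes a delete-one jackknife variance for a sample mean and then applies a finite population correction. It assumes simple random sampling without replacement and the common variance-scale factor 1 - n/N; the data, the population size, and the function names are made up for illustration, and this is not the comparison paradigm proposed in the report.

```python
from statistics import mean

def jackknife_variance_of_mean(sample):
    """Delete-one jackknife variance estimate for the sample mean."""
    n = len(sample)
    total = sum(sample)
    # Mean of the sample with each observation left out in turn.
    leave_one_out = [(total - x) / (n - 1) for x in sample]
    center = mean(leave_one_out)
    return (n - 1) / n * sum((m - center) ** 2 for m in leave_one_out)

def fpc(n, population_size):
    """Finite population correction on the variance scale (assumed form)."""
    return 1.0 - n / population_size

sample = [2.0, 4.0, 6.0, 8.0]   # hypothetical data
population_size = 20            # hypothetical finite population size

v_jk = jackknife_variance_of_mean(sample)                   # no FPC
v_jk_fpc = fpc(len(sample), population_size) * v_jk         # with FPC
```

For the mean, the delete-one jackknife reproduces the usual s²/n estimate, so the uncorrected value overstates the variance whenever the sampling fraction n/N is not negligible.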
Jewsbury, Paul A. – ETS Research Report Series, 2019
When an assessment undergoes changes to the administration or instrument, bridge studies are typically used to try to ensure comparability of scores before and after the change. Among the most common and powerful is the common population linking design, with the use of a linear transformation to link scores to the metric of the original…
Descriptors: Evaluation Research, Scores, Error Patterns, Error of Measurement
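The linear transformation mentioned in the abstract above can be pictured with a minimal sketch of mean-sigma linear linking, which matches the mean and standard deviation of the new scores to those of the original metric. This assumes both score sets come from randomly equivalent samples of a common population; the function name `linear_link` and the scores are hypothetical, and the report's own procedure may differ.

```python
from statistics import mean, pstdev

def linear_link(new_scores, old_scores):
    """Return (slope, intercept) mapping new-form scores to the old metric."""
    slope = pstdev(old_scores) / pstdev(new_scores)
    intercept = mean(old_scores) - slope * mean(new_scores)
    return slope, intercept

old_form = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical original-metric scores
new_form = [20.0, 25.0, 30.0, 35.0, 40.0]   # hypothetical post-change scores

slope, intercept = linear_link(new_form, old_form)
linked = [slope * x + intercept for x in new_form]
```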
Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2018
The purpose of this study is to assess the impact of aberrant responses on the estimation accuracy in forced-choice format assessments. To that end, a wide range of aberrant response behaviors (e.g., fake, random, or mechanical responses) affecting upward of 20%–30% of the responses was manipulated under the multi-unidimensional pairwise…
Descriptors: Measurement Techniques, Response Style (Tests), Accuracy, Computation
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2018
Educational assessment data are often collected from a set of test centers across various geographic regions, and therefore the data samples contain clusters. Such cluster-based data may result in clustering effects in variance estimation. However, in many grouped jackknife variance estimation applications, jackknife groups are often formed by a…
Descriptors: Item Response Theory, Scaling, Equated Scores, Cluster Grouping
Kane, Michael T. – ETS Research Report Series, 2017
By aggregating residual gain scores (the differences between each student's current score and a predicted score based on prior performance) for a school or a teacher, value-added models (VAMs) can be used to generate estimates of school or teacher effects. It is known that random errors in the prior scores will introduce bias into predictions of…
Descriptors: Error of Measurement, Value Added Models, Scores, Teacher Effectiveness
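The residual gain scores defined in the abstract above can be sketched as follows: regress current scores on prior scores, take each student's residual, and average the residuals within each teacher or school. The sketch assumes a single prior score, ordinary least squares, and a plain mean as the aggregate; the function names and data are hypothetical, and the sketch does not address the measurement-error bias the report discusses.

```python
from collections import defaultdict
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares slope and intercept for y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_gain_by_group(prior, current, groups):
    """Mean residual gain (current minus predicted score) per group."""
    slope, intercept = fit_line(prior, current)
    residuals = defaultdict(list)
    for p, c, g in zip(prior, current, groups):
        residuals[g].append(c - (intercept + slope * p))
    return {g: mean(r) for g, r in residuals.items()}

prior = [1.0, 2.0, 3.0, 4.0]        # hypothetical prior-year scores
current = [2.0, 3.0, 5.0, 6.0]      # hypothetical current-year scores
teachers = ["A", "A", "B", "B"]     # hypothetical teacher labels

effects = residual_gain_by_group(prior, current, teachers)
```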
Qian, Jiahe – ETS Research Report Series, 2017
The variance formula derived for a two-stage sampling design without replacement employs the joint inclusion probabilities in the first-stage selection of clusters. One of the difficulties encountered in data analysis is the lack of information about such joint inclusion probabilities. One way to solve this issue is by applying Hájek's…
Descriptors: Mathematical Formulas, Computation, Sampling, Research Design
Wei, Youhua; Morgan, Rick – ETS Research Report Series, 2016
As an alternative to common-item equating when common items do not function as expected, the single-group growth model (SGGM) scaling uses common examinees or repeaters to link test scores on different forms. The SGGM scaling assumes that, for repeaters taking adjacent administrations, the conditional distribution of scale scores in later…
Descriptors: Equated Scores, Growth Models, Scaling, Computation
von Davier, Matthias – ETS Research Report Series, 2016
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
Descriptors: Psychometrics, Mathematics, Models, Statistical Analysis
van Rijn, Peter W.; Ali, Usama S. – ETS Research Report Series, 2018
A computer program was developed to estimate speed-accuracy response models for dichotomous items. This report describes how the models are estimated and how to specify data and input files. An example using data from a listening section of an international language test is described to illustrate the modeling approach and features of the computer…
Descriptors: Computer Software, Computation, Reaction Time, Timed Tests