Showing all 14 results
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F.; Lockwood, J. R. – Journal of Educational Measurement, 2023
The simple average of student growth scores is often used in accountability systems, but it can be problematic for decision making. When computed from a small or moderate number of students, it can be sensitive to the particular sample drawn, resulting in inaccurate representations of student growth, low year-to-year stability, and inequities for…
Descriptors: Academic Achievement, Accountability, Decision Making, Computation
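A minimal simulation sketch (not from the article; all numbers are hypothetical) of the instability the authors describe: the standard deviation of a school's mean growth score shrinks only with the square root of the number of students, so small schools get noisy averages.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical values: a school's true mean growth is 50 on an SGP-like
# scale, and individual student growth scores scatter widely around it.
true_mean, student_sd, n_reps = 50.0, 28.0, 10_000

for n_students in (10, 25, 100, 400):
    # Draw many hypothetical cohorts and compute each cohort's mean growth;
    # the spread of those means is the sampling instability of the average.
    means = rng.normal(true_mean, student_sd, size=(n_reps, n_students)).mean(axis=1)
    print(f"n={n_students:4d}: SD of school mean growth = {means.std():.2f}")
```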
Peer reviewed
Sinharay, Sandip – Journal of Educational Measurement, 2016
De la Torre and Deng suggested a resampling-based approach for person-fit assessment (PFA). The approach involves the use of the l_z statistic, a corrected expected a posteriori estimate of the examinee ability, and the Monte Carlo (MC) resampling method. The Type I error rate of the approach was closer to the nominal level…
Descriptors: Sampling, Research Methodology, Error Patterns, Monte Carlo Methods
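A rough sketch of the general Monte Carlo resampling idea behind this line of person-fit work, under a Rasch model with a plain log-likelihood statistic and a grid-search ability estimate; the approach the article discusses uses a corrected EAP estimate and the l_z statistic, so everything here is a simplified stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, b):
    """Rasch probability of a correct response to items with difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def log_lik(x, theta, b):
    """Response log-likelihood, used here as a crude person-fit statistic."""
    p = p_correct(theta, b)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def mle(x, b, grid=np.linspace(-4, 4, 161)):
    """Grid-search ability estimate (stand-in for the corrected EAP)."""
    return grid[np.argmax([log_lik(x, t, b) for t in grid])]

b = rng.normal(0, 1, size=40)                 # hypothetical item difficulties
x_obs = rng.binomial(1, p_correct(0.5, b))    # one examinee's responses

# Monte Carlo resampling: regenerate response vectors at the estimated
# ability, re-estimate ability each time, and build the reference
# distribution of the fit statistic, yielding a resampling-based p-value.
theta_hat = mle(x_obs, b)
obs = log_lik(x_obs, theta_hat, b)
ref = [log_lik(x_sim, mle(x_sim, b), b)
       for x_sim in (rng.binomial(1, p_correct(theta_hat, b)) for _ in range(500))]
p_value = np.mean(np.array(ref) <= obs)
print(f"theta_hat = {theta_hat:.2f}, resampling p-value = {p_value:.3f}")
```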
Peer reviewed
Zu, Jiyun; Puhan, Gautam – Journal of Educational Measurement, 2014
Preequating is in demand because it reduces score reporting time. In this article, we evaluated an observed-score preequating method, the empirical item characteristic curve (EICC) method, which makes preequating possible without item response theory (IRT). EICC preequating results were compared with a criterion equating and with IRT true-score…
Descriptors: Item Response Theory, Equated Scores, Item Analysis, Item Sampling
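A small illustration, on invented data, of what an empirical item characteristic curve is: the proportion correct on an item at each anchor-test score point, estimated directly from pretest data with no IRT model. The anchor design and smoothing details of the actual EICC method are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pretest data: 5,000 examinees answer one new item and also
# have a score on a 40-point anchor test.
n = 5000
anchor = rng.integers(0, 41, size=n)
true_p = 1 / (1 + np.exp(-(anchor - 20) / 5))   # data-generating model only
resp = rng.binomial(1, true_p)

# Empirical item characteristic curve: proportion correct at each anchor
# score point, estimated directly from the data.
eicc = np.array([resp[anchor == s].mean() if np.any(anchor == s) else np.nan
                 for s in range(41)])
print(np.round(eicc[::5], 2))

# Summing the EICCs of the items on a planned form gives that form's
# expected raw score at each anchor score -- the basis for preequating
# conversions before the form is ever administered.
```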
Peer reviewed
Kane, Michael T. – Journal of Educational Measurement, 2013
To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…
Descriptors: Test Interpretation, Validity, Scores, Test Use
Peer reviewed
Shavelson, Richard J.; Ruiz-Primo, Maria Araceli; Wiley, Edward W. – Journal of Educational Measurement, 1999
Reports a reanalysis of data collected in a person × task × occasion × (rater or method) G-study design (M. Ruiz-Primo and others, 1993), and brings this reanalysis to bear on the interpretation of task-sampling variability and the convergence of different performance-assessment methods. (SLD)
Descriptors: Performance Based Assessment, Sampling, Sciences
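A simplified sketch of how G-study variance components are estimated, using a one-facet person × task design rather than the full person × task × occasion design of the study; the components are recovered from ANOVA mean squares on simulated data with assumed true values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated person-by-task score matrix under assumed variance components
# sigma2_p = 0.5, sigma2_t = 0.2, sigma2_pt,e = 1.0 (a large task-related
# error component, as these studies report).
n_p, n_t = 200, 8
X = (rng.normal(0, np.sqrt(0.5), (n_p, 1)) +     # person effects
     rng.normal(0, np.sqrt(0.2), (1, n_t)) +     # task effects
     rng.normal(0, 1.0, (n_p, n_t)))             # interaction + residual

# ANOVA mean squares for the crossed p x t design.
grand = X.mean()
mp, mt = X.mean(axis=1, keepdims=True), X.mean(axis=0, keepdims=True)
ms_p = n_t * ((mp - grand) ** 2).sum() / (n_p - 1)
ms_t = n_p * ((mt - grand) ** 2).sum() / (n_t - 1)
ms_res = ((X - mp - mt + grand) ** 2).sum() / ((n_p - 1) * (n_t - 1))

# Expected-mean-square solutions for the variance components.
print(f"sigma2_p    = {(ms_p - ms_res) / n_t:.2f}")
print(f"sigma2_t    = {(ms_t - ms_res) / n_p:.2f}")
print(f"sigma2_pt,e = {ms_res:.2f}")
```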
Peer reviewed
Scrams, David J.; McLeod, Lori D. – Journal of Educational Measurement, 2000
Presents a graphical approach to differential item functioning (DIF) analysis based on a sampling-theory treatment of expected response functions. The approach was applied to a set of pretest items, and results were compared with traditional Mantel-Haenszel DIF statistics. Discusses implications of the method as a complement to the approach of P. Pashley (1992). (SLD)
Descriptors: Item Bias, Pretests Posttests, Sampling
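A hypothetical sketch of the expected-response-function idea: average an item's characteristic curve over the sampling distribution of its parameter estimates in each group, then compare the resulting curves (with uncertainty bands) graphically. The 2PL form and all parameter values are assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(3)

def icc(theta, a, b):
    """2PL item characteristic curve (functional form assumed here)."""
    return 1 / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 61)

# Invented estimates and standard errors for one item in two groups:
# (discrimination, difficulty, SE of each).
groups = {"reference": (1.1, 0.0, 0.08, 0.06),
          "focal":     (1.1, 0.3, 0.10, 0.08)}

for name, (a_hat, b_hat, se_a, se_b) in groups.items():
    # Expected response function: the ICC averaged over the sampling
    # distribution of the parameter estimates, with pointwise bands.
    a_draws = rng.normal(a_hat, se_a, 1000)
    b_draws = rng.normal(b_hat, se_b, 1000)
    curves = icc(theta[:, None], a_draws, b_draws)
    erf = curves.mean(axis=1)
    lo, hi = np.percentile(curves, [5, 95], axis=1)
    print(name, np.round(erf[::15], 2))   # in practice, plot erf with its band
```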
Peer reviewed
Allen, Nancy L.; Donoghue, John R. – Journal of Educational Measurement, 1996
Examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure through a Monte Carlo study. Suggests the superiority of the pooled booklet method when items are selected for examinees according to a balanced incomplete block design. Discusses implications for other DIF…
Descriptors: Item Bias, Monte Carlo Methods, Research Design, Sampling
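A compact sketch of the Mantel-Haenszel common odds ratio computed from 2×2 tables pooled across strata; following the pooled booklet idea, the tables here are stratified by booklet as well as by matching score. The data and record layout are hypothetical.

```python
import numpy as np
from collections import defaultdict

def mh_alpha(records):
    """Mantel-Haenszel common odds ratio from stratified 2x2 tables."""
    cells = defaultdict(lambda: np.zeros((2, 2)))
    for booklet, score, group, correct in records:
        # Pooled booklet idea: stratify by booklet as well as matching score.
        cells[(booklet, score)][group, correct] += 1
    num = den = 0.0
    for t in cells.values():
        n = t.sum()
        num += t[0, 1] * t[1, 0] / n   # reference correct * focal incorrect
        den += t[0, 0] * t[1, 1] / n   # reference incorrect * focal correct
    return num / den                   # MH D-DIF = -2.35 * ln(alpha_MH)

# Hypothetical records: (booklet, matching score, group 0=ref/1=focal, correct).
rng = np.random.default_rng(4)
recs = [(int(rng.integers(3)), int(rng.integers(10)), int(g),
         int(rng.binomial(1, 0.6 - 0.1 * g)))
        for g in rng.integers(0, 2, 4000)]
print(f"alpha_MH = {mh_alpha(recs):.2f}")
```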
Peer reviewed
Huitzing, Hiddo A.; Veldkamp, Bernard P.; Verschoor, Angela J. – Journal of Educational Measurement, 2005
Several techniques exist for automatically assembling a test that meets a number of specifications. In an item bank, the items are stored with their characteristics. A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler. Test assembly problems are often formulated in terms of a model consisting…
Descriptors: Testing Programs, Programming, Mathematics, Item Sampling
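A toy version of the kind of model the article discusses: items carry attributes in a bank, and a test is a 0-1 selection maximizing an objective subject to content and difficulty constraints. A real assembler would state this as a 0-1 linear program and hand it to a mixed-integer solver; brute force suffices for the eight-item bank assumed here.

```python
import itertools

# Hypothetical miniature bank: (item_id, content_area, difficulty, information).
bank = [
    ("i1", "algebra",  -0.5, 0.42), ("i2", "algebra",   0.2, 0.55),
    ("i3", "algebra",   0.9, 0.38), ("i4", "geometry", -0.8, 0.47),
    ("i5", "geometry",  0.1, 0.60), ("i6", "geometry",  1.1, 0.35),
    ("i7", "numbers",  -0.2, 0.50), ("i8", "numbers",   0.6, 0.44),
]

# Specifications: 4 items, all three content areas covered, mean difficulty
# in [-0.2, 0.4]; objective: maximize total information.
best = None
for combo in itertools.combinations(bank, 4):
    mean_b = sum(item[2] for item in combo) / 4
    if len({item[1] for item in combo}) == 3 and -0.2 <= mean_b <= 0.4:
        info = sum(item[3] for item in combo)
        if best is None or info > best[0]:
            best = (info, [item[0] for item in combo])
print(best)   # highest-information feasible test, or None if infeasible
```

When `best` comes back `None`, the model is infeasible, which is exactly the situation the article's techniques are designed to diagnose and repair.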
Peer reviewed
Gohmann, Stephen F. – Journal of Educational Measurement, 1988
A method for correcting selection bias when comparing Scholastic Aptitude Test (SAT) scores among states is presented; it is a modification of J. J. Heckman's selection bias correction (1976, 1979). Empirical results suggest that sample selection bias is present in SAT score regressions. (SLD)
Descriptors: Regression (Statistics), Sampling, Scoring, Selection
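A sketch of the classic two-step Heckman correction on simulated data (the article presents a modification of it, which is not reproduced here): a probit selection equation yields an inverse Mills ratio that is added as a regressor to the outcome equation. Assumes statsmodels and scipy are available; all variables are invented.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)

# Simulated data: z drives who takes the SAT (selection), x drives the
# score (outcome); correlated errors are what induce selection bias.
n = 5000
x, z = rng.normal(size=n), rng.normal(size=n)
e_sel, e_out = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], n).T
took_sat = (0.5 * z + e_sel > 0)
score = 500 + 40 * x + 60 * e_out      # observed only for test takers

# Step 1: probit selection equation, then the inverse Mills ratio.
Zs = sm.add_constant(z)
probit = sm.Probit(took_sat.astype(float), Zs).fit(disp=0)
xb = Zs @ probit.params
mills = norm.pdf(xb) / norm.cdf(xb)

# Step 2: outcome regression on takers, augmented with the Mills ratio;
# a nonzero Mills coefficient signals selection bias in the naive OLS.
Xo = sm.add_constant(np.column_stack([x[took_sat], mills[took_sat]]))
ols = sm.OLS(score[took_sat], Xo).fit()
print(ols.params)   # [intercept, x effect, Mills-ratio coefficient]
```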
Peer reviewed
Linn, Robert L.; Dunbar, Stephen B. – Journal of Educational Measurement, 1992
Several issues related to the design and reporting of results from the National Assessment of Educational Progress (NAEP) are discussed in the context of current expectations for the NAEP and its origins. These issues include: (1) content coverage and format; (2) estimation procedures; and (3) reporting problems. (SLD)
Descriptors: Content Analysis, Educational Assessment, Elementary Secondary Education, Estimation (Mathematics)
Peer reviewed
Willms, J. Douglas; Raudenbush, Stephen W. – Journal of Educational Measurement, 1989
A general longitudinal model is presented for estimating school effects and their stability. The model, capable of separating true changes from sampling and measurement error, controls statistically for effects of factors exogenous to the school system. The model is illustrated with data from large cohorts of students in Scotland. (SLD)
Descriptors: Elementary Secondary Education, Equations (Mathematics), Error of Measurement, Estimation (Mathematics)
Peer reviewed
Shavelson, Richard J.; And Others – Journal of Educational Measurement, 1993
Evidence is presented on the generalizability and convergent validity of performance assessments using data from six studies of student achievement that sampled a wide range of measurement facets and methods. Results at individual and school levels indicate that task-sampling variability is the major source of measurement error. (SLD)
Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Generalizability Theory
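A short numeric illustration, with assumed variance components in which the person-by-task term dominates, of why task sampling limits generalizability: the D-study coefficient for a mean over n_t tasks rises only slowly unless many tasks are sampled.

```python
# Assumed G-study variance components in which the person-by-task
# interaction dominates, consistent with the task-sampling finding.
var_p, var_pt_e = 0.30, 1.20

# D-study: generalizability coefficient for a score averaged over n_t tasks.
for n_t in (1, 5, 10, 20, 40):
    rho2 = var_p / (var_p + var_pt_e / n_t)
    print(f"n_tasks = {n_t:3d}   E(rho^2) = {rho2:.2f}")
```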
Peer reviewed
Mislevy, Robert J.; And Others – Journal of Educational Measurement, 1992
Concepts behind plausible values in estimating population characteristics from sparse matrix samples of item responses are discussed. The use of marginal analyses is described in the context of the National Assessment of Educational Progress, and the approach is illustrated with Scholastic Aptitude Test data for 9,075 high school seniors. (SLD)
Descriptors: College Entrance Examinations, Educational Assessment, Equations (Mathematics), Estimation (Mathematics)
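A minimal sketch of how plausible values are used in a marginal analysis: compute the target statistic on each set of draws, then combine with Rubin's multiple-imputation rules. The plausible values here are simulated stand-ins, not draws from a fitted latent-regression model.

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in plausible values: M = 5 posterior draws of proficiency per
# examinee, in place of draws from a fitted latent-regression model.
n, M = 2000, 5
pv = rng.normal(0.2, 1.0, size=(n, M))

est = pv.mean(axis=0)                   # one population-mean estimate per PV set
u = pv.var(axis=0, ddof=1) / n          # sampling variance of each estimate
qbar = est.mean()                       # combined point estimate
b = est.var(ddof=1)                     # between-imputation variance
total_var = u.mean() + (1 + 1 / M) * b  # Rubin's combining rule
print(f"mean = {qbar:.3f}, SE = {np.sqrt(total_var):.3f}")
```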
Peer reviewed
Johnson, Eugene G. – Journal of Educational Measurement, 1992
Features of the design of the National Assessment of Educational Progress (NAEP) are discussed, with emphasis on the design of the 1992 assessment. Student sample designs for the NAEP and the Trial State Assessment are described, and the focused-balanced incomplete block spiraling method of item sampling is discussed. (SLD)
Descriptors: Academic Achievement, Educational Assessment, Educational Change, Elementary Secondary Education
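A concrete instance of balanced incomplete block spiraling, using the classic design of 7 item blocks arranged into 7 booklets of 3 blocks each, so that every pair of blocks shares exactly one booklet; the rotation at the end mimics spiraling booklets across students. The design shown is illustrative, not the actual NAEP layout.

```python
from itertools import combinations

# A balanced incomplete block design: 7 item blocks placed into 7 booklets
# of 3 blocks each; every block appears in 3 booklets and every pair of
# blocks appears together in exactly one booklet.
booklets = [(1, 2, 3), (1, 4, 5), (1, 6, 7),
            (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]

pair_counts = {p: 0 for p in combinations(range(1, 8), 2)}
for booklet in booklets:
    for pair in combinations(booklet, 2):
        pair_counts[pair] += 1
assert all(count == 1 for count in pair_counts.values())

# Spiraling: booklets are handed out to students in rotation, so each
# school receives roughly equal numbers of every booklet.
for i in range(10):
    print(f"student {i}: booklet {booklets[i % len(booklets)]}")
```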