Showing 496 to 510 of 636 results
Peer reviewed
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
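The quantity Burton starts from is the classical standard error of measurement, SEM = SD × √(1 − reliability). A minimal sketch of how it yields approximate confidence limits for an observed score, using invented values rather than anything from the article:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

# Illustrative values (not taken from the article).
sd, r_xx, observed = 10.0, 0.84, 62.0
e = sem(sd, r_xx)                                  # 10 * sqrt(0.16) = 4.0
lo, hi = observed - 1.96 * e, observed + 1.96 * e  # approx. 95% limits
print(f"SEM = {e:.2f}; approx. 95% limits: {lo:.1f} to {hi:.1f}")
```

Note that the SEM depends on the score spread through SD, which is why, as the abstract observes, reliability coefficients alone do not permit comparison across tests of different format.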
Wang, Xiang Bo – College Board, 2007
This research examines the effect of increased testing time by comparing the four performance indices of randomly equivalent examinee subpopulations on sections of similar content and difficulty administered at different times on three SAT administrations. A variety of analyses were used in this study and found no evidence that the current SAT…
Descriptors: College Entrance Examinations, Thinking Skills, High School Students, Test Length
De Champlain, Andre; Gessaroli, Marc E. – 1996
The use of indices and statistics based on nonlinear factor analysis (NLFA) has become increasingly popular as a means of assessing the dimensionality of an item response matrix. Although the indices and statistics currently available to the practitioner have been shown to be useful and accurate in many testing situations, few studies have…
Descriptors: Adaptive Testing, Chi Square, Computer Assisted Testing, Factor Analysis
Ankenmann, Robert D.; Stone, Clement A. – 1992
Effects of test length, sample size, and assumed ability distribution were investigated in a multiple replication Monte Carlo study under the 1-parameter (1P) and 2-parameter (2P) logistic graded model with five score levels. Accuracy and variability of item parameter and ability estimates were examined. Monte Carlo methods were used to evaluate…
Descriptors: Computer Simulation, Estimation (Mathematics), Item Bias, Mathematical Models
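As a rough illustration of the kind of data such a study replicates, here is a minimal sketch of generating responses under a 2P logistic graded model with five score levels (a Samejima-style cumulative formulation is assumed; all item parameters and the ability distribution are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def graded_probs(theta, a, b):
    """Category probabilities under a graded (cumulative logistic) model.
    b is a sorted array of K-1 between-category thresholds."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # P(score >= k), k = 1..K-1
    upper = np.concatenate(([1.0], p_star))
    lower = np.concatenate((p_star, [0.0]))
    return upper - lower                             # P(score == k), k = 0..K-1

# Hypothetical 2P graded item with five score levels (K = 5).
a, b = 1.2, np.array([-1.5, -0.5, 0.5, 1.5])
thetas = rng.normal(size=1000)                       # simulated examinee abilities
scores = np.array([rng.choice(5, p=graded_probs(t, a, b)) for t in thetas])
print(np.bincount(scores, minlength=5))              # simulated score distribution
```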
Schumacker, Randall E.; And Others – 1994
Rasch between and total weighted and unweighted fit statistics were compared using varying test lengths and sample sizes. Two test lengths (20 and 50 items) and three sample sizes (150, 500, and 1,000) were crossed, and each of the six combinations was replicated 100 times. In addition, power comparisons were made. Results indicated that there were no…
Descriptors: Comparative Analysis, Goodness of Fit, Item Response Theory, Power (Statistics)
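The fit statistics being compared are conventionally the unweighted (outfit) and information-weighted (infit) mean squares. A minimal sketch for a single Rasch item, with simulated data standing in for the study's designs:

```python
import numpy as np

def rasch_fit(responses, theta, b):
    """Unweighted (outfit) and weighted (infit) mean-square fit for one item.
    responses: 0/1 vector; theta: ability estimates; b: item difficulty."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))  # Rasch success probability
    w = p * (1.0 - p)                       # binomial variance of each response
    z2 = (responses - p) ** 2 / w           # squared standardized residuals
    outfit = z2.mean()                      # unweighted mean square
    infit = np.sum(w * z2) / np.sum(w)      # information-weighted mean square
    return outfit, infit

rng = np.random.default_rng(1)
theta = rng.normal(size=500)                # hypothetical sample size
b = 0.3                                     # hypothetical item difficulty
x = rng.binomial(1, 1.0 / (1.0 + np.exp(-(theta - b))))
print(rasch_fit(x, theta, b))               # both near 1.0 when the model fits
```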
Kunce, Charles S.; Arbet, Scott E. – 1994
The National Conference of Bar Examiners commissioned American College Testing, Inc., to help in the development and evaluation of a performance test for use in bar admissions decisions. Because it was recognized that candidate perceptions would provide valuable information, a candidate-perception questionnaire was developed to be…
Descriptors: Attitudes, Demography, Languages, Lawyers
Haladyna, Tom; Roid, Gale – 1981
Two approaches to criterion-referenced test construction are compared. Classical test theory is based on the practice of random sampling from a well-defined domain of test items; latent trait theory suggests that the difficulty of the items should be matched to the achievement level of the student. In addition to these two methods of test…
Descriptors: Criterion Referenced Tests, Error of Measurement, Latent Trait Theory, Test Construction
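The two construction philosophies can be sketched side by side; the item bank below is hypothetical and the code is only meant to make the contrast concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
bank = rng.uniform(-3, 3, size=200)  # difficulties of a well-defined item domain

def random_domain_sample(bank, n):
    """Classical approach: draw items at random from the domain."""
    return rng.choice(bank, size=n, replace=False)

def difficulty_matched(bank, theta, n):
    """Latent-trait approach: pick the n items whose difficulty is closest
    to the student's achievement level theta."""
    return bank[np.argsort(np.abs(bank - theta))[:n]]

print(sorted(random_domain_sample(bank, 5)))
print(sorted(difficulty_matched(bank, theta=0.8, n=5)))
```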
Myers, Charles T. – 1978
The viewpoint is expressed that adding to test reliability by either selecting a more homogeneous set of items, restricting the range of item difficulty as closely as possible to the most efficient level, or increasing the number of items will not add to test validity and that there is considerable danger that efforts to increase reliability may…
Descriptors: Achievement Tests, Item Analysis, Multiple Choice Tests, Test Construction
Saunders, Joseph C.; Huynh, Huynh – 1980
In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
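The abstract does not reproduce the authors' approximation, which concerns mastery-test reliability; as a stand-in illustration of the same general idea that precision varies inversely with sample size, the sketch below sizes a sample using the F-distribution approximation often attributed to Feldt (1965) for coefficient alpha (one common orientation of the statistic is assumed):

```python
from scipy.stats import f as f_dist

def min_n_for_alpha(alpha_hat, k, half_width, conf=0.95, n_max=5000):
    """Smallest n whose F-based CI for coefficient alpha has
    total width <= 2 * half_width (k items, n examinees)."""
    g = (1.0 - conf) / 2.0
    for n in range(10, n_max):
        df1, df2 = n - 1, (n - 1) * (k - 1)
        f_lo = f_dist.ppf(g, df1, df2)
        f_hi = f_dist.ppf(1.0 - g, df1, df2)
        # CI taken as [1-(1-a)F_hi, 1-(1-a)F_lo] under the assumed orientation
        if (1.0 - alpha_hat) * (f_hi - f_lo) <= 2.0 * half_width:
            return n
    return None

# Hypothetical planning values: expected alpha ~ .80 on a 20-item test.
print(min_n_for_alpha(alpha_hat=0.80, k=20, half_width=0.05))
```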
Harris, Dickie A.; Penell, Roger J. – 1977
This study used a series of simulations to answer questions about the efficacy of adaptive testing raised by empirical studies. The first study showed that for reasonably high entry points, parameters estimated from paper-and-pencil test protocols cross-validated remarkably well to groups actually tested at a computer terminal. This suggested that…
Descriptors: Adaptive Testing, Computer Assisted Testing, Cost Effectiveness, Difficulty Level
Peer reviewed
Feild, Hubert S.; And Others – Educational and Psychological Measurement, 1978
Computerized answer sheets in mail surveys are examined for their effects on rate of return and response bias. Results of an empirical study of job satisfaction suggested that computerized answer sheets may be used in mail surveys without significantly affecting rate of return or producing response bias. (Author/JKS)
Descriptors: Answer Sheets, City Government, Computers, Cost Effectiveness
Peer reviewed
Hambleton, Ronald K.; De Gruijter, Dato N. M. – Journal of Educational Measurement, 1983
Addressing the shortcomings of classical item statistics for selecting criterion-referenced test items, this paper describes an optimal item selection procedure utilizing item response theory (IRT) and offers examples in which random selection and optimal item selection methods are compared. Theoretical advantages of optimal selection based upon…
Descriptors: Criterion Referenced Tests, Cutting Scores, Item Banks, Latent Trait Theory
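Optimal selection of this kind amounts to maximizing test information at the cut score. A minimal sketch under an assumed 2PL model and an invented item bank (the paper's own examples are not reproduced):

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

rng = np.random.default_rng(3)
a = rng.uniform(0.5, 2.0, size=100)   # hypothetical discrimination parameters
b = rng.normal(size=100)              # hypothetical difficulty parameters

theta_cut, n_items = 0.5, 10          # cut score on the latent scale

optimal = np.argsort(info_2pl(theta_cut, a, b))[::-1][:n_items]  # max info at cut
random_pick = rng.choice(100, size=n_items, replace=False)       # random selection

print("info at cut, optimal:", info_2pl(theta_cut, a[optimal], b[optimal]).sum())
print("info at cut, random: ", info_2pl(theta_cut, a[random_pick], b[random_pick]).sum())
```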
Davey, Tim; Pommerich, Mary; Thompson, Tony D. – 1999
In computerized adaptive testing (CAT), new or experimental items are frequently administered alongside operational tests to gather the pretest data needed to replenish and replace item pools. The two basic strategies used to combine pretest and operational items are embedding and appending. Variable-length CATs are preferred because of the…
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Measurement Techniques
Peer reviewed
Wainer, Howard; And Others – Journal of Educational Measurement, 1992
Computer simulations were run to measure the relationship between testlet validity and factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
Peer reviewed
Chen, Shu-Ying; Ankenman, Robert D. – Journal of Educational Measurement, 2004
The purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the…
Descriptors: Test Length, Adaptive Testing, Computer Assisted Testing, Test Selection
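As a mechanical illustration of how three of these rules differ, the sketch below implements F, FP, and RN over an assumed 2PL pool with a stand-in normal posterior; the Kullback-Leibler rule (KP) is omitted for brevity:

```python
import numpy as np

def fisher_info(theta, a, b):
    """2PL Fisher information: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

rng = np.random.default_rng(4)
a, b = rng.uniform(0.5, 2.0, 300), rng.normal(size=300)  # hypothetical pool

theta_hat = 0.2                                # current point estimate
grid = np.linspace(-4, 4, 81)                  # quadrature grid for the posterior
posterior = np.exp(-0.5 * (grid - theta_hat) ** 2 / 0.3)  # stand-in posterior
posterior /= posterior.sum()

item_F = np.argmax(fisher_info(theta_hat, a, b))                    # rule F
item_FP = np.argmax(posterior @ fisher_info(grid[:, None], a, b))   # rule FP
item_RN = rng.integers(300)                                         # rule RN
print(item_F, item_FP, item_RN)
```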