NotesFAQContact Us
Collection
Advanced
Search Tips
What Works Clearinghouse Rating
Showing 196 to 210 of 226 results Save | Export
Henning, Grant – 1991
Criticisms of the Test of English as a Foreign Language (TOEFL) have included speculation that the listening test places too much burden on short-term memory as compared with comprehension, that a knowledge of reading is required to respond successfully, and that many items appear to require mere recall and matching rather than higher-order…
Descriptors: Adults, Auditory Stimuli, Cognitive Processes, Educational Assessment
Oosterhof, Albert C.; Coats, Pamela K. – 1981
Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…
Descriptors: Comparative Analysis, Difficulty Level, Grading, Higher Education
Wainer, Howard; And Others – 1990
The initial development of a testlet-based algebra test was previously reported (Wainer and Lewis, 1990). This account provides the details of this excursion into the use of hierarchical testlets and validity-based scoring. A pretest of two 15-item hierarchical testlets was carried out in which examinees' performance on a 4-item subset of each…
Descriptors: Adaptive Testing, Algebra, Comparative Analysis, Computer Assisted Testing
Lee, J. Murray; Segel, David – Office of Education, United States Department of the Interior, 1936
In order to make an intelligent advance in any school practice a knowledge of what schools are doing in that practice is almost indispensable, since a transition in procedures must be a growth from the one to the other. This bulletin gives this background of facts concerning the use of tests and examinations by the different subject departments in…
Descriptors: Testing, Teachers, Standardized Tests, Principals
Gialluca, Kathleen A.; And Others – 1984
In this study, simulated and actual Air Force test data were used to compare the different procedures for equating mental tests: conventional (equipercentile and linear), Item Response Theory (IRT), and strong true-score theory (STST); data collection designs used were single-group, equivalent-groups, and anchor test. Equating transformations were…
Descriptors: Adults, Cognitive Ability, Cognitive Tests, Comparative Analysis
Weiss, David J.; McBride, James R. – 1983
Monte Carlo simulation was used to investigate score bias and information characteristics of Owen's Bayesian adaptive testing strategy, and to examine possible causes of score bias. Factors investigated in three related studies included effects of item discrimination, effects of fixed vs. variable test length, and effects of an accurate prior…
Descriptors: Ability Identification, Adaptive Testing, Bayesian Statistics, Computer Assisted Testing
Robertson, David W.; And Others – 1977
A comparative study of item analysis was conducted on the basis of race to determine whether alternative test construction or processing might increase the proportion of black enlisted personnel among those passing various military technical knowledge examinations. The study used data from six specialists at four grade levels and investigated item…
Descriptors: Difficulty Level, Enlisted Personnel, Item Analysis, Occupational Tests
Berk, Ronald A. – 1979
Four factors essential to determining how many items should be constructed or sampled for a set of objectives are examined: (1) importance and type of decisions to be made with the results; (2) importance and emphases assigned to the instructional and behavioral objectives; (3) number of objectives; (4) practical constraints, such as item writing…
Descriptors: Behavioral Objectives, Course Objectives, Criterion Referenced Tests, Decision Making
Hambleton, Ronald K.; And Others – 1987
The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
Descriptors: Comparative Analysis, Content Validity, Cutting Scores, Difficulty Level
Lenel, Julia C.; Gilmer, Jerry S. – 1986
In some testing programs an early item analysis is performed before final scoring in order to validate the intended keys. As a result, some items which are flawed and do not discriminate well may be keyed so as to give credit to examinees no matter which answer was chosen. This is referred to as allkeying. This research examined how varying the…
Descriptors: Equated Scores, Item Analysis, Latent Trait Theory, Licensing Examinations (Professions)
Samejima, Fumiko – 1986
Item analysis data fitting the normal ogive model were simulated in order to investigate the problems encountered when applying the three-parameter logistic model. Binary item tests containing 10 and 35 items were created, and Monte Carlo methods simulated the responses of 2,000 and 500 examinees. Item parameters were obtained using Logist 5.…
Descriptors: Computer Simulation, Difficulty Level, Guessing (Tests), Item Analysis
Maurelli, Vincent A.; Weiss, David J. – 1981
A monte carlo simulation was conducted to assess the effects in an adaptive testing strategy for test batteries of varying subtest order, subtest termination criterion, and variable versus fixed entry on the psychometric properties of an existent achievement test battery. Comparisons were made among conventionally administered tests and adaptive…
Descriptors: Achievement Tests, Adaptive Testing, Computer Assisted Testing, Latent Trait Theory
Peer reviewed Peer reviewed
Budescu, David V.; Nevo, Baruch – Journal of Educational Measurement, 1985
The proportionality model assumes that total testing time is proportional to the number of test items and the number of options per multiple choice test item. This assumption was examined, using test items having from two to five options. The model was not supported. (Author/GDC)
Descriptors: College Entrance Examinations, Foreign Countries, Higher Education, Item Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Embretson, Susan E. – Measurement: Interdisciplinary Research and Perspectives, 2004
The last century was marked by dazzling changes in many areas, such as technology and communications. Predictions into the second century of testing are seemingly difficult in such a context. Yet, looking back to the turn of the last century, Kirkpatrick (1900), in his American Psychological Association presidential address, presented fundamental…
Descriptors: Ability, Testing, Futures (of Society), Psychometrics
Henning, Grant – 1993
This study provides information about the total and component scores of the Test of English as a Foreign Language (TOEFL). First, the study provides comparative global and component estimates of test-retest, alternate-form, and internal-consistency reliability, controlling for sources of measurement error inherent in the examinees and the testing…
Descriptors: Difficulty Level, English (Second Language), Error of Measurement, Estimation (Mathematics)
Pages: 1  |  ...  |  6  |  7  |  8  |  9  |  10  |  11  |  12  |  13  |  14  |  15  |  16