Publication Date
| Date range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 53 |
| Since 2022 (last 5 years) | 195 |
| Since 2017 (last 10 years) | 495 |
| Since 2007 (last 20 years) | 743 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Test Items | 1187 |
| Test Reliability | 1187 |
| Test Validity | 685 |
| Test Construction | 566 |
| Foreign Countries | 349 |
| Difficulty Level | 280 |
| Item Analysis | 253 |
| Psychometrics | 234 |
| Item Response Theory | 219 |
| Factor Analysis | 183 |
| Multiple Choice Tests | 173 |
Author
| Author | Count |
| --- | --- |
| Schoen, Robert C. | 12 |
| LaVenia, Mark | 5 |
| Liu, Ou Lydia | 5 |
| Anderson, Daniel | 4 |
| Bauduin, Charity | 4 |
| DiLuzio, Geneva J. | 4 |
| Farina, Kristy | 4 |
| Haladyna, Thomas M. | 4 |
| Huck, Schuyler W. | 4 |
| Petscher, Yaacov | 4 |
| Stansfield, Charles W. | 4 |
Audience
| Audience | Count |
| --- | --- |
| Practitioners | 39 |
| Researchers | 30 |
| Teachers | 24 |
| Administrators | 13 |
| Support Staff | 3 |
| Counselors | 2 |
| Students | 2 |
| Community | 1 |
| Parents | 1 |
| Policymakers | 1 |
Location
| Location | Count |
| --- | --- |
| Turkey | 69 |
| Indonesia | 37 |
| Germany | 20 |
| Canada | 17 |
| Florida | 17 |
| China | 16 |
| Australia | 15 |
| California | 12 |
| Iran | 11 |
| India | 10 |
| New York | 9 |
What Works Clearinghouse Rating
| Rating | Count |
| --- | --- |
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 1 |
Bardar, Erin M.; Prather, Edward E.; Brecher, Kenneth; Slater, Timothy F. – Astronomy Education Review, 2007
This article describes the development and validation of the Light and Spectroscopy Concept Inventory (LSCI), a 26-item diagnostic test designed (1) to measure students' conceptual understanding of topics related to light and spectroscopy, and (2) to evaluate the effectiveness of instructional interventions in promoting meaningful learning gains…
Descriptors: Astronomy, Science Instruction, College Science, Test Construction
Montgomery, Janine M.; Duncan, C. Randy; Francis, Garnett C. – Journal of Psychoeducational Assessment, 2007
The "Pervasive Developmental Disorder Screening Test-II (PDDST-II)--Early Childhood Screener for Autistic Spectrum Disorders" is a clinical screening tool for pervasive developmental disorders (PDD) or autism spectrum disorders (ASD) designed for use by nonspecialist clinicians. It was designed to differentiate children as young as 18 months who…
Descriptors: Early Intervention, Autism, Screening Tests, Identification
Spaan, Mary – Language Assessment Quarterly, 2007
This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
Descriptors: Test Items, Test Construction, Responses, Test Content
Cecen, Ayse Rezan – Educational Sciences: Theory and Practice, 2007
The purpose of this study is to investigate the validity and reliability of the Short Form of the Family Sense of Coherence Scale, originally developed as a 26-item scale by Antonovsky and Sourani (1988) and shortened to a 12-item form by Sagy (1998). The scale measures individuals' perception of family sense of coherence and can be applied to adolescents and…
Descriptors: Undergraduate Students, Test Reliability, Test Validity, Measures (Individuals)
Whitney, Douglas R.; And Others – 1985
This research brief summarizes the reliability and validity data available in, but spread throughout, a number of General Educational Development (GED) Testing Service publications. A section on reliability discusses how to determine the reliability of a test's scores and two ways of assessing the reliability of a test--internal consistency…
Descriptors: Adult Education, High School Equivalency Programs, Item Analysis, Scores
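The internal-consistency approach this brief refers to is commonly estimated with Cronbach's alpha. A minimal sketch on invented 0/1 item responses follows; none of the numbers come from the GED publications:

```python
# Cronbach's alpha: an internal-consistency reliability estimate.
# Rows = examinees, columns = dichotomously scored items (hypothetical data).
def cronbach_alpha(scores):
    k = len(scores[0])                      # number of items
    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(responses), 3))  # → 0.696
```

Alpha rises as item variances shrink relative to total-score variance, which is why items that covary strongly with the rest of the test improve the estimate.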
Cook, Linda L.; And Others – 1982
Data from the Scholastic Aptitude Test-Verbal (SAT-V), SAT Mathematics (SAT-M), and Achievement Tests in Biology, American History, and Social Studies were used for this study. The temporal stability of item parameter estimates obtained for the same set of items calibrated for different examinees at different times was analyzed. It was believed…
Descriptors: Achievement Tests, Aptitude Tests, Equated Scores, Item Analysis
Peer reviewed: Lord, Frederic M. – Journal of Educational Measurement, 1977
Two approaches presently in the literature for determining the optimal number of choices for a test item are compared with two new approaches. (Author)
Descriptors: Forced Choice Technique, Latent Trait Theory, Multiple Choice Tests, Test Items
Peer reviewed: Tollefson, Nona – Educational and Psychological Measurement, 1987
This study compared the item difficulty, item discrimination, and test reliability of three forms of multiple-choice items: (1) one correct answer; (2) "none of the above" as a foil; and (3) "none of the above" as the correct answer. Twelve items in the three formats were administered in a college statistics examination. (BS)
Descriptors: Difficulty Level, Higher Education, Item Analysis, Multiple Choice Tests
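Two of the classical statistics Tollefson compares can be computed directly from an item-response matrix. The sketch below, on invented data, shows item difficulty (proportion correct) and discrimination as the point-biserial correlation between an item and the rest-of-test score:

```python
# Classical item analysis for one item (hypothetical 0/1 data).
def item_stats(scores, item):
    n = len(scores)
    x = [row[item] for row in scores]                 # item scores
    rest = [sum(row) - row[item] for row in scores]   # rest-of-test scores
    difficulty = sum(x) / n                           # proportion correct
    mx, mr = sum(x) / n, sum(rest) / n
    cov = sum((a - mx) * (b - mr) for a, b in zip(x, rest)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sr = (sum((b - mr) ** 2 for b in rest) / n) ** 0.5
    discrimination = cov / (sx * sr)                  # point-biserial r
    return difficulty, discrimination

resp = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
d, r = item_stats(resp, 0)
print(d, round(r, 3))  # → 0.8 0.784
```

Correlating against the rest-of-test score (rather than the total) avoids inflating discrimination by including the item in its own criterion.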
Peer reviewed: Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979
Ten item discrimination indices, across a variety of item analysis situations, were compared, based on the validities of tests constructed by using each of the indices to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction
Peer reviewed: Reuterberg, Sven-Eric; Gustafsson, Jan-Eric – Educational and Psychological Measurement, 1992
The use of confirmatory factor analysis by the LISREL program is demonstrated as an assumption-testing method when computing reliability coefficients under different model assumptions. Results indicate that reliability estimates are robust against departure from the assumption of parallelism of test items. (SLD)
Descriptors: Equations (Mathematics), Estimation (Mathematics), Mathematical Models, Robustness (Statistics)
Peer reviewed: Cizek, Gregory J.; Robinson, K. Lynne; O'Day, Denis M. – Educational and Psychological Measurement, 1998
The effect of removing nonfunctioning items from multiple-choice tests was studied by examining change in difficulty, discrimination, and dimensionality. Results provide additional support for the benefits of eliminating nonfunctioning options, such as enhanced score reliability, reduced testing time, potential for broader domain sampling, and…
Descriptors: Difficulty Level, Multiple Choice Tests, Sampling, Scores
Peer reviewed: Taylor, Annette Kujawski – College Student Journal, 2005
This research examined 2 elements of multiple-choice test construction, balancing the key and optimal number of options. In Experiment 1 the 3 conditions included a balanced key, overrepresentation of a and b responses, and overrepresentation of c and d responses. The results showed that error-patterns were independent of the key, reflecting…
Descriptors: Comparative Analysis, Test Items, Multiple Choice Tests, Test Construction
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2006
Many academic tests (e.g. short-answer and multiple-choice) sample required knowledge with questions scoring 0 or 1 (dichotomous scoring). Few textbooks give useful guidance on the length of test needed to do this reliably. Posey's binomial error model of 1932 provides the best starting point, but allows neither for heterogeneity of question…
Descriptors: Item Sampling, Tests, Test Length, Test Reliability
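The binomial error model Burton starts from treats each dichotomously scored question as an independent Bernoulli trial on the candidate's true domain proficiency p. A hedged sketch (illustrative numbers, not Burton's) of the resulting standard error and the test length needed to reach a target error:

```python
# Binomial error model for dichotomous (0/1) scoring: if a candidate
# truly knows proportion p of the domain and answers n sampled
# questions, the observed proportion correct has standard error
# sqrt(p * (1 - p) / n). All numbers below are illustrative.
import math

def binomial_se(p, n):
    return math.sqrt(p * (1 - p) / n)

def length_for_se(p, target_se):
    # Smallest n whose binomial standard error is at most target_se;
    # round first to sidestep floating-point noise at exact boundaries.
    return math.ceil(round(p * (1 - p) / target_se ** 2, 9))

print(round(binomial_se(0.6, 50), 4))  # SE of a 50-question test
print(length_for_se(0.6, 0.05))        # questions needed for SE <= 0.05
```

Because the standard error shrinks with the square root of n, halving it requires quadrupling the test length, which is the core of Burton's concern about short tests.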
Qaqish, Basil – Online Submission, 2007
Scores were provided by ACT, the college test publisher. On average, non-homeschoolers outperformed homeschoolers by about two items out of sixty on the ACT mathematics test that was analyzed. This result may be due to the different teaching/learning media used in teaching each of the two groups, to different teacher/student interaction, or…
Descriptors: Home Schooling, College Entrance Examinations, Standardized Tests, Achievement Tests
Zwick, Rebecca; And Others – 1993
Although the belief has been expressed that performance assessments are intrinsically more fair than multiple-choice measures, some forms of performance assessment may in fact be more likely than conventional tests to tap construct-irrelevant factors. As performance assessment grows in popularity, it will be increasingly important to monitor the…
Descriptors: Educational Assessment, Item Bias, Multiple Choice Tests, Performance Based Assessment
