Publication Date
| Date range | Records |
| In 2026 | 0 |
| Since 2025 | 197 |
| Since 2022 (last 5 years) | 1067 |
| Since 2017 (last 10 years) | 2577 |
| Since 2007 (last 20 years) | 4938 |
Audience
| Audience | Records |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Location | Records |
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
What Works Clearinghouse Rating
| Rating | Records |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Peer reviewed
Schwarz, Shirley P.; And Others – Journal of Educational Measurement, 1991
Interviews were conducted with 104 students in master's-level classes to determine their reasons for changing test answers. Subjects previously had been instructed in answer-changing strategies. Most changes were made for thought-out reasons; few were because of clerical errors. Reconsideration of test items is probably underestimated in…
Descriptors: Achievement Gains, Graduate Students, Guessing (Tests), Higher Education
Stoneall, Linda – Training and Development, 1991
Describes questioning methods trainers can use to uncover training needs (interviews, surveys, test questions, program evaluations). Illustrates the use of questions at the beginning, middle, and end of training sessions. (SK)
Descriptors: Adult Education, Discussion (Teaching Technique), Evaluation Methods, Interviews
Peer reviewed
Boekkooi-Timminga, Ellen – Applied Psychological Measurement, 1990
A new test construction model based on the Rasch model is proposed. This model, the cluster-based method, considers groups of interchangeable items rather than individual items and uses integer programming. Results for six test construction problems indicate that the method produces accurate results in a small amount of time. (SLD)
Descriptors: Cluster Analysis, Computer Assisted Testing, Equations (Mathematics), Item Banks
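The cluster-based method in the record above selects counts of items from groups of interchangeable items rather than picking individual items. A minimal sketch of that selection idea, using hypothetical cluster data and a brute-force search over integer counts in place of the paper's integer-programming formulation:

```python
from itertools import product

# Hypothetical clusters of interchangeable items:
# (items available in cluster, Rasch information per item at the target ability).
clusters = [(10, 0.42), (8, 0.35), (12, 0.51), (6, 0.28)]
TEST_LENGTH = 20  # total number of items the assembled test must contain

best_counts, best_info = None, -1.0
# Decision variable: how many items to draw from each cluster (an integer
# bounded by the cluster size). Brute force stands in for an integer program.
for counts in product(*(range(size + 1) for size, _ in clusters)):
    if sum(counts) != TEST_LENGTH:
        continue
    info = sum(n * per_item for n, (_, per_item) in zip(counts, clusters))
    if info > best_info:
        best_counts, best_info = counts, info

print(f"items per cluster: {best_counts}, total information: {best_info:.2f}")
```

A real assembly model would add content constraints and a target information curve, and would hand the resulting integer program to a MILP solver instead of enumerating.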
Peer reviewed
Frary, Robert B. – Applied Measurement in Education, 1991
The use of the "none-of-the-above" option (NOTA) in 20 college-level multiple-choice tests was evaluated for classes with 100 or more students. Eight academic disciplines were represented, and 295 NOTA and 724 regular test items were used. It appears that the NOTA can be compatible with good classroom measurement. (TJH)
Descriptors: College Students, Comparative Testing, Difficulty Level, Discriminant Analysis
Peer reviewed
Rost, Jurgen – Applied Psychological Measurement, 1990
Combining Rasch and latent class models is presented as a way to overcome deficiencies and retain the positive features of both. An estimation algorithm is outlined, providing conditional maximum likelihood estimates of item parameters for each class. The model is illustrated with simulated data and real data (n=869 adults). (SLD)
Descriptors: Adults, Algorithms, Computer Simulation, Equations (Mathematics)
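In Rost's mixed Rasch model, each latent class has its own item-difficulty vector, and a response pattern's likelihood is a class-weighted mixture. A toy illustration with hypothetical parameters (the paper's conditional maximum likelihood estimation of these parameters is not reproduced here):

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Two latent classes, each with its own difficulties for three items (hypothetical).
class_difficulties = {"class_1": [-1.0, 0.0, 1.0], "class_2": [0.5, 0.5, 0.5]}
class_weights = {"class_1": 0.6, "class_2": 0.4}

def pattern_likelihood(pattern, theta):
    """Mixture likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for g, difficulties in class_difficulties.items():
        p = 1.0
        for x, b in zip(pattern, difficulties):
            p_correct = rasch_p(theta, b)
            p *= p_correct if x == 1 else 1.0 - p_correct
        total += class_weights[g] * p
    return total

print(pattern_likelihood([1, 1, 0], theta=0.0))
```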
Peer reviewed
Hills, John R. – Educational Measurement: Issues and Practice, 1993
A scenario and accompanying questions and answers are posed to help educators examine possible problems in interpreting a student's test score profile. Profiles developed and used soundly are very helpful, but possible pitfalls in test interpretation must be recognized. (SLD)
Descriptors: Academic Achievement, Educational Assessment, Elementary Secondary Education, Performance
Peer reviewed
Gardner, Donald G.; Cummings, L. L.; Dunham, Randall B.; Pierce, Jon L. – Educational and Psychological Measurement, 1998
Whether traditional Likert-type focus-of-attention-at-work scales would outperform the one-item scales developed by D. Gardner and others (1989) was studied with responses of 492 automobile-services-club employees. Confirmatory factor analysis did not show either method to be empirically better. Situations in which the one-item scale might be…
Descriptors: Attention, Comparative Analysis, Employees, Likert Scales
Peer reviewed
Ercikan, Kadriye; Schwartz, Richard D.; Julian, Marc W.; Burket, George R.; Weber, Melba M.; Link, Valerie – Journal of Educational Measurement, 1998
Discusses and demonstrates combining scores from multiple-choice (MC) and constructed-response (CR) items to create a common scale using Item Response Theory methodology. Provides empirical results using a set of tests in reading, language, mathematics, and science in three grades. (SLD)
Descriptors: Constructed Response, Elementary Secondary Education, Item Response Theory, Language Arts
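Placing multiple-choice and constructed-response items on a common scale means both item types contribute to one likelihood over the same ability θ. A minimal sketch, assuming a Rasch model for the MC items and a partial credit model for the CR items with hypothetical parameters (the paper's specific IRT models are not reproduced):

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch probability for a multiple-choice item."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pcm_probs(theta, deltas):
    """Partial credit model category probabilities (categories 0..len(deltas))
    for a constructed-response item with step difficulties `deltas`."""
    numerators = [1.0]  # category 0 has numerator exp(0)
    s = 0.0
    for d in deltas:
        s += theta - d
        numerators.append(math.exp(s))
    z = sum(numerators)
    return [n / z for n in numerators]

def joint_loglik(theta, mc_resp, mc_b, cr_resp, cr_deltas):
    """Log-likelihood of one examinee's MC and CR responses on one theta scale."""
    ll = 0.0
    for x, b in zip(mc_resp, mc_b):
        p = rasch_p(theta, b)
        ll += math.log(p if x == 1 else 1.0 - p)
    for k, deltas in zip(cr_resp, cr_deltas):
        ll += math.log(pcm_probs(theta, deltas)[k])
    return ll

# Three MC items, one 3-category CR item (all parameters hypothetical).
print(joint_loglik(0.3, [1, 0, 1], [-0.5, 0.2, 0.8], [2], [[-0.4, 0.6]]))
```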
Peer reviewed
Enright, Mary K.; Rock, Donald A.; Bennett, Randy Elliot – Journal of Educational Measurement, 1998
Examined alternative item types and section configurations for improving the discriminant and convergent validity of the Graduate Record Examination (GRE) General Test, using a computer-based test given to 388 examinees who had taken the GRE previously. Adding new variations of logical meaning appeared to decrease discriminant validity. (SLD)
Descriptors: Admission (School), College Entrance Examinations, College Students, Computer Assisted Testing
Peer reviewed
Stricker, Lawrence J.; Emmerich, Walter – Journal of Educational Measurement, 1999
Examined the connection between gender differences in examinees' familiarity, interest, and negative emotional reactions to items on the College Board's Advanced Placement Psychology Examination and the items' differential item functioning (DIF). For a sample of 717 students, gender differences on the three variables were substantially related to the…
Descriptors: Advanced Placement, Correlation, Emotional Response, Familiarity
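One common way to quantify the DIF referenced above is the Mantel-Haenszel statistic, which pools the reference-versus-focal odds of a correct answer across matched total-score strata (the abstract does not say which DIF method the authors used; the counts below are hypothetical):

```python
import math

# Per total-score stratum, counts for one item:
# (reference correct, reference wrong, focal correct, focal wrong)
strata = [(30, 10, 25, 15), (45, 5, 40, 10), (20, 20, 15, 25)]

num = sum(rc * fw / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
den = sum(rw * fc / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
alpha_mh = num / den                   # common odds ratio across strata
delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta scale; negative favors the reference group

print(f"MH odds ratio = {alpha_mh:.2f}, MH D-DIF = {delta_mh:.2f}")
```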
Peer reviewed
Wester, Anita; Henriksson, Widar – Studies in Educational Evaluation, 2000
Examined whether changes in format of mathematics items in the Third International Mathematics and Science Study (TIMSS) had any effect on gender differences in performance using a Swedish sample of 8,851 sixth, seventh, and eighth graders. Results show no significant changes in gender differences when item format is altered. (SLD)
Descriptors: Interaction, International Studies, Junior High School Students, Junior High Schools
Peer reviewed
Barnette, J. Jackson – Educational and Psychological Measurement, 2000
Used a design in which item stem direction and item response pattern were crossed to determine effects on internal consistency reliability. Results from high school and college students and teachers (150 individuals per test form) suggest using directly worded items with half of the response items going in one direction, and half in the other…
Descriptors: College Students, High School Students, High Schools, Higher Education
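Before computing internal consistency for a scale that mixes item wording directions, the negatively worded items have to be reverse-scored. A minimal sketch with Cronbach's alpha on hypothetical 7-point data (independent random responses, so alpha will sit near zero; real scale data are correlated):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

rng = np.random.default_rng(0)
responses = rng.integers(1, 8, size=(150, 10)).astype(float)  # 7-point scale
reversed_items = [5, 6, 7, 8, 9]  # half the items worded in the other direction

responses[:, reversed_items] = 8 - responses[:, reversed_items]  # reverse-score
print(f"alpha = {cronbach_alpha(responses):.3f}")
```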
Peer reviewed
Cohen, Steve; And Others – Journal of Educational and Behavioral Statistics, 1996
A detailed multisite evaluation of instructional software, the ConStatS package, designed to help students conceptualize introductory probability and statistics, yielded patterns of error on several assessment items. Results from 739 college students demonstrated 10 misconceptions that may be among the most difficult concepts to teach. (SLD)
Descriptors: College Students, Computer Assisted Instruction, Computer Software Evaluation, Educational Assessment
Peer reviewed
Lundeberg, Mary A.; And Others – Journal of Educational Psychology, 1994
Gender differences in item-specific confidence judgments were studied for 70 male and 181 female college students. Gender differences in confidence were dependent on context and the domain being tested. Both men and women were overconfident, but men were especially overconfident when incorrect. (SLD)
Descriptors: College Students, Confidence Testing, Context Effect, Difficulty Level
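Item-specific confidence judgments like those above are usually scored by comparing mean confidence against actual accuracy; a positive gap is overconfidence. A minimal sketch with hypothetical judgments:

```python
# Each pair: (confidence judgment in [0, 1], whether the answer was correct).
judgments = [(0.9, True), (0.8, False), (0.95, True), (0.7, False), (0.85, True)]

mean_confidence = sum(c for c, _ in judgments) / len(judgments)
accuracy = sum(correct for _, correct in judgments) / len(judgments)
bias = mean_confidence - accuracy  # positive values indicate overconfidence

print(f"confidence={mean_confidence:.2f}, accuracy={accuracy:.2f}, bias={bias:+.2f}")
```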
Peer reviewed
Chang, Lei; And Others – Applied Measurement in Education, 1996
The influence of judges' knowledge on standard setting for competency tests was studied with 17 judges who took an economics teacher certification test while setting competency standards using the Angoff procedure. Judges tended to set higher standards for items they answered correctly and lower standards for items they answered incorrectly. (SLD)
Descriptors: Competence, Difficulty Level, Economics, Judges
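In the Angoff procedure named above, each judge estimates, item by item, the probability that a minimally competent examinee answers correctly; summing a judge's estimates gives that judge's cut score, and the recommended cut is the average over judges. A minimal sketch with hypothetical ratings:

```python
# ratings[j][i]: judge j's estimated probability that a minimally competent
# examinee answers item i correctly (three judges, four items, hypothetical).
ratings = [
    [0.60, 0.75, 0.40, 0.80],
    [0.55, 0.70, 0.50, 0.85],
    [0.65, 0.80, 0.45, 0.75],
]

judge_cuts = [sum(r) for r in ratings]         # each judge's expected raw score
cut_score = sum(judge_cuts) / len(judge_cuts)  # Angoff cut: mean across judges

print(f"per-judge cuts: {judge_cuts}, recommended cut score: {cut_score:.2f}")
```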