Publication Date
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 5 |
| Since 2017 (last 10 years) | 7 |
| Since 2007 (last 20 years) | 11 |
Descriptor
| Test Items | 32 |
| Test Length | 32 |
| Test Validity | 32 |
| Test Reliability | 17 |
| Test Construction | 15 |
| Test Format | 10 |
| Adaptive Testing | 9 |
| Computer Assisted Testing | 9 |
| Testing Problems | 7 |
| Item Banks | 6 |
| Difficulty Level | 5 |
Author
| Wainer, Howard | 3 |
| Andy Rick Sánchez-Villena | 1 |
| Basman, Munevver | 1 |
| Boyd, Thomas A. | 1 |
| Bruce, K. | 1 |
| Bulut, Okan | 1 |
| Byars, Alvin Gregg | 1 |
| Camilli, Gregory | 1 |
| Cliff, Norman | 1 |
| Coats, Pamela K. | 1 |
| Cristopher Lino-Cruz | 1 |
Education Level
| Higher Education | 3 |
| Postsecondary Education | 3 |
| Middle Schools | 2 |
| Elementary Education | 1 |
| Elementary Secondary Education | 1 |
| Grade 6 | 1 |
| High Schools | 1 |
| Intermediate Grades | 1 |
| Junior High Schools | 1 |
| Secondary Education | 1 |
Audience
| Researchers | 2 |
| Community | 1 |
| Practitioners | 1 |
Location
| Turkey | 2 |
| Japan | 1 |
| New Jersey | 1 |
| Peru | 1 |
Laws, Policies, & Programs
| Job Training Partnership Act… | 1 |
Assessments and Surveys
| Force Concept Inventory | 1 |
| Stanford Binet Intelligence… | 1 |
| Test of English as a Foreign… | 1 |
| Wechsler Intelligence Scale… | 1 |
| Wechsler Intelligence Scales… | 1 |
Jun-ichiro Yasuda; Michael M. Hull; Naohiro Mae; Kentaro Kojima – Physical Review Physics Education Research, 2025
Although conceptual assessment tests are commonly administered at the beginning and end of a semester, this pre-post approach has inherent limitations. Specifically, education researchers and instructors have limited ability to observe the progression of students' conceptual understanding throughout the course. Furthermore, instructors are limited…
Descriptors: Computer Assisted Testing, Adaptive Testing, Science Tests, Scientific Concepts
José Ventura-León; Cristopher Lino-Cruz; Shirley Tocto-Muñoz; Andy Rick Sánchez-Villena – Journal of Psychoeducational Assessment, 2025
Academic and occupational success requires social intelligence, the ability to comprehend, and manage interpersonal connections. This research aims to assess and improve the Tromsø Social Intelligence Scale (TSIS) for Peruvian university students, focusing on cultural adaptability, reliability, and validity. Participants included 973 university…
Descriptors: Factor Analysis, Intelligence Tests, Test Items, Test Length
Basman, Munevver – International Journal of Assessment Tools in Education, 2023
To ensure the validity of the tests is to check that all items have similar results across different groups of individuals. However, differential item functioning (DIF) occurs when the results of individuals with equal ability levels from different groups differ from each other on the same test item. Based on Item Response Theory and Classic Test…
Descriptors: Test Bias, Test Items, Test Validity, Item Response Theory
Pasquale Anselmi; Jürgen Heller; Luca Stefanutti; Egidio Robusto; Giulia Barillari – Education and Information Technologies, 2025
Competence-based test development (CbTD) is a novel method for constructing tests that are as informative as possible about the competence state (the set of skills an individual masters) underlying the item responses. If desired, the tests can also be minimal, meaning that no item can be eliminated without reducing their informativeness. To…
Descriptors: Competency Based Education, Test Construction, Test Length, Usability
Yasuda, Jun-ichiro; Hull, Michael M.; Mae, Naohiro – Physical Review Physics Education Research, 2022
This paper presents improvements made to a computerized adaptive testing (CAT)-based version of the FCI (FCI-CAT) in regards to test security and test efficiency. First, we will discuss measures to enhance test security by controlling for item overexposure, decreasing the risk that respondents may (i) memorize the content of a pretest for use on…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Risk Management
Kilic, Abdullah Faruk; Dogan, Nuri – International Journal of Assessment Tools in Education, 2021
Weighted least squares (WLS), weighted least squares mean-and-variance-adjusted (WLSMV), unweighted least squares mean-and-variance-adjusted (ULSMV), maximum likelihood (ML), robust maximum likelihood (MLR) and Bayesian estimation methods were compared in mixed item response type data via Monte Carlo simulation. The percentage of polytomous items,…
Descriptors: Factor Analysis, Computation, Least Squares Statistics, Maximum Likelihood Statistics
Steinkamp, Susan Christa – ProQuest LLC, 2017
For test scores that rely on the accurate estimation of ability via an IRT model, their use and interpretation is dependent upon the assumption that the IRT model fits the data. Examinees who do not put forth full effort in answering test questions, have prior knowledge of test content, or do not approach a test with the intent of answering…
Descriptors: Test Items, Item Response Theory, Scores, Test Wiseness
Sabatini, J.; O'Reilly, T.; Halderman, L.; Bruce, K. – Grantee Submission, 2014
Existing reading comprehension assessments have been criticized by researchers, educators, and policy makers, especially regarding their coverage, utility, and authenticity. The purpose of the current study was to evaluate a new assessment of reading comprehension that was designed to broaden the construct of reading. In light of these issues, we…
Descriptors: Reading Comprehension, Vignettes, Reading Tests, Elementary School Students
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Bulut, Okan; Kan, Adnan – Eurasian Journal of Educational Research, 2012
Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
Descriptors: Adaptive Testing, Computer Assisted Testing, College Entrance Examinations, Graduate Students
Owen, Steven V.; Froman, Robin D. – Educational and Psychological Measurement, 1987 (peer reviewed)
To test further for efficacy of three-option achievement items, parallel three- and five-option item tests were distributed randomly to college students. Results showed no differences in mean item difficulty, mean discrimination or total test score, but a substantial reduction in time spent on three-option items. (Author/BS)
Descriptors: Achievement Tests, Higher Education, Multiple Choice Tests, Test Format
Green, Kathy – Journal of Experimental Education, 1979 (peer reviewed)
Reliabilities and concurrent validities of teacher-made multiple-choice and true-false tests were compared. No significant differences were found even when multiple-choice reliability was adjusted to equate testing time. (Author/MH)
Descriptors: Comparative Testing, Higher Education, Multiple Choice Tests, Test Format
Wainer, Howard; And Others – 1991
A series of computer simulations was run to measure the relationship between testlet validity and the factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Results confirmed the generality of earlier empirical findings of H. Wainer and others (1991) that making a testlet adaptive yields only marginal…
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Simulation, Item Banks
Graham, Darol L. – 1974
The adequacy of a test developed for statewide assessment of basic mathematics skills was investigated. The test, comprised of multiple-choice items reflecting a series of behavioral objectives, was compared with a more extensive criterion measure generated from the same objectives by the application of a strict item sampling model. In many…
Descriptors: Comparative Testing, Criterion Referenced Tests, Educational Assessment, Item Sampling