| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 5 |
| Since 2022 (last 5 years) | 10 |
| Since 2017 (last 10 years) | 33 |
| Since 2007 (last 20 years) | 51 |
| Descriptor | Records |
| --- | --- |
| Test Length | 133 |
| Test Reliability | 133 |
| Test Validity | 63 |
| Test Items | 44 |
| Test Construction | 42 |
| Scores | 24 |
| Test Format | 23 |
| Computer Assisted Testing | 21 |
| Error of Measurement | 20 |
| Foreign Countries | 20 |
| Item Response Theory | 19 |
| Education Level | Records |
| --- | --- |
| Higher Education | 12 |
| Postsecondary Education | 11 |
| Elementary Education | 9 |
| Secondary Education | 6 |
| Early Childhood Education | 4 |
| Grade 6 | 4 |
| Intermediate Grades | 4 |
| Middle Schools | 4 |
| Primary Education | 4 |
| Grade 3 | 3 |
| Grade 5 | 3 |
| Audience | Records |
| --- | --- |
| Researchers | 4 |
| Practitioners | 2 |
| Community | 1 |
| Support Staff | 1 |
| Location | Records |
| --- | --- |
| China | 4 |
| Turkey | 3 |
| Australia | 2 |
| Canada | 2 |
| Ireland | 2 |
| Netherlands | 2 |
| Singapore | 2 |
| United Kingdom | 2 |
| Alabama | 1 |
| California | 1 |
| Germany | 1 |
| Laws, Policies, & Programs | Records |
| --- | --- |
| Job Training Partnership Act… | 1 |
Peer reviewed: Livingston, Samuel A.; Lewis, Charles – Journal of Educational Measurement, 1995
A method is presented for estimating the accuracy and consistency of classifications based on test scores. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a four-parameter beta model. (SLD)
Descriptors: Classification, Estimation (Mathematics), Scores, Statistical Distributions
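The effective-test-length idea above has a closed form that is often quoted for the Livingston-Lewis procedure: the number of equivalent, equally weighted dichotomous items implied by the score's mean, variance, reliability, and possible score range. The sketch below covers only that step (not the four-parameter beta fitting or the classification tables), and the function name and example numbers are illustrative assumptions rather than values from the article.

```python
# Sketch of the effective-test-length step, as commonly stated for the
# Livingston-Lewis procedure; the beta-model fitting and classification
# tables are not shown. Names and example values are illustrative.

def effective_test_length(mean, var, reliability, x_min=0.0, x_max=1.0):
    """Number of equivalent dichotomous items implied by a score's reliability.

    mean, var    -- observed-score mean and variance
    reliability  -- reliability estimate of the observed score
    x_min, x_max -- minimum and maximum possible scores
    """
    numerator = (mean - x_min) * (x_max - mean) - reliability * var
    denominator = var * (1.0 - reliability)
    return numerator / denominator

# Example: proportion-correct scores with mean .70, variance .03, alpha .85
print(round(effective_test_length(0.70, 0.03, 0.85), 1))  # about 41 items
```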
Peer reviewed: Qualls, Audrey L. – Applied Measurement in Education, 1995
Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
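For a test built from distinct item formats, one widely used part-test estimate is stratified coefficient alpha, which discounts each stratum's variance by its own internal consistency. Whether this is the exact estimate illustrated in the article is an assumption; the code below is a generic sketch with hypothetical data.

```python
# Generic sketch of stratified coefficient alpha for a test whose items fall
# into strata (e.g., a multiple-choice section and an essay section). Not
# taken verbatim from the article above.
import numpy as np

def cronbach_alpha(items):
    """items: (n_examinees, n_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return k / (k - 1) * (1.0 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

def stratified_alpha(strata):
    """strata: list of (n_examinees, n_items) arrays, one per item format."""
    strata = [np.asarray(s, dtype=float) for s in strata]
    total_var = np.hstack(strata).sum(axis=1).var(ddof=1)
    penalty = sum(s.sum(axis=1).var(ddof=1) * (1.0 - cronbach_alpha(s))
                  for s in strata)
    return 1.0 - penalty / total_var

# Hypothetical example: 200 examinees, a 10-item MC stratum and a 4-item essay stratum
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))
mc = (ability + rng.normal(size=(200, 10)) > 0).astype(float)
essay = np.clip(np.round(2 + ability + rng.normal(size=(200, 4))), 0, 4)
print(round(stratified_alpha([mc, essay]), 3))
```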
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
Peer reviewed: Donders, Jacques – Psychological Assessment, 1997
Eight subtests were selected from the Wechsler Intelligence Scale for Children--Third Edition (WISC-III) to make a short form for clinical use. Results with the 2,200 children from the WISC-III standardization sample indicated the adequate reliability and validity of the short form for clinical use. (SLD)
Descriptors: Children, Clinical Diagnosis, Intelligence Tests, Test Format
Peer reviewed: Axelrod, Bradley N.; And Others – Psychological Assessment, 1996
The calculations of D. Schretlen, R. H. B. Benedict, and J. H. Bobholz for the reliabilities of a short form of the Wechsler Adult Intelligence Scale--Revised (WAIS-R) (1994) consistently overestimated the values. More accurate values are provided for the WAIS--R and a seven-subtest short form. (SLD)
Descriptors: Error Correction, Error of Measurement, Estimation (Mathematics), Intelligence Tests
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2006
Many academic tests (e.g. short-answer and multiple-choice) sample required knowledge with questions scoring 0 or 1 (dichotomous scoring). Few textbooks give useful guidance on the length of test needed to do this reliably. Posey's binomial error model of 1932 provides the best starting point, but allows neither for heterogeneity of question…
Descriptors: Item Sampling, Tests, Test Length, Test Reliability
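As a simpler, standard companion to the binomial-error treatment above, the Spearman-Brown prophecy formula can be solved for the lengthening factor needed to reach a target reliability, which shows how quickly required length grows as the target rises. This is the textbook relationship, not Burton's model, and the numbers below are illustrative.

```python
# Spearman-Brown prophecy formula solved for the lengthening factor k needed
# to raise a test's reliability to a target value (parallel items assumed).
# Illustrative only; this is not the binomial-error analysis discussed above.

def lengthening_factor(current_rel, target_rel):
    return (target_rel * (1.0 - current_rel)) / (current_rel * (1.0 - target_rel))

current_items, current_rel = 20, 0.70
for target in (0.80, 0.85, 0.90):
    k = lengthening_factor(current_rel, target)
    print(f"target {target:.2f}: about {round(current_items * k)} items")
# target 0.80: about 34 items; 0.85: about 49 items; 0.90: about 77 items
```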
Peer reviewed: Rowley, Glenn – Journal of Educational Measurement, 1978
The reliabilities of various observational measures were determined, and the influence of both the number and the length of the observation periods on reliability was examined, both separately and jointly. A single simplifying assumption leads to a variant of the Spearman-Brown formula, which may have wider application. (Author/CTM)
Descriptors: Career Development, Classroom Observation Techniques, Observation, Reliability
Peer reviewed: Nelson, W. M., III; And Others – Journal of Personality Assessment, 1978
This study used 126 young adult black and white male inmates to test the comparability of the Pauker and the Satz-Mogel short forms with the standard Wechsler Adult Intelligence Scale (WAIS). The Pauker form was superior with this population. Findings should not be generalized to other ages, races, or to women. (Author/CP)
Descriptors: Intelligence, Intelligence Differences, Intelligence Tests, Males
Peer reviewed: Budescu, David – Journal of Educational Measurement, 1985
An important determinant of equating process efficiency is the correlation between the anchor test and components of each form. Use of some monotonic function of this correlation as a measure of equating efficiency is suggested. A model relating anchor test length and test reliability to this measure of efficiency is presented. (Author/DWH)
Descriptors: Correlation, Equated Scores, Mathematical Models, Standardized Tests
Peer reviewed: Cliff, Norman; And Others – Applied Psychological Measurement, 1979
Monte Carlo research with TAILOR, a program using implied orders as a basis for tailored testing, is reported. TAILOR typically required about half the available items to estimate, for each simulated examinee, the responses on the remainder. (Author/CTM)
Descriptors: Adaptive Testing, Computer Programs, Item Sampling, Nonparametric Statistics
Peer reviewed: Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994
The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)
Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics
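For readers unfamiliar with U3: it compares an examinee's weighted response pattern with the best and worst Guttman patterns possible at the same number-correct score, so 0 indicates perfect conformity and values near 1 indicate aberrance. The sketch below follows the usual textbook definition (classical proportions correct, simplified tie handling), not the simulation design of the article.

```python
# Rough sketch of the nonparametric U3 person-fit statistic as commonly
# defined: 0 = perfect Guttman pattern, values near 1 = strongly aberrant.
# Tie handling and edge cases are simplified; illustrative only.
import numpy as np

def u3(responses, p_correct):
    """responses: 0/1 vector for one examinee; p_correct: item proportions correct."""
    x = np.asarray(responses, dtype=float)
    p = np.asarray(p_correct, dtype=float)
    logits = np.log(p / (1.0 - p))
    order = np.argsort(-logits)          # easiest items first
    r = int(x.sum())
    if r == 0 or r == len(x):
        return 0.0                       # all-wrong/all-right patterns carry no misfit information
    w = float(x @ logits)
    w_max = logits[order[:r]].sum()      # correct on the r easiest items
    w_min = logits[order[-r:]].sum()     # correct on the r hardest items
    return (w_max - w) / (w_max - w_min)

p = [0.9, 0.8, 0.7, 0.5, 0.3, 0.2]
print(round(u3([0, 0, 1, 0, 1, 1], p), 2))  # misses easy items, hits hard ones -> near 1
print(round(u3([1, 1, 1, 0, 0, 0], p), 2))  # Guttman pattern -> 0.0
```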
Tailor-APL: An Interactive Computer Program for Individual Tailored Testing. Technical Report No. 5.
McCormick, Douglas J. – 1978
Tailored testing increases the efficiency of tests by selecting for each person a set of items from an item pool whose difficulty maximizes the information provided by the score. The tailored testing procedure designed by Cliff orders persons and items on a common ordinal scale and…
Descriptors: Adaptive Testing, Branching, Computer Assisted Testing, Computer Programs
Peer reviewed: Munson, J. Michael; McQuarrie, Edward F. – Educational and Psychological Measurement, 1987
A shortened version of Zaichkowsky's 20-item Personal Involvement Inventory was created by removing four items that noncollege-educated respondents might find difficult to understand. The 16-item modified version had acceptable internal consistency, test-retest reliability, and factorial and predictive validity. (Author/GDC)
Descriptors: Factor Structure, Higher Education, Interest Inventories, Personality Measures
Peer reviewed: Willson, Victor L.; Reynolds, Cecil R. – Educational and Psychological Measurement, 1985
Techniques for constructing short forms of tests are discussed, and an example is given using the Wechsler Adult Intelligence Scale-Revised. Reliability and validity estimation equations are presented. (GDC)
Descriptors: Adults, Individual Testing, Intelligence Tests, Norm Referenced Tests
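A common ingredient of such short-form techniques is the reliability of a composite formed by summing the retained subtests, computed from subtest reliabilities and the covariance matrix. The sketch below shows that generic relationship with hypothetical values; it is not asserted to be the specific equations presented in the article.

```python
# Standard reliability-of-a-composite computation for a short form built by
# summing retained subtests. Generic textbook relationship; values are hypothetical.
import numpy as np

def composite_reliability(cov, reliabilities):
    """cov: subtest covariance matrix; reliabilities: per-subtest reliability estimates."""
    cov = np.asarray(cov, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)
    total_var = cov.sum()                           # variance of the summed score
    error_var = (np.diag(cov) * (1.0 - rel)).sum()  # error variance from each subtest
    return 1.0 - error_var / total_var

# Example: a two-subtest short form (hypothetical variances, covariance, reliabilities)
cov = [[9.0, 4.5],
       [4.5, 8.0]]
print(round(composite_reliability(cov, [0.85, 0.80]), 3))  # about 0.887
```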
Peer reviewed: Kipps, Debi; Hanson, Dave – School Psychology Review, 1983
The Peabody Picture Vocabulary Test-Revised (Dunn and Dunn) is described as a convenient, quick test, possessing improvements over the original. It measures a subject's receptive (hearing) vocabulary for Standard American English. However, the validity information for the test is less than adequate, since no validity studies are presented for it.…
Descriptors: Auditory Tests, Individual Testing, Scores, Test Length

