Publication Date
| Date Range | Results |
| In 2026 | 0 |
| Since 2025 | 62 |
| Since 2022 (last 5 years) | 388 |
| Since 2017 (last 10 years) | 831 |
| Since 2007 (last 20 years) | 1345 |
Audience
| Audience | Results |
| Practitioners | 195 |
| Teachers | 161 |
| Researchers | 93 |
| Administrators | 50 |
| Students | 34 |
| Policymakers | 15 |
| Parents | 12 |
| Counselors | 2 |
| Community | 1 |
| Media Staff | 1 |
| Support Staff | 1 |
Location
| Location | Results |
| Canada | 63 |
| Turkey | 59 |
| Germany | 41 |
| United Kingdom | 37 |
| Australia | 36 |
| Japan | 35 |
| China | 33 |
| United States | 32 |
| California | 25 |
| Iran | 25 |
| United Kingdom (England) | 25 |
Peer reviewed
Ellison, Stephanie; Fisher, Anne G.; Duran, Leslie – Journal of Applied Measurement, 2001
Evaluated the alternate forms reliability of new versus old tasks of the Assessment of Motor and Process Skills (AMPS) (A. Fisher, 1993). Participants were 44 persons from the AMPS database. Results support good alternate forms reliability of the motor and process ability measures and suggest that the newly calibrated tasks can be used reliably in…
Descriptors: Adults, Evaluation Methods, Psychomotor Skills, Reliability
Grant, S. G.; Gradwell, Jill M.; Cimbricz, Sandra K. – Journal of Curriculum and Supervision, 2004
In this article we consider the extent to which the Document-Based Question (DBQ) on the New York State Global History and Geography exam represents an authentic task. The DBQ seems like a significant step toward authenticity, especially when compared with traditional forced-choice assessments. Drawing on the characteristics of authentic tasks as…
Descriptors: Measurement Techniques, History, Tests, Test Format
Xu, Zeyu; Nichols, Austin – National Center for Analysis of Longitudinal Data in Education Research, 2010
The gold standard in making causal inference on program effects is a randomized trial. Most randomization designs in education randomize classrooms or schools rather than individual students. Such "clustered randomization" designs have one principal drawback: They tend to have limited statistical power or precision. This study aims to…
Descriptors: Test Format, Reading Tests, Norm Referenced Tests, Research Design
Crisp, Victoria – Research Papers in Education, 2008
This research set out to compare the quality, length and nature of (1) exam responses in combined question and answer booklets, with (2) responses in separate answer booklets in order to inform choices about response format. Combined booklets are thought to support candidates by giving more information on what is expected of them. Anecdotal…
Descriptors: Geography Instruction, High School Students, Test Format, Test Construction
Pommerich, Mary – Journal of Technology, Learning, and Assessment, 2007
Computer administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Scores
Yi, Hyun Sook; Kim, Seonghoon; Brennan, Robert L. – Applied Psychological Measurement, 2007
Large-scale testing programs involving classification decisions typically have multiple forms available and conduct equating to ensure cut-score comparability across forms. A test developer might be interested in the extent to which an examinee who happens to take a particular form would have a consistent classification decision if he or she had…
Descriptors: Classification, Reliability, Indexes, Computation
Peer reviewed
Silverstein, A. B. – Journal of Clinical Psychology, 1985
The findings of research on short forms of the Wechsler Adult Intelligence Scales-Revised are used to illustrate points about three criteria for evaluating the usefulness of a short form. Results indicate there is little justification for regarding the three criteria as criteria. (Author/BL)
Descriptors: Correlation, Evaluation Criteria, Test Format, Test Interpretation
Peer reviewed
Johnson, William L.; Dixon, Paul N. – Educational and Psychological Measurement, 1984
This study analyzed the results of applying two different methods of Likert-scale construction (single-column and discrepancy-column formats). The findings indicated that the discrepancy format provides stronger discrimination for purposes of measuring need. (Author/BW)
Descriptors: Needs Assessment, Responses, Test Construction, Test Format
Peer reviewed
Bolter, John F.; And Others – Journal of Consulting and Clinical Psychology, 1984
Contends that the Speech Sounds Perception Test form (Adult and Midrange versions) is structured such that correct responses can be determined rationally. If a patient identifies and responds according to that structure, the validity of the test is compromised. Posttest interview is suggested as a simple solution. (Author/JAC)
Descriptors: Response Style (Tests), Test Format, Test Validity, Testing Problems
Illinois State Board of Education, 2004
The Illinois State Board of Education (ISBE) provides this booklet to help in preparing for the Prairie State Achievement Examination (PSAE). Part I of this booklet is an overview that answers some basic questions about the PSAE: What is it? What will it cover? When will it be given? Part II is a preparation guide for the five tests that are…
Descriptors: State Standards, Test Content, Test Format, Achievement Tests
Peer reviewed
Lucas, Peter A.; McConkie, George W. – American Educational Research Journal, 1980
An approach is described for the characterization of test questions in terms of the information in a passage relevant to answering them and the nature of the relationship of this information to the questions. The approach offers several advantages over previous algorithms for the production of test items. (Author/GDC)
Descriptors: Content Analysis, Cues, Test Construction, Test Format
Peer reviewed
Plake, Barbara S. – Journal of Experimental Education, 1980
Three-item orderings and two levels of knowledge of ordering were used to study differences in test results, student's perception of the test's fairness and difficulty, and student's estimation of test performance. No significant order effect was found. (Author/GK)
Descriptors: Difficulty Level, Higher Education, Scores, Test Format
Peer reviewed
Hanson, Bradley A. – Applied Measurement in Education, 1996
Determining whether score distributions differ on two or more test forms administered to samples of examinees from a single population is explored using three statistical tests based on loglinear models. Examples are presented of applying tests of distribution differences to decide whether equating is needed for alternative forms of a test. (SLD)
Descriptors: Equated Scores, Scoring, Statistical Distributions, Test Format
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)
Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format
Peer reviewed
Fernald, Peter S.; Webster, Sandra – Journal of Humanistic Education and Development, 1991
Conducted two studies on take-home, closed-book examination. First study involved 20 college students and was designed to provide categorical, comprehensive outline of students' assessments of take-home, closed-book procedure. Second study involved 23 students and compared amount of learning achieved on in-class examination with that on take-home,…
Descriptors: College Students, Higher Education, Student Attitudes, Test Format