Sinharay, Sandip – Educational Measurement: Issues and Practice, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on credentialing tests such as the United States Medical Licensing Examination. Feinberg compared four approaches for reporting pass-fail decisions to the examinees with incomplete data on credentialing…
Descriptors: Testing Problems, High Stakes Tests, Credentials, Test Items
Camenares, Devin – International Journal for the Scholarship of Teaching and Learning, 2022
Balancing assessment of learning outcomes with the expectations of students is a perennial challenge in education. Difficult exams, in which many students perform poorly, exacerbate this problem and can inspire a wide variety of interventions, such as a grading curve. However, addressing poor performance can sometimes distort or inflate grades and…
Descriptors: College Students, Student Evaluation, Tests, Test Items
Rivas, Axel; Scasso, Martín Guillermo – Journal of Education Policy, 2021
Since 2000, the PISA test implemented by OECD has become the prime benchmark for international comparisons in education. The 2015 PISA edition introduced methodological changes that altered the nature of its results. PISA stopped treating non-reached items in the final part of the test as valid, assuming that those unanswered questions were more a…
Descriptors: Test Validity, Computer Assisted Testing, Foreign Countries, Achievement Tests
Giraldo, Frank – HOW, 2019
The purpose of this article of reflection is to raise awareness of how poor design of language assessments may have detrimental effects, if crucial qualities and technicalities of test design are not met. The article first discusses these central qualities for useful language assessments. Then, guidelines for creating listening assessments, as an…
Descriptors: Test Construction, Consciousness Raising, Language Tests, Second Language Learning
Orrill, Chandra Hawley; Kim, Ok-Kyeong; Peters, Susan A.; Lischka, Alyson E.; Jong, Cindy; Sanchez, Wendy B.; Eli, Jennifer A. – Mathematics Teacher Education and Development, 2015
Developing and writing assessment items that measure teachers' knowledge is an intricate and complex undertaking. In this paper, we begin with an overview of what is known about measuring teacher knowledge. We then highlight the challenges inherent in creating assessment items that focus specifically on measuring teachers' specialised knowledge…
Descriptors: Specialization, Knowledge Base for Teaching, Educational Strategies, Testing Problems
McQuillan, Mark; Phelps, Richard P.; Stotsky, Sandra – Pioneer Institute for Public Policy Research, 2015
In July 2010, the Massachusetts Board of Elementary and Secondary Education (BESE) voted to adopt Common Core's standards in English language arts (ELA) and mathematics in place of the state's own standards in these two subjects. The vote was based largely on recommendations by Commissioner of Education Mitchell Chester and then Secretary of…
Descriptors: Reading Tests, Writing Tests, Achievement Tests, Common Core State Standards
El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art – Assessment in Education: Principles, Policy & Practice, 2016
We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
Descriptors: International Assessment, Difficulty Level, Test Items, Language Variation
Marushina, Albina – Journal of Mathematics Education at Teachers College, 2012
This paper aims to tell how the Russian national examination in mathematics (the Uniform State Examination or USE) has been conducted most recently. The author must say at once that the history of the system of secondary school graduation examinations or even the history of the USE will be covered only to the small degree that is necessary for…
Descriptors: Foreign Countries, Mathematics Tests, National Competency Tests, Secondary School Mathematics
Haberman, Shelby J. – Educational Testing Service, 2010
Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…
Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
Peer reviewed: Osterlind, Steven J. – Educational Research Quarterly, 1990
Criteria for planning, designing, and writing test items are suggested. The criteria were developed via a discussion by subject matter specialists, psychometricians, and test construction experts. Seven criteria proposed for test items of merit address the congruence of an item with its intended purpose, technical assumptions, and editorial…
Descriptors: Criteria, Guidelines, Test Construction, Test Items
van der Linden, Wim J.; Breithaupt, Krista; Chuah, Siang Chee; Zhang, Yanwei – Journal of Educational Measurement, 2007
A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed…
Descriptors: Adaptive Testing, Evaluation Methods, Test Items, Reaction Time
Peer reviewed: Wilson, Mark – Applied Psychological Measurement, 1988
A method for detecting and interpreting disturbances of the local-independence assumption among items that share common stimulus material or other features is presented. Dichotomous and polytomous Rasch models are used to analyze structure of the learning outcome superitems. (SLD)
Descriptors: Item Analysis, Latent Trait Theory, Mathematical Models, Test Interpretation
Boldt, Robert F. – 1983
The project reported here consisted of a sensitivity review of the items of Forms 11, 12, and 13 of the Armed Services Vocational Aptitude Battery (ASVAB). Because administration of this battery is a required step in the accession process, it should be free from perceived bias or offensiveness that could detract from the measurement process. In…
Descriptors: Aptitude Tests, Attitudes, Military Personnel, Opinions

