ERIC - Search Results

Showing 1 to 15 of 78 results

Reporting Pass-Fail Decisions to Examinees with Incomplete Data: A Commentary on Feinberg (2021)

Peer reviewed

Sinharay, Sandip – Educational Measurement: Issues and Practice, 2022

Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on credentialing tests such as the United States Medical Licensing Examination. Feinberg compared four approaches for reporting pass-fail decisions to the examinees with incomplete data on credentialing…

Descriptors: Testing Problems, High Stakes Tests, Credentials, Test Items

Better Remedies for Bad Exams: Correcting for Difficult Questions in a Fair and Systematic Way

Peer reviewed

Camenares, Devin – International Journal for the Scholarship of Teaching and Learning, 2022

Balancing assessment of learning outcomes with the expectations of students is a perennial challenge in education. Difficult exams, in which many students perform poorly, exacerbate this problem and can inspire a wide variety of interventions, such as a grading curve. However, addressing poor performance can sometimes distort or inflate grades and…

Descriptors: College Students, Student Evaluation, Tests, Test Items

Low Stakes, High Risks: The Problem of Intertemporal Validity of PISA in Latin America

Peer reviewed

Rivas, Axel; Scasso, Martín Guillermo – Journal of Education Policy, 2021

Since 2000, the PISA test implemented by OECD has become the prime benchmark for international comparisons in education. The 2015 PISA edition introduced methodological changes that altered the nature of its results. PISA no longer treated non-reached items in the final part of the test as valid, assuming that those unanswered questions were more a…

Descriptors: Test Validity, Computer Assisted Testing, Foreign Countries, Achievement Tests

Designing Language Assessments in Context: Theoretical, Technical, and Institutional Considerations

Peer reviewed

Giraldo, Frank – HOW, 2019

The purpose of this article of reflection is to raise awareness of how poor design of language assessments may have detrimental effects, if crucial qualities and technicalities of test design are not met. The article first discusses these central qualities for useful language assessments. Then, guidelines for creating listening assessments, as an…

Descriptors: Test Construction, Consciousness Raising, Language Tests, Second Language Learning

Challenges and Strategies for Assessing Specialised Knowledge for Teaching

Peer reviewed

Orrill, Chandra Hawley; Kim, Ok-Kyeong; Peters, Susan A.; Lischka, Alyson E.; Jong, Cindy; Sanchez, Wendy B.; Eli, Jennifer A. – Mathematics Teacher Education and Development, 2015

Developing and writing assessment items that measure teachers' knowledge is an intricate and complex undertaking. In this paper, we begin with an overview of what is known about measuring teacher knowledge. We then highlight the challenges inherent in creating assessment items that focus specifically on measuring teachers' specialised knowledge…

Descriptors: Specialization, Knowledge Base for Teaching, Educational Strategies, Testing Problems

How PARCC's False Rigor Stunts the Academic Growth of All Students. White Paper No. 135

McQuillan, Mark; Phelps, Richard P.; Stotsky, Sandra – Pioneer Institute for Public Policy Research, 2015

In July 2010, the Massachusetts Board of Elementary and Secondary Education (BESE) voted to adopt Common Core's standards in English language arts (ELA) and mathematics in place of the state's own standards in these two subjects. The vote was based largely on recommendations by Commissioner of Education Mitchell Chester and then Secretary of…

Descriptors: Reading Tests, Writing Tests, Achievement Tests, Common Core State Standards

Language Effects in International Testing: The Case of PISA 2006 Science Items

Peer reviewed

El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art – Assessment in Education: Principles, Policy & Practice, 2016

We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…

Descriptors: International Assessment, Difficulty Level, Test Items, Language Variation

The Russian Uniform State Examination in Mathematics: The Latest Version

Peer reviewed

Marushina, Albina – Journal of Mathematics Education at Teachers College, 2012

This paper aims to tell how the Russian national examination in mathematics (the Uniform State Examination or USE) has been conducted most recently. The author must say at once that the history of the system of secondary school graduation examinations or even the history of the USE will be covered only to the small degree that is necessary for…

Descriptors: Foreign Countries, Mathematics Tests, National Competency Tests, Secondary School Mathematics

Limits on the Accuracy of Linking. Research Report. ETS RR-10-22

Haberman, Shelby J. – Educational Testing Service, 2010

Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…

Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy

Ongoing Issues in Test Fairness

Peer reviewed

Camilli, Gregory – Educational Research and Evaluation, 2013

In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…

Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

Peer reviewed

Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009

In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…

Descriptors: Test Length, Simulation, Correlation, Research Methodology

Establishing Criteria for Meritorious Test Items.

Peer reviewed

Osterlind, Steven J. – Educational Research Quarterly, 1990

Criteria for planning, designing, and writing test items are suggested. The criteria were developed via a discussion by subject matter specialists, psychometricians, and test construction experts. Seven criteria proposed for test items of merit address the congruence of an item with its intended purpose, technical assumptions, and editorial…

Descriptors: Criteria, Guidelines, Test Construction, Test Items

Detecting Differential Speededness in Multistage Testing

Peer reviewed

van der Linden, Wim J.; Breithaupt, Krista; Chuah, Siang Chee; Zhang, Yanwei – Journal of Educational Measurement, 2007

A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed…

Descriptors: Adaptive Testing, Evaluation Methods, Test Items, Reaction Time

Detecting and Interpreting Local Item Dependence Using a Family of Rasch Models.

Peer reviewed

Wilson, Mark – Applied Psychological Measurement, 1988

A method for detecting and interpreting disturbances of the local-independence assumption among items that share common stimulus material or other features is presented. Dichotomous and polytomous Rasch models are used to analyze structure of the learning outcome superitems. (SLD)

Descriptors: Item Analysis, Latent Trait Theory, Mathematical Models, Test Interpretation

Review for Perceived Bias on ASVAB Forms 11, 12, and 13.

Boldt, Robert F. – 1983

The project reported here consisted of a sensitivity review of the items of Forms 11, 12, and 13 of the Armed Services Vocational Aptitude Battery (ASVAB). Because administration of this battery is a required step in the accession process, it should be free from perceived bias or offensiveness that could detract from the measurement process. In…

Descriptors: Aptitude Tests, Attitudes, Military Personnel, Opinions
