Showing 1 to 15 of 17 results
Peer reviewed
Direct link
von Davier, Matthias; Bezirhan, Ummugul – Educational and Psychological Measurement, 2023
Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population…
Descriptors: Robustness (Statistics), Test Items, Item Analysis, Goodness of Fit
Peer reviewed
PDF on ERIC
Chen, Haiwen H.; von Davier, Matthias; Yamamoto, Kentaro; Kong, Nan – ETS Research Report Series, 2015
One major issue with large-scale assessments is that the respondents might give no responses to many items, resulting in less accurate estimations of both assessed abilities and item parameters. This report studies how the types of items affect the item-level nonresponse rates and how different methods of treating item-level nonresponses have an…
Descriptors: Achievement Tests, Foreign Countries, International Assessment, Secondary School Students
McQuillan, Mark; Phelps, Richard P.; Stotsky, Sandra – Pioneer Institute for Public Policy Research, 2015
In July 2010, the Massachusetts Board of Elementary and Secondary Education (BESE) voted to adopt Common Core's standards in English language arts (ELA) and mathematics in place of the state's own standards in these two subjects. The vote was based largely on recommendations by Commissioner of Education Mitchell Chester and then Secretary of…
Descriptors: Reading Tests, Writing Tests, Achievement Tests, Common Core State Standards
Peer reviewed
Direct link
Robitzsch, Alexander; Rupp, Andre A. – Educational and Psychological Measurement, 2009
This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
Descriptors: Test Bias, Simulation, Interaction, Effect Size
Peer reviewed
Direct link
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
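The HCI itself is defined over a cognitive-model hierarchy specified by the test developers; as a rough illustration of how a person-fit index of this kind can span the [-1.0, 1.0] range, here is a minimal Guttman-style sketch. This is a simplified stand-in, not the authors' formula: it assumes only an easy-to-hard item ordering and counts a pair as a violation when a harder item is answered correctly while an easier one is missed.

```python
def person_fit_index(responses, difficulty_order):
    """Illustrative Guttman-style person-fit index (NOT the exact HCI).

    responses: dict mapping item id -> 0/1 score
    difficulty_order: item ids listed from easiest to hardest

    Returns a value in [-1.0, 1.0]; 1.0 means the response pattern is
    fully consistent with the assumed hierarchy, values near -1.0 mean
    nearly every comparable item pair violates it.
    """
    items = difficulty_order
    n = len(items)
    pairs = n * (n - 1) // 2
    # A violation: the harder item is correct while the easier item is wrong.
    violations = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if responses[items[j]] == 1 and responses[items[i]] == 0
    )
    return 1.0 - 2.0 * violations / pairs

# A perfect Guttman pattern (all easy items right, hard item wrong)
# scores 1.0; an inverted pattern is pushed toward -1.0.
fit_good = person_fit_index({"easy": 1, "mid": 1, "hard": 0},
                            ["easy", "mid", "hard"])
fit_bad = person_fit_index({"easy": 0, "mid": 0, "hard": 1},
                           ["easy", "mid", "hard"])
```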
Peer reviewed
Direct link
Marks, Anthony M.; Cronje, Johannes C. – Educational Technology & Society, 2008
Computer-based assessments are becoming more commonplace, perhaps as a necessity for faculty to cope with large class sizes. These tests often occur in large computer testing venues in which test security may be compromised. In an attempt to limit the likelihood of cheating in such venues, randomised presentation of items is automatically…
Descriptors: Educational Assessment, Educational Testing, Research Needs, Test Items
Graham, Darol L. – 1974
The adequacy of a test developed for statewide assessment of basic mathematics skills was investigated. The test, composed of multiple-choice items reflecting a series of behavioral objectives, was compared with a more extensive criterion measure generated from the same objectives by the application of a strict item sampling model. In many…
Descriptors: Comparative Testing, Criterion Referenced Tests, Educational Assessment, Item Sampling
Holmes, Barbara J. – 1980
In recent years, the controversy surrounding testing has grown, and the charge of bias is the most often cited criticism of testing and assessment. A review of the literature indicates that psychometricians and other researchers speak of bias as a property of the test or of items in the test. Conversely, test critics speak of bias as a quality or…
Descriptors: Educational Assessment, Educational Objectives, Federal Programs, National Surveys
Schwartz, Judah L., Ed.; Viator, Katherine A., Ed. – 1990
Problems in accountability assessment are examined from a unique perspective by considering the prices paid as a result of the use of secret tests (tests composed of items drawn from non-publicly available item banks). This report is a compilation of the following articles: (1) "The Social, Intellectual, and Psychological Prices of…
Descriptors: Accountability, Achievement Tests, Confidentiality, Educational Assessment
Scriven, Michael – 1991
An alternative to multiple-choice testing is suggested for educational assessment. The use of what is called "multiple-rating items" is proposed. A multiple-rating item calls for the examinee to rate all of a set of things instead of picking one as with a multiple-choice item. The respondent has to provide a specific rating of each…
Descriptors: Educational Assessment, Essay Tests, Higher Education, Multiple Choice Tests
Peer reviewed
Direct link
Wiliam, Dylan – Review of Research in Education, 2010
The idea that validity should be considered a property of inferences, rather than of assessments, has developed slowly over the past century. In early writings about the validity of educational assessments, validity was defined as a property of an assessment. The most common definition was that an assessment was valid to the extent that it…
Descriptors: Educational Assessment, Validity, Inferences, Construct Validity
Bock, R. Darrell; Zimowski, Michele F. – 1998
This report examines the potential of adaptive testing, two-stage testing in particular, for improving the data quality of the National Assessment of Educational Progress (NAEP). Following a discussion of the rationale for adaptive testing in assessment and a review of previous studies of two-stage testing, this report describes a 1993 Ohio field…
Descriptors: Adaptive Testing, Data Analysis, Educational Assessment, Elementary Secondary Education
Burstein, Leigh – 1994
Issues in alternative assessment for accountability purposes are discussed. Most new forms of performance assessment are linked in the literature, but not all alternative forms of assessment have the same attributes in terms of technical and feasibility criteria. Tradeoffs in the validity of inferences that can be drawn from alternative…
Descriptors: Accountability, Alternative Assessment, Costs, Educational Assessment
Ward, Barbara – 1980
The National Assessment of Educational Progress (NAEP) item development procedures, possible improvements or alternatives to these procedures, and methods used to control potential sources of errors of interpretation are described. Current procedures call for the assessment of 9-, 13- and 17-year-olds in subject areas typically taught in schools.…
Descriptors: Achievement Tests, Behavioral Objectives, Educational Assessment, Elementary Secondary Education
Legg, Sue M.; Algina, James – 1986
This paper focuses on the questions which arise as test practitioners monitor score scales derived from latent trait theory. Large scale assessment programs are dynamic and constantly challenge the assumptions and limits of latent trait models. Even though testing programs evolve, test scores must remain reliable indicators of progress.…
Descriptors: Difficulty Level, Educational Assessment, Elementary Secondary Education, Equated Scores