Showing 1 to 15 of 17 results
Peer reviewed
Direct link
von Davier, Matthias; Bezirhan, Ummugul – Educational and Psychological Measurement, 2023
Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population…
Descriptors: Robustness (Statistics), Test Items, Item Analysis, Goodness of Fit
Peer reviewed
PDF on ERIC
Chen, Haiwen H.; von Davier, Matthias; Yamamoto, Kentaro; Kong, Nan – ETS Research Report Series, 2015
One major issue with large-scale assessments is that the respondents might give no responses to many items, resulting in less accurate estimations of both assessed abilities and item parameters. This report studies how the types of items affect the item-level nonresponse rates and how different methods of treating item-level nonresponses have an…
Descriptors: Achievement Tests, Foreign Countries, International Assessment, Secondary School Students
McQuillan, Mark; Phelps, Richard P.; Stotsky, Sandra – Pioneer Institute for Public Policy Research, 2015
In July 2010, the Massachusetts Board of Elementary and Secondary Education (BESE) voted to adopt Common Core's standards in English language arts (ELA) and mathematics in place of the state's own standards in these two subjects. The vote was based largely on recommendations by Commissioner of Education Mitchell Chester and then Secretary of…
Descriptors: Reading Tests, Writing Tests, Achievement Tests, Common Core State Standards
Peer reviewed
Direct link
Robitzsch, Alexander; Rupp, Andre A. – Educational and Psychological Measurement, 2009
This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
Descriptors: Test Bias, Simulation, Interaction, Effect Size
Peer reviewed
Direct link
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
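The HCI itself is defined over a cognitive-model hierarchy specified by the test developers; as a rough illustration of how a person-fit index of this kind can span the [-1.0, 1.0] range, here is a minimal Guttman-style sketch. This is a simplified stand-in, not the authors' formula: it assumes only an easy-to-hard item ordering and counts a pair as a violation when a harder item is answered correctly while an easier one is missed.

```python
def person_fit_index(responses, difficulty_order):
    """Illustrative Guttman-style person-fit index (NOT the exact HCI).

    responses: dict mapping item id -> 0/1 score
    difficulty_order: item ids listed from easiest to hardest

    Returns a value in [-1.0, 1.0]; 1.0 means the response pattern is
    fully consistent with the assumed hierarchy, values near -1.0 mean
    nearly every comparable item pair violates it.
    """
    items = difficulty_order
    n = len(items)
    pairs = n * (n - 1) // 2
    # A violation: the harder item is correct while the easier item is wrong.
    violations = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if responses[items[j]] == 1 and responses[items[i]] == 0
    )
    return 1.0 - 2.0 * violations / pairs

# A perfect Guttman pattern (all easy items right, hard item wrong)
# scores 1.0; an inverted pattern is pushed toward -1.0.
fit_good = person_fit_index({"easy": 1, "mid": 1, "hard": 0},
                            ["easy", "mid", "hard"])
fit_bad = person_fit_index({"easy": 0, "mid": 0, "hard": 1},
                           ["easy", "mid", "hard"])
```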
Peer reviewed
Direct link
Marks, Anthony M.; Cronje, Johannes C. – Educational Technology & Society, 2008
Computer-based assessments are becoming more commonplace, perhaps as a necessity for faculty to cope with large class sizes. These tests often occur in large computer testing venues in which test security may be compromised. In an attempt to limit the likelihood of cheating in such venues, randomised presentation of items is automatically…
Descriptors: Educational Assessment, Educational Testing, Research Needs, Test Items
Graham, Darol L. – 1974
The adequacy of a test developed for statewide assessment of basic mathematics skills was investigated. The test, composed of multiple-choice items reflecting a series of behavioral objectives, was compared with a more extensive criterion measure generated from the same objectives by the application of a strict item sampling model. In many…
Descriptors: Comparative Testing, Criterion Referenced Tests, Educational Assessment, Item Sampling
Holmes, Barbara J. – 1980
In recent years, the controversy surrounding testing has grown, and the charge of bias is the most often cited criticism of testing and assessment. A review of the literature indicates that psychometricians and other researchers speak of bias as a property of the test or of items in the test. Conversely, test critics speak of bias as a quality or…
Descriptors: Educational Assessment, Educational Objectives, Federal Programs, National Surveys
Schwartz, Judah L., Ed.; Viator, Katherine A., Ed. – 1990
Problems in accountability assessment are examined from a unique perspective by considering the prices paid as a result of the use of secret tests (tests composed of items drawn from non-publicly available item banks). This report is a compilation of the following articles: (1) "The Social, Intellectual, and Psychological Prices of…
Descriptors: Accountability, Achievement Tests, Confidentiality, Educational Assessment
Scriven, Michael – 1991
An alternative to multiple-choice testing is suggested for educational assessment. The use of what is called "multiple-rating items" is proposed. A multiple-rating item calls for the examinee to rate all of a set of things instead of picking one as with a multiple-choice item. The respondent has to provide a specific rating of each…
Descriptors: Educational Assessment, Essay Tests, Higher Education, Multiple Choice Tests
Peer reviewed
Direct link
Wiliam, Dylan – Review of Research in Education, 2010
The idea that validity should be considered a property of inferences, rather than of assessments, has developed slowly over the past century. In early writings about the validity of educational assessments, validity was defined as a property of an assessment. The most common definition was that an assessment was valid to the extent that it…
Descriptors: Educational Assessment, Validity, Inferences, Construct Validity
Bock, R. Darrell; Zimowski, Michele F. – 1998
This report examines the potential of adaptive testing, two-stage testing in particular, for improving the data quality of the National Assessment of Educational Progress (NAEP). Following a discussion of the rationale for adaptive testing in assessment and a review of previous studies of two-stage testing, this report describes a 1993 Ohio field…
Descriptors: Adaptive Testing, Data Analysis, Educational Assessment, Elementary Secondary Education
Burstein, Leigh – 1994
Issues in alternative assessment for accountability purposes are discussed. Most new forms of performance assessment are linked in the literature, but not all alternative forms of assessment have the same attributes in terms of technical and feasibility criteria. Tradeoffs in the validity of inferences that can be drawn from alternative…
Descriptors: Accountability, Alternative Assessment, Costs, Educational Assessment
Ward, Barbara – 1980
The National Assessment of Educational Progress (NAEP) item development procedures, possible improvements or alternatives to these procedures, and methods used to control potential sources of errors of interpretation are described. Current procedures call for the assessment of 9-, 13- and 17-year-olds in subject areas typically taught in schools.…
Descriptors: Achievement Tests, Behavioral Objectives, Educational Assessment, Elementary Secondary Education
Legg, Sue M.; Algina, James – 1986
This paper focuses on the questions which arise as test practitioners monitor score scales derived from latent trait theory. Large scale assessment programs are dynamic and constantly challenge the assumptions and limits of latent trait models. Even though testing programs evolve, test scores must remain reliable indicators of progress.…
Descriptors: Difficulty Level, Educational Assessment, Elementary Secondary Education, Equated Scores