Showing all 14 results
Peer reviewed
Mo, Ya; Carney, Michele; Cavey, Laurie; Totorica, Tatia – Applied Measurement in Education, 2021
There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop…
Descriptors: Protocol Analysis, Test Items, Responses, Mathematics Teachers
Peer reviewed
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
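The abstract does not describe the generation machinery itself, but the core of automatic item generation is an item model: a stem template whose variable slots are filled systematically to yield many psychometrically similar items. Below is a minimal sketch in Python; the dosage-calculation template and the distractor rule are hypothetical illustrations, not the cognitive models used in the study.

```python
# Minimal sketch of template-based automatic item generation.
# The stem template, variable ranges, and distractor rule are all
# hypothetical; they do not come from the article.
import itertools
import random

STEM = "A patient weighing {weight} kg needs {dose} mg/kg of drug X. What total dose is required?"

def generate_item(weight, dose):
    correct = weight * dose
    # Distractors built from plausible arithmetic slips (assumed rule).
    distractors = {weight + dose, correct * 10, correct / 2}
    distractors.discard(correct)
    options = [correct] + sorted(distractors)[:3]
    random.shuffle(options)
    return {
        "stem": STEM.format(weight=weight, dose=dose),
        "options": options,
        "key": options.index(correct),
    }

# Systematically instantiate the item model over its variable ranges.
items = [generate_item(w, d) for w, d in itertools.product((60, 75, 90), (2, 4))]
print(len(items), "items generated; first:", items[0]["stem"])
```

Each combination of variable values yields a distinct item with a known key, which is what makes the approach scalable relative to hand-writing items.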
Peer reviewed
Keller, Lisa A.; Keller, Robert R. – Applied Measurement in Education, 2015
Equating test forms is an essential activity in standardized testing, one whose importance has grown with the accountability systems mandated under Adequate Yearly Progress. It is through equating that scores from different test forms become comparable, which allows for the tracking of changes in the performance of students from…
Descriptors: Item Response Theory, Rating Scales, Standardized Tests, Scoring Rubrics
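The abstract does not specify the equating design used. As a rough illustration of what equating accomplishes, the sketch below applies linear (mean-sigma) equating, one standard method, to map scores from one form onto the scale of another; all scores are fabricated.

```python
# Sketch of linear equating between two test forms (mean-sigma method).
# Illustrative only: the article's actual equating design is not given
# in the abstract, and the score lists below are made up.
from statistics import mean, stdev

def linear_equate(x_scores, y_scores):
    """Return a function mapping form-X scores onto the form-Y scale."""
    a = stdev(y_scores) / stdev(x_scores)    # slope: ratio of SDs
    b = mean(y_scores) - a * mean(x_scores)  # intercept
    return lambda x: a * x + b

form_x = [12, 15, 18, 20, 22, 25, 30]
form_y = [14, 16, 20, 23, 24, 28, 33]
to_y_scale = linear_equate(form_x, form_y)
print(round(to_y_scale(21), 2))  # a form-X score of 21 expressed on the form-Y scale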
Peer reviewed
Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008
In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method, a genetic algorithm (GA), to construct parallel…
Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory
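As a hedged illustration of the approach named in the abstract, the sketch below uses a simple genetic algorithm to select a fixed-length form from an item pool so that its 2PL test information function approximates a target at a few ability points. The pool parameters, target TIF values, and GA settings are all invented for the example, not taken from the article.

```python
# Sketch: a genetic algorithm assembling a test form whose test
# information function (TIF) approximates a target. All numbers here
# (pool, target, GA settings) are illustrative assumptions.
import math
import random

random.seed(1)
THETAS = (-1.0, 0.0, 1.0)           # ability points where the TIF is matched
POOL = [(random.uniform(0.5, 2.0),  # a: discrimination (hypothetical pool)
         random.uniform(-2.0, 2.0)) # b: difficulty
        for _ in range(60)]
FORM_LEN, POP, GENS = 20, 40, 150
TARGET = {t: 6.0 for t in THETAS}   # hypothetical target TIF values

def info(a, b, theta):
    """2PL item information (logistic metric): a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def tif(form, theta):
    return sum(info(*POOL[i], theta) for i in form)

def fitness(form):
    """Negative squared deviation of the form's TIF from the target."""
    return -sum((tif(form, t) - TARGET[t]) ** 2 for t in THETAS)

def crossover(p1, p2):
    """Child form drawn from the union of two parents' items."""
    return set(random.sample(sorted(p1 | p2), FORM_LEN))

def mutate(form):
    """Swap one item out, then top the form back up to FORM_LEN."""
    form = set(form)
    form.remove(random.choice(sorted(form)))
    while len(form) < FORM_LEN:
        form.add(random.randrange(len(POOL)))
    return form

pop = [set(random.sample(range(len(POOL)), FORM_LEN)) for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                 # truncation selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(POP - len(parents))]

best = max(pop, key=fitness)
print("form TIF at thetas:", [round(tif(best, t), 2) for t in THETAS])
```

Real parallel-forms assembly layers content and item-overlap constraints on top of the TIF match, which is what makes the combinatorial problem expensive enough to motivate heuristics like a GA.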
Peer reviewed
Ascalon, M. Evelina; Meyers, Lawrence S.; Davis, Bruce W.; Smits, Niels – Applied Measurement in Education, 2007
This article examined two item-writing guidelines: the format of the item stem and homogeneity of the answer set. Answering the call of Haladyna, Downing, and Rodriguez (2002) for empirical tests of item writing guidelines and extending the work of Smith and Smith (1988) on differential use of item characteristics, a mock multiple-choice driver's…
Descriptors: Guidelines, Difficulty Level, Standard Setting, Driver Education
Peer reviewed
Dorans, Neil J.; Lawrence, Ida M. – Applied Measurement in Education, 1990
A procedure for checking the score equivalence of nearly identical editions of a test is described and illustrated with Scholastic Aptitude Test data. The procedure uses the standard error of equating and a graphical representation of score-conversion deviations from the identity function, expressed in standard error units. (SLD)
Descriptors: Equated Scores, Grade Equivalent Scores, Scores, Statistical Analysis
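A minimal sketch of the flagging logic the abstract describes: express each score conversion's deviation from the identity function in standard-error-of-equating units, and flag deviations beyond a cutoff. The 2-SE cutoff and all numbers below are assumptions for illustration, not values from the article.

```python
# Sketch of the score-equivalence check: deviations of an equating
# conversion from the identity function, in standard-error units.
# The conversions and standard errors below are fabricated.
raw_scores = [20, 30, 40, 50, 60]
converted  = [20.4, 29.6, 40.1, 51.2, 60.3]  # equated score on the new edition
see        = [0.5, 0.4, 0.4, 0.5, 0.6]       # standard error of equating

for x, y, se in zip(raw_scores, converted, see):
    dev = (y - x) / se            # deviation from identity in SE units
    flag = "FLAG" if abs(dev) > 2 else "ok"
    print(f"score {x}: deviation {dev:+.2f} SE  {flag}")
```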
Peer reviewed
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
Results of 96 theoretical/empirical studies were reviewed to see if they support a taxonomy of 43 rules for writing multiple-choice test items. The taxonomy is the result of an analysis of 46 textbooks dealing with multiple-choice item writing. For nearly half of the rules, no research was found. (SLD)
Descriptors: Classification, Literature Reviews, Multiple Choice Tests, Test Construction
Peer reviewed
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests
Peer reviewed
Downing, Steven M.; And Others – Applied Measurement in Education, 1995
The criterion-related validity evidence and other psychometric characteristics of multiple-choice and multiple true-false (MTF) items in medical specialty certification examinations were compared using results from 21,346 candidates. Advantages of MTF items and implications for test construction are discussed. (SLD)
Descriptors: Cognitive Ability, Licensing Examinations (Professions), Medical Education, Objective Tests
Peer reviewed
Sireci, Stephen G.; Berberoglu, Giray – Applied Measurement in Education, 2000
Studied an item response theory method for investigating the equivalence of translated-adapted items using bilingual test takers. Results from an English-Turkish course evaluation form completed by 688 Turkish students indicate that the methodology is effective in flagging items that function differentially across languages and informing…
Descriptors: Bilingualism, College Students, Evaluation Methods, Higher Education
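The abstract does not give the flagging rule, so the sketch below illustrates one common IRT approach: place the two language versions' item difficulties on a common metric with mean-sigma linking, then flag items whose linked differences are large. The b-parameters and the 0.5 cutoff are fabricated for the example.

```python
# Sketch of flagging translated items via IRT difficulty comparison.
# Hypothetical: fitted b-parameters for each language version are
# assumed given; mean-sigma linking puts them on one scale, and items
# with large residual differences are flagged as possibly non-equivalent.
from statistics import mean, stdev

b_english = [-1.2, -0.5, 0.0, 0.6, 1.3, 0.2]
b_turkish = [-1.0, -0.3, 0.9, 0.8, 1.5, 0.4]  # item 3 drifts after translation

# Mean-sigma linking: rescale Turkish difficulties to the English metric.
A = stdev(b_english) / stdev(b_turkish)
B = mean(b_english) - A * mean(b_turkish)
linked = [A * b + B for b in b_turkish]

for i, (be, bt) in enumerate(zip(b_english, linked), start=1):
    diff = bt - be
    note = "  <- flag" if abs(diff) > 0.5 else ""
    print(f"item {i}: linked difference {diff:+.2f}{note}")
```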
Peer reviewed
Martinez, Michael E.; Bennett, Randy Elliot – Applied Measurement in Education, 1992
New developments in the use of automatically scorable constructed-response item types for large-scale assessment are reviewed for five domains: (1) mathematical reasoning; (2) algebra problem solving; (3) computer science; (4) architecture; and (5) natural language. Ways in which these technologies are likely to shape testing are considered. (SLD)
Descriptors: Algebra, Architecture, Automation, Computer Science
Peer reviewed
Haladyna, Thomas M. – Applied Measurement in Education, 1992
Several multiple-choice item formats are examined in the current climate of test reform. The reform movement is discussed as it affects use of the following formats: (1) complex multiple-choice; (2) alternate choice; (3) true-false; (4) multiple true-false; and (5) the context dependent item set. (SLD)
Descriptors: Cognitive Psychology, Comparative Testing, Context Effect, Educational Change
Peer reviewed
Harris, Abigail M.; Carlton, Sydell T. – Applied Measurement in Education, 1993
Differential item functioning on 6 forms of the Scholastic Aptitude Test was examined for 181,228 male and 198,668 female students, focusing on the points tested, the test format, and the subject matter in which items are embedded. Implications of the identifiable differences are discussed. (SLD)
Descriptors: College Entrance Examinations, Comparative Analysis, Females, High School Students
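The abstract does not name the DIF statistic used. A standard choice for matched group comparisons of this kind is the Mantel-Haenszel common odds ratio, sketched below with fabricated counts; the ETS D-DIF transformation and its rough |1.5| threshold are conventional, but nothing here reproduces the study's analysis.

```python
# Sketch of a Mantel-Haenszel DIF check for one item: examinees are
# matched on total score, and the common odds ratio compares the odds
# of a correct answer across groups within each matched stratum.
# All counts are fabricated for illustration.
import math

# Per total-score stratum: (ref_correct, ref_wrong, focal_correct, focal_wrong)
strata = [
    (30, 20, 22, 28),
    (45, 15, 35, 25),
    (60, 10, 50, 20),
]

num = den = 0.0
for rc, rw, fc, fw in strata:
    n = rc + rw + fc + fw
    num += rc * fw / n
    den += rw * fc / n

alpha_mh = num / den                   # common odds ratio; 1.0 = no DIF
delta_mh = -2.35 * math.log(alpha_mh)  # ETS D-DIF scale
print(f"MH odds ratio {alpha_mh:.2f}, MH D-DIF {delta_mh:+.2f}")
```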
Peer reviewed
Hall, Bruce W.; And Others – Applied Measurement in Education, 1988
Responses of 310 teachers in Florida to a survey about use of teacher-made tests, nationally standardized tests, and state minimum competency tests were studied. Results show that all three test types were used to some extent in eight decision categories, but none of the tests were clearly dominant. (SLD)
Descriptors: Classroom Techniques, Decision Making, Elementary Secondary Education, Minimum Competency Testing