Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 1
  Since 2016 (last 10 years): 3
  Since 2006 (last 20 years): 3

Descriptor
  Scoring: 7
  Test Format: 7
  Automation: 3
  Higher Education: 2
  Responses: 2
  Test Items: 2
  Testing Problems: 2
  Achievement Tests: 1
  Algebra: 1
  Architecture: 1
  Artificial Intelligence: 1
  …
Source
  Applied Measurement in Education: 7

Author
  Becker, Douglas F.: 1
  Bennett, Randy Elliot: 1
  Boyer, Michelle: 1
  Clauser, Brian E.: 1
  Davison, Mark L.: 1
  Dunham, Trudy C.: 1
  Frisbie, David A.: 1
  Glazer, Nancy: 1
  Hanson, Bradley A.: 1
  Kieftenbeld, Vincent: 1
  Mee, Janet: 1
  …
Publication Type
  Journal Articles: 7
  Reports - Research: 4
  Information Analyses: 1
  Reports - Descriptive: 1
  Reports - Evaluative: 1

Education Level
  Higher Education: 1
  Postsecondary Education: 1

Clauser, Brian E.; Yaneva, Victoria; Baldwin, Peter; Ha, Le An; Mee, Janet – Applied Measurement in Education, 2024
Multiple-choice questions have become ubiquitous in educational measurement because the format allows for efficient and accurate scoring. Nonetheless, interest in constructed-response formats persists. This interest has driven efforts to develop computer-based scoring procedures that can accurately and efficiently score these items.…
Descriptors: Computer Uses in Education, Artificial Intelligence, Scoring, Responses
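
The computer-based scoring procedures mentioned here can take many forms. As a minimal sketch of the general idea (not the authors' method), a constructed response might be given credit when it is lexically similar to a set of reference answers; the scikit-learn tools, reference text, and cutoff below are assumptions for illustration only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_response(response, reference_answers, cutoff=0.5):
    """Give credit (1) if the response is lexically close to any reference answer."""
    vectorizer = TfidfVectorizer().fit(reference_answers + [response])
    ref_vecs = vectorizer.transform(reference_answers)
    resp_vec = vectorizer.transform([response])
    best = cosine_similarity(resp_vec, ref_vecs).max()
    return int(best >= cutoff)

# Invented example: score a short answer against a single reference answer.
score_response("Plants use sunlight to make food through photosynthesis",
               ["Photosynthesis is the process by which plants turn sunlight into food"])

Operational engines typically model much more than surface similarity, but the score-against-a-key structure is the same.
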
Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020
This introductory article describes how constructed-response scoring is carried out, particularly the rater monitoring processes, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…
Descriptors: Scoring, Test Format, Responses, Predictor Variables
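
One common operational device for rater monitoring, which may or may not correspond to any of the three designs described in the article, is to seed pre-scored validity responses into each rater's queue and flag raters whose agreement with the key falls below a threshold. A minimal sketch, with invented data and an assumed threshold:

def flag_raters(rater_scores, true_scores, min_agreement=0.8):
    """Return the raters whose exact agreement with the validity key is too low."""
    flagged = []
    for rater, scores in rater_scores.items():
        exact = sum(r == t for r, t in zip(scores, true_scores)) / len(true_scores)
        if exact < min_agreement:
            flagged.append(rater)
    return flagged

validity_key = [2, 1, 3, 2, 0]            # scores assigned in advance to seeded responses
ratings = {"rater_A": [2, 1, 3, 2, 0],    # matches the key exactly
           "rater_B": [2, 2, 3, 1, 1]}    # drifts from the key
flag_raters(ratings, validity_key)        # ['rater_B']
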
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
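
The comparison problem described above can be made concrete by computing an agreement index, for example quadratically weighted kappa, for each automated rater on each item and ranking raters by their mean. The sketch below uses invented scores and engine names and illustrates the general idea rather than the authors' procedure.

import numpy as np
from sklearn.metrics import cohen_kappa_score

human = {                      # item -> human scores for the same set of responses
    "item1": [0, 1, 2, 2, 3, 1],
    "item2": [3, 2, 2, 1, 0, 1],
}
engines = {                    # automated rater -> item -> machine scores
    "engine_A": {"item1": [0, 1, 2, 3, 3, 1], "item2": [3, 2, 1, 1, 0, 1]},
    "engine_B": {"item1": [1, 1, 1, 2, 2, 1], "item2": [2, 2, 2, 2, 1, 1]},
}

mean_qwk = {
    name: np.mean([cohen_kappa_score(human[item], scores[item], weights="quadratic")
                   for item in human])
    for name, scores in engines.items()
}
for name, qwk in sorted(mean_qwk.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean QWK = {qwk:.2f}")

As the abstract notes, a single summary ranking like this can hide differences that depend on the ranking procedure and on item-level variation.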

Hanson, Bradley A. – Applied Measurement in Education, 1996
Determining whether score distributions differ on two or more test forms administered to samples of examinees from a single population is explored with three statistical tests based on loglinear models. Examples are presented of applying tests of distribution differences to decide whether equating is needed for alternative forms of a test. (SLD)
Descriptors: Equated Scores, Scoring, Statistical Distributions, Test Format
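
Hanson's specific loglinear-model tests are not reproduced here, but the underlying question, whether raw-score distributions differ across forms given to samples from a single population, can be illustrated with a generic likelihood-ratio chi-square test on a form-by-score frequency table. The counts below are invented.

import numpy as np
from scipy.stats import chi2_contingency

# Rows are forms, columns are (binned) raw-score categories.
observed = np.array([
    [12, 30, 45, 28, 10],   # Form X
    [15, 25, 40, 35, 12],   # Form Y
])

g_stat, p_value, dof, expected = chi2_contingency(observed, lambda_="log-likelihood")
print(f"G = {g_stat:.2f}, df = {dof}, p = {p_value:.3f}")
# A small p-value suggests the score distributions differ, i.e., equating may be needed.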

Frisbie, David A.; Becker, Douglas F. – Applied Measurement in Education, 1990
Seventeen educational measurement textbooks were reviewed to analyze current perceptions regarding true-false achievement testing. A synthesis of the rules for item writing is presented, and the purported advantages and disadvantages of the true-false format derived from those texts are reviewed. (TJH)
Descriptors: Achievement Tests, Higher Education, Methods Courses, Objective Tests

Dunham, Trudy C.; Davison, Mark L. – Applied Measurement in Education, 1990
The effects of packing or skewing the response options of a scale on the common measurement problems of leniency and range restriction in instructor ratings were assessed. Results from a sample of 130 undergraduate education students indicate that packing reduced leniency but had no effect on range restriction. (TJH)
Descriptors: Education Majors, Higher Education, Professors, Rating Scales

Martinez, Michael E.; Bennett, Randy Elliot – Applied Measurement in Education, 1992
New developments in the use of automatically scorable constructed-response item types for large-scale assessment are reviewed for five domains: (1) mathematical reasoning; (2) algebra problem solving; (3) computer science; (4) architecture; and (5) natural language. Ways in which these technologies are likely to shape testing are considered. (SLD)
Descriptors: Algebra, Architecture, Automation, Computer Science
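
For the algebra domain, one family of automatic scoring approaches gives credit when a student's expression is symbolically equivalent to the key. The sketch below uses sympy and a single assumed variable x; it illustrates that idea only and is not one of the systems reviewed in the article.

import sympy as sp

def equivalent(student_answer, key):
    """Credit the response if it simplifies to the same expression as the key."""
    x = sp.symbols("x")  # assume a single variable named x for this illustration
    student_expr = sp.sympify(student_answer, locals={"x": x})
    key_expr = sp.sympify(key, locals={"x": x})
    return sp.simplify(student_expr - key_expr) == 0

equivalent("(x + 1)**2", "x**2 + 2*x + 1")   # True
equivalent("2*x + 1", "x**2 + 2*x + 1")      # False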