ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	3
Since 2007 (last 20 years)	8

Descriptor

Comparative Analysis	16
Interrater Reliability	16
Test Items	16
Difficulty Level	6
Scoring	6
Judges	4
Mathematics Tests	4
Language Tests	3
Standard Setting (Scoring)	3
Test Construction	3
Test Reliability	3
Testing	3
Alternative Assessment	2
Computer Assisted Testing	2
Correlation	2
Cutting Scores	2
Educational Quality	2
English (Second Language)	2
Equated Scores	2
Evaluation Methods	2
Higher Education	2
Mathematics Education	2
Multiple Choice Tests	2
Rating Scales	2
Scores	2
More ▼

Source

ETS Research Report Series	2
Applied Measurement in…	1
Cambridge Assessment	1
Center for Research on…	1
Education Digest: Essential…	1
Educational Studies in…	1
Journal of Communication…	1
Journal of Early Childhood…	1
New Directions for Teaching…	1
Online Submission	1

Publication Type

Reports - Research	8
Journal Articles	7
Speeches/Meeting Papers	7
Reports - Evaluative	5
Reports - Descriptive	2
Collected Works - Serials	1
Information Analyses	1
Tests/Questionnaires	1

Education Level

Higher Education	2
Secondary Education	2
Early Childhood Education	1
Elementary Secondary Education	1
High Schools	1
Postsecondary Education	1

Audience

Researchers

Location

Colorado	1
Florida	1
Georgia	1
Nevada	1
North Carolina	1
Pennsylvania	1
Tennessee	1
Vermont	1
Washington	1

Laws, Policies, & Programs

Assessments and Surveys

Early Childhood Environment…	1
National Assessment of…	1
SAT (College Admission Test)	1

What Works Clearinghouse Rating

Showing 1 to 15 of 16 results Save | Export

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Download full text

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al (2020) describes a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

A Comparative Analysis of the "Early Childhood Environment Rating Scale--Revised" and "Early Childhood Environment Rating Scale, Third Edition"

Peer reviewed
PDF on ERIC

Download full text

Direct link

Neitzel, Jennifer; Early, Diane; Sideris, John; LaForrett, Doré; Abel, Michael B.; Soli, Margaret; Davidson, Dawn L.; Haboush-Deloye, Amanda; Hestenes, Linda L.; Jenson, Denise; Johnson, Cindy; Kalas, Jennifer; Mamrak, Angela; Masterson, Marie L.; Mims, Sharon U.; Oya, Patti; Philson, Bobbi; Showalter, Megan; Warner-Richter, Mallory; Kortright Wood, Jill – Journal of Early Childhood Research, 2019

The Early Childhood Environment Rating Scales, including the "Early Childhood Environment Rating Scale--Revised" (Harms et al., 2005) and the "Early Childhood Environment Rating Scale, Third Edition" (Harms et al., 2015) are the most widely used observational assessments in early childhood learning environments. The most recent…

Descriptors: Rating Scales, Early Childhood Education, Educational Quality, Scoring

Statistically Comparing the Performance of Multiple Automated Raters across Multiple Items

Peer reviewed

Direct link

Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017

Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…

Descriptors: Automation, Scoring, Comparative Analysis, Test Items

Estimating Item Difficulty with Comparative Judgments. Research Report. ETS RR-14-39

Peer reviewed
PDF on ERIC

Download full text

Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014

Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for themost part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…

Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations

The Problem of Assessing Problem Solving: Can Comparative Judgement Help?

Peer reviewed

Direct link

Jones, Ian; Inglis, Matthew – Educational Studies in Mathematics, 2015

School mathematics examination papers are typically dominated by short, structured items that fail to assess sustained reasoning or problem solving. A contributory factor to this situation is the need for student work to be marked reliably by a large number of markers of varied experience and competence. We report a study that tested an…

Descriptors: Problem Solving, Mathematics Instruction, Mathematics Tests, Test Items

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed
PDF on ERIC

Download full text

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Quality of Questions on Common Tests at Issue

Direct link

Sawchuk, Stephen – Education Digest: Essential Readings Condensed for Quick Review, 2010

Most experts in the testing community have presumed that the $350 million promised by the U.S. Department of Education to support common assessments would promote those that made greater use of open-ended items capable of measuring higher-order critical-thinking skills. But as measurement experts consider the multitude of possibilities for an…

Descriptors: Educational Quality, Test Items, Comparative Analysis, Multiple Choice Tests

The Essential Role of Curricular Analyses in Comparative Studies of Mathematics Achievement: Developing "Fair" Tests

Download full text

Chavez, Oscar; Papick, Ira; Ross, Dan J.; Grouws, Douglas A. – Online Submission, 2010

The purpose of this paper was to describe the process of development of assessment instruments for the Comparing Options in Secondary Mathematics: Investigating Curriculum (COSMIC) project. The COSMIC project was a three-year longitudinal comparative study focusing on evaluating high school students' mathematics learning from two distinct…

Descriptors: Mathematics Education, Mathematics Achievement, Interrater Reliability, Scoring Rubrics

Sampling of Common Items: An Unrecognized Source of Error in Test Equating. CSE Report 636

Download full text

Michaelides, Michalis P.; Haertel, Edward H. – Center for Research on Evaluation Standards and Student Testing CRESST, 2004

There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…

Descriptors: Test Items, Testing, Error Patterns, Interrater Reliability

Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.

Download full text

O'Neill, Thomas R.; Lunz, Mary E. – 1996

To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…

Descriptors: Ability, Benchmarking, Comparative Analysis, Difficulty Level

Language Testing: Recent Developments and Persistent Dilemmas.

Download full text

Takala, Sauli – 1998

This paper discusses recent developments in language testing. It begins with a review of the traditional criteria that are applied to all measurement and outlines recent emphases that derive from the expanding range of stakeholders. Drawing on Alderson's seminal work, criteria are presented for evaluating communicative language tests. Developments…

Descriptors: Alternative Assessment, Communicative Competence (Languages), Comparative Analysis, Evaluation Criteria

A Comparison of the Paper Selection Method and the Contrasting Groups Method for Setting Standards on Constructed-Response Items.

Download full text

Webb, Melvin W., II; Miller, Eva R. – 1995

As constructed-response items become an integral part of educational assessments, setting student performance standards on constructed-response items has become an important issue. Two standard-setting methods, one used for setting standards on the National Assessment of Educational Progress (NAEP) in reading in grade 8 and the other used to set…

Descriptors: Comparative Analysis, Constructed Response, Criteria, Educational Assessment

A Comparison between the Nedelsky and Angoff Standard-Setting Methods.

Download full text

Chang, Lei – 1996

It was hypothesized that, when compared to the Angoff method (W. H. Angoff, 1971), the Nedelsky method (L. Nedelsky, 1954) for standard setting had lower intrajudge inconsistency, lower cutscores, and lower cutscores especially for items presenting challenges to the judges. These hypotheses were tested and supported in a sample of 22 graduate…

Descriptors: Comparative Analysis, Cutting Scores, Difficulty Level, Distractors (Tests)

Psychometric Properties of Student Ratings of Instruction in Online and On-Campus Courses

Peer reviewed

Direct link

McGhee, Debbie E.; Lowell, Nana – New Directions for Teaching and Learning, 2003

This study compares mean ratings, inter-rater reliabilities, and the factor structure of items for online and paper student-rating forms from the University of Washington's Instructional Assessment System. (Contains 3 figures and 2 tables.)

Descriptors: Psychometrics, Factor Structure, Student Evaluation of Teacher Performance, Test Items

Involving Factors of Fairness in Language Testing.

Download full text

Nakamura, Yuji – Journal of Communication Studies, 1997

This study investigated the effects of three aspects of language testing (test task, familiarity with an interviewer, and test method) on both tester and tested. Data were drawn from several previous studies by the researcher. Concerning test task, data were analyzed for the type of topic students wanted most to talk about or preferred not to talk…

Descriptors: Behavior Patterns, Comparative Analysis, English (Second Language), Interrater Reliability

Previous Page | Next Page »

Pages: 1 | 2

Abel, Michael B.	1
Attali, Yigal	1
Benton, Tom	1
Boyer, Michelle	1
Breyer, F. Jay	1
Chang, Lei	1
Chavez, Oscar	1
Davidson, Dawn L.	1
Early, Diane	1
Garrido, Mariquita	1
Grouws, Douglas A.	1
Haboush-Deloye, Amanda	1
Haertel, Edward H.	1
Hestenes, Linda L.	1
Hughes, Sarah	1
Inglis, Matthew	1
Jackson, Carol	1
Jenson, Denise	1
Johnson, Cindy	1
Jones, Ian	1
Kalas, Jennifer	1
Kieftenbeld, Vincent	1
Kortright Wood, Jill	1
LaForrett, Doré	1
Leech, Tony	1
More ▼