Martin, David; Jamieson-Proctor, Romina – International Journal of Research & Method in Education, 2020
In Australia, one of the key findings of the Teacher Education Ministerial Advisory Group was that not all graduating pre-service teachers possess adequate pedagogical content knowledge (PCK) to teach effectively. The concern is that higher education providers working with pre-service teachers are using pedagogical practices and assessments which…
Descriptors: Test Construction, Preservice Teachers, Pedagogical Content Knowledge, Foreign Countries
Zhu, Weimo; Rink, Judy; Placek, Judith H.; Graber, Kim C.; Fox, Connie; Fisette, Jennifer L.; Dyson, Ben; Park, Youngsik; Avery, Marybell; Franck, Marian; Raynes, De – Measurement in Physical Education and Exercise Science, 2011
New testing theories, concepts, and psychometric methods (e.g., item response theory, test equating, and item bank) developed during the past several decades have many advantages over previous theories and methods. In spite of their introduction to the field, they have not been fully accepted by physical educators. Further, the manner in which…
Descriptors: Physical Education, Quality Control, Psychometrics, Item Response Theory
Sawchuk, Stephen – Education Digest: Essential Readings Condensed for Quick Review, 2010
Most experts in the testing community have presumed that the $350 million promised by the U.S. Department of Education to support common assessments would promote those that made greater use of open-ended items capable of measuring higher-order critical-thinking skills. But as measurement experts consider the multitude of possibilities for an…
Descriptors: Educational Quality, Test Items, Comparative Analysis, Multiple Choice Tests
Sykes, Robert C.; Ito, Kyoko; Wang, Zhen – Educational Measurement: Issues and Practice, 2008
Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a…
Descriptors: Test Items, Mathematics Tests, Reading Tests, Scoring
van der Linden, Wim J.; Vos, Hans J.; Chang, Lei – 2000
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
Descriptors: Interrater Reliability, Judges, Probability, Standard Setting
Michaelides, Michalis P.; Haertel, Edward H. – Center for Research on Evaluation Standards and Student Testing CRESST, 2004
There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…
Descriptors: Test Items, Testing, Error Patterns, Interrater Reliability
Mariano, Louis T.; Junker, Brian W. – Journal of Educational and Behavioral Statistics, 2007
When constructed response test items are scored by more than one rater, the repeated ratings allow for the consideration of individual rater bias and variability in estimating student proficiency. Several hierarchical models based on item response theory have been introduced to model such effects. In this article, the authors demonstrate how these…
Descriptors: Test Items, Item Response Theory, Rating Scales, Scoring
OECD Publishing, 2009
The Organisation for Economic Co-operation and Development's (OECD's) Programme for International Student Assessment (PISA) surveys, which take place every three years, have been designed to collect information about 15-year-old students in participating countries. PISA examines how well students are prepared to meet the challenges of the future,…
Descriptors: Policy Formation, Scaling, Academic Achievement, Interrater Reliability
Stansfield, Charles W.; Kenyon, Dorry Mann – 1988
The development and validation of a Portuguese oral language test are described. The test consisted of five item types: personal conversation, giving directions, description of picture sequences, topical discourse, and oral task completion based on printed instructions. Three preliminary forms of the test were administered to a group of language…
Descriptors: Interrater Reliability, Interviews, Language Tests, Oral Language
New Mexico Public Education Department, 2007
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of the 2007 NMSBA and its technical characteristics. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
Strong, Gregory – Thought Currents in English Literature, 1995
This paper traces developments in educational psychology and measurement that led to the Test of English as a Foreign Language (TOEFL) and the Test of English for International Communication (TOEIC), and the application of educational measurement terms such as validity and reliability to testing. Use of a table of specifications for planning…
Descriptors: Cloze Procedure, Difficulty Level, English (Second Language), Foreign Countries
Griph, Gerald W. – New Mexico Public Education Department, 2006
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of the 2006 NMSBA and its technical characteristics. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring