Martin, David; Jamieson-Proctor, Romina – International Journal of Research & Method in Education, 2020
In Australia, one of the key findings of the Teacher Education Ministerial Advisory Group was that not all graduating pre-service teachers possess adequate pedagogical content knowledge (PCK) to teach effectively. The concern is that higher education providers working with pre-service teachers are using pedagogical practices and assessments which…
Descriptors: Test Construction, Preservice Teachers, Pedagogical Content Knowledge, Foreign Countries
Zhu, Weimo; Rink, Judy; Placek, Judith H.; Graber, Kim C.; Fox, Connie; Fisette, Jennifer L.; Dyson, Ben; Park, Youngsik; Avery, Marybell; Franck, Marian; Raynes, De – Measurement in Physical Education and Exercise Science, 2011
New testing theories, concepts, and psychometric methods (e.g., item response theory, test equating, and item bank) developed during the past several decades have many advantages over previous theories and methods. In spite of their introduction to the field, they have not been fully accepted by physical educators. Further, the manner in which…
Descriptors: Physical Education, Quality Control, Psychometrics, Item Response Theory
Sawchuk, Stephen – Education Digest: Essential Readings Condensed for Quick Review, 2010
Most experts in the testing community have presumed that the $350 million promised by the U.S. Department of Education to support common assessments would promote those that made greater use of open-ended items capable of measuring higher-order critical-thinking skills. But as measurement experts consider the multitude of possibilities for an…
Descriptors: Educational Quality, Test Items, Comparative Analysis, Multiple Choice Tests
Sykes, Robert C.; Ito, Kyoko; Wang, Zhen – Educational Measurement: Issues and Practice, 2008
Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a…
Descriptors: Test Items, Mathematics Tests, Reading Tests, Scoring
van der Linden, Wim J.; Vos, Hans J.; Chang, Lei – 2000
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
Descriptors: Interrater Reliability, Judges, Probability, Standard Setting
Michaelides, Michalis P.; Haertel, Edward H. – Center for Research on Evaluation Standards and Student Testing CRESST, 2004
There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…
Descriptors: Test Items, Testing, Error Patterns, Interrater Reliability
Mariano, Louis T.; Junker, Brian W. – Journal of Educational and Behavioral Statistics, 2007
When constructed response test items are scored by more than one rater, the repeated ratings allow for the consideration of individual rater bias and variability in estimating student proficiency. Several hierarchical models based on item response theory have been introduced to model such effects. In this article, the authors demonstrate how these…
Descriptors: Test Items, Item Response Theory, Rating Scales, Scoring
OECD Publishing, 2009
The Organisation for Economic Co-operation and Development's (OECD's) Programme for International Student Assessment (PISA) surveys, which take place every three years, have been designed to collect information about 15-year-old students in participating countries. PISA examines how well students are prepared to meet the challenges of the future,…
Descriptors: Policy Formation, Scaling, Academic Achievement, Interrater Reliability
Stansfield, Charles W.; Kenyon, Dorry Mann – 1988
The development and validation of a Portuguese oral language test are described. The test consisted of five item types: personal conversation, giving directions, description of picture sequences, topical discourse, and oral task completion based on printed instructions. Three preliminary forms of the test were administered to a group of language…
Descriptors: Interrater Reliability, Interviews, Language Tests, Oral Language
New Mexico Public Education Department, 2007
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of the 2007 NMSBA and its technical characteristics. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
Strong, Gregory – Thought Currents in English Literature, 1995
This paper traces developments in educational psychology and measurement that led to the Test of English as a Foreign Language (TOEFL) and the Test of English for International Communication (TOEIC), and the application of educational measurement terms such as validity and reliability to testing. Use of a table of specifications for planning…
Descriptors: Cloze Procedure, Difficulty Level, English (Second Language), Foreign Countries
Griph, Gerald W. – New Mexico Public Education Department, 2006
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of the 2006 NMSBA and its technical characteristics. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring