ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	0
Since 2007 (last 20 years)	14

Descriptor

Educational Testing	15
Evaluation Problems	15
Measurement	15
Educational Assessment	12
Evaluation Methods	11
Measurement Techniques	9
Psychometrics	9
Student Evaluation	7
Correlation	6
Models	6
Testing Problems	6
Classification	5
Diagnostic Tests	5
Educational Policy	5
Evaluation Research	5
Evidence	5
Item Response Theory	5
Teacher Evaluation	5
Criterion Referenced Tests	4
State of the Art Reviews	4
Academic Achievement	3
Credentials	3
Error of Measurement	3
Longitudinal Studies	3
Scoring	3
More ▼

Source

Measurement:…	6
Journal of Educational…	3
National Center for Analysis…	2
American Educational Research…	1
Review of Research in…	1
Scholar-Practitioner Quarterly	1
TESL Canada Journal	1

Publication Type

Journal Articles	13
Opinion Papers	7
Reports - Evaluative	6
Reports - Research	2
Reports - Descriptive	1
Speeches/Meeting Papers	1

Education Level

Elementary Secondary Education	9
Secondary Education	2
Elementary Education	1
Grade 3	1
Grade 4	1
Grade 5	1

Audience

Location

New York	2
California	1
Canada	1
Illinois	1
New Jersey	1
North Carolina	1
Tennessee	1
Texas	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

Advanced Placement…	1
Stanford Achievement Tests	1

What Works Clearinghouse Rating

Showing all 15 results Save | Export

Different Tests, Different Answers: The Stability of Teacher Value-Added Estimates across Outcome Measures

Peer reviewed

Direct link

Papay, John P. – American Educational Research Journal, 2011

Recently, educational researchers and practitioners have turned to value-added models to evaluate teacher performance. Although value-added estimates depend on the assessment used to measure student achievement, the importance of outcome selection has received scant attention in the literature. Using data from a large, urban school district, I…

Descriptors: Urban Schools, Teacher Effectiveness, Reading Achievement, Achievement Tests

Monitoring Rater Performance over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use

Peer reviewed

Direct link

Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009

In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…

Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)

Judges' Use of Examinee Performance Data in an Angoff Standard-Setting Exercise for a Medical Licensing Examination: An Experimental Study

Peer reviewed

Direct link

Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009

Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…

Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

Peer reviewed

Direct link

Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009

In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…

Descriptors: Test Length, Simulation, Correlation, Research Methodology

Diagnostic Models as Partially Ordered Sets

Peer reviewed

Direct link

Tatsuoka, Curtis – Measurement: Interdisciplinary Research and Perspectives, 2009

In this commentary, the author addresses what is referred to as the deterministic input, noisy "and" gate (DINA) model. The author mentions concerns with how this model has been formulated and presented. In particular, the author points out that there is a lack of recognition of the confounding of profiles that generally arises and then discusses…

Descriptors: Test Items, Classification, Psychometrics, Item Response Theory

Equivalent Diagnostic Classification Models

Peer reviewed

Direct link

Maris, Gunter; Bechger, Timo – Measurement: Interdisciplinary Research and Perspectives, 2009

Rupp and Templin (2008) do a good job at describing the ever expanding landscape of Diagnostic Classification Models (DCM). In many ways, their review article clearly points to some of the questions that need to be answered before DCMs can become part of the psychometric practitioners toolkit. Apart from the issues mentioned in this article that…

Descriptors: Factor Analysis, Classification, Psychometrics, Item Response Theory

How Much Can We Reliably Know about What Examinees Know?

Peer reviewed

Direct link

Sinharay, Sandip; Haberman, Shelby J. – Measurement: Interdisciplinary Research and Perspectives, 2009

In this commentary, the authors discuss some of the issues regarding the use of diagnostic classification models that practitioners should keep in mind. In the authors experience, these issues are not as well known as they should be. The authors then provide recommendations on diagnostic scoring.

Descriptors: Scoring, Reliability, Validity, Classification

Using Value-Added Measures of Teacher Quality. Brief 9

Download full text

Hanushek, Eric A.; Rivkin, Steven G. – National Center for Analysis of Longitudinal Data in Education Research, 2010

Extensive education research on the contribution of teachers to student achievement produces two generally accepted results. First, teacher quality varies substantially as measured by the value added to student achievement or future academic attainment or earnings. Second, variables often used to determine entry into the profession and…

Descriptors: Credentials, Teacher Effectiveness, Models, Teacher Qualifications

Diagnostic Classification Models and Multidimensional Adaptive Testing: A Commentary on Rupp and Templin

Peer reviewed

Direct link

Frey, Andreas; Carstensen, Claus H. – Measurement: Interdisciplinary Research and Perspectives, 2009

On a general level, the objective of diagnostic classifications models (DCMs) lies in a classification of individuals regarding multiple latent skills. In this article, the authors show that this objective can be achieved by multidimensional adaptive testing (MAT) as well. The authors discuss whether or not the restricted applicability of DCMs can…

Descriptors: Adaptive Testing, Test Items, Classification, Psychometrics

Do Test Formats in Reading Comprehension Affect Second-Language Students' Test Performance Differently?

Peer reviewed
PDF on ERIC

Download full text

Direct link

Zheng, Ying; Cheng, Liying; Klinger, Don A. – TESL Canada Journal, 2007

Large scale testing in English affects second-language students not only greatly but also differently than first-language learners. The research literature reports that confounding factors in such large-scale testing such as varying test formats may differentially affect the performance of students from diverse backgrounds. An investigation of…

Descriptors: Reading Comprehension, Reading Tests, Test Format, Educational Testing

Measuring Effect Sizes: The Effect of Measurement Error. Working Paper 19

Download full text

Boyd, Donald; Grossman, Pamela; Lankford, Hamilton; Loeb, Susanna; Wyckoff, James – National Center for Analysis of Longitudinal Data in Education Research, 2008

Value-added models in education research allow researchers to explore how a wide variety of policies and measured school inputs affect the academic performance of students. Researchers typically quantify the impacts of such interventions in terms of "effect sizes", i.e., the estimated effect of a one standard deviation change in the…

Descriptors: Credentials, Teacher Effectiveness, Models, Teacher Qualifications

Validating the MKT Measures: Some Responses to the Commentaries

Peer reviewed

Direct link

Hill, Heather C. – Measurement: Interdisciplinary Research and Perspectives, 2007

The author offers some thoughts on commentator's reactions to the substance of the measures, particularly those about measuring teacher learning and change, based on the major uses of the measures, and because this is a significant challenge facing test development as an enterprise. If teacher learning results in more integrated knowledge or…

Descriptors: Educational Testing, Tests, Measurement, Faculty Development

Naming and Classifying: Theory, Evidence, and Equity in Education

Peer reviewed

Direct link

Lucas, Samuel R.; Beresford, Lauren – Review of Research in Education, 2010

Education names and classifies individuals. This result seems unavoidable. For example, some students will graduate, and some will not. Those who graduate will be "graduates"; those who do not graduate will be labeled otherwise. The only way to avoid such labeling is to fail to make distinctions of any kind. Yet education is rife with…

Descriptors: Social Science Research, Equal Education, Outcomes of Education, Inferences

Generalizability and Specificity of Interpretive Arguments: Observations Inspired by the Commentaries

Peer reviewed

Direct link

Schilling, Stephen – Measurement: Interdisciplinary Research and Perspectives, 2007

In this article, the author echoes his co-author's and colleague's pleasure (Hill, this issue) at the thoughtfulness and far-ranging nature of the comments to their initial attempts at test validation for the mathematical knowledge for teaching (MKT) measures using the validity argument approach. Because of the large number of commentaries they…

Descriptors: Generalizability Theory, Persuasive Discourse, Educational Testing, Measurement

Measure, Mismeasure, or Not Measurement at All?: Psychometrics as Political Theory

Peer reviewed
PDF on ERIC

Download full text

Direct link

Garrison, Mark J. – Scholar-Practitioner Quarterly, 2004

The author of this article challenges a common assumption made by both critics and defenders of standardized-testing technology (or psychometry), namely that standardized tests "measure" something (culture, ability, etc.). It argues that psychometric practice cannot be classified as a form of measurement and instead is best understood as…

Descriptors: Educational Assessment, Social Values, Psychometrics, Standardized Tests

Baldwin, Su G.	1
Bechger, Timo	1
Beresford, Lauren	1
Boyd, Donald	1
Carstensen, Claus H.	1
Cheng, Liying	1
Clauser, Brian E.	1
Cui, Ying	1
Dillon, Gerard F.	1
Frey, Andreas	1
Garrison, Mark J.	1
Grossman, Pamela	1
Haberman, Shelby J.	1
Hanushek, Eric A.	1
Hill, Heather C.	1
Klinger, Don A.	1
Lankford, Hamilton	1
Leighton, Jacqueline P.	1
Loeb, Susanna	1
Lucas, Samuel R.	1
Margolis, Melissa J.	1
Maris, Gunter	1
Mee, Janet	1
Myford, Carol M.	1
Papay, John P.	1
More ▼