Walker, Karen – Education Partnerships, Inc., 2009
There are many kinds of classroom assessment--from informal observation of students to more formal exams and standardized tests. Two categories have been identified--formative assessments and summative assessments. Formative assessments are used by teachers to provide feedback to students and to guide improvement of instruction. For example, if a…
Descriptors: Student Evaluation, Educational Strategies, Teaching Methods, Standardized Tests
Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…
Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)
Penfield, Randall D. – Applied Measurement in Education, 2007
A widely used approach for categorizing the level of differential item functioning (DIF) in dichotomous items is the scheme proposed by Educational Testing Service (ETS) based on a transformation of the Mantel-Haenszel common odds ratio. In this article two classification schemes for DIF in polytomous items (referred to as the P1 and P2 schemes)…
Descriptors: Simulation, Educational Testing, Test Bias, Evaluation Methods
Rothman, Robert – Alliance for Excellent Education, 2010
Assessment has long been at the center of education policy debates, and for good reason. The goal of schooling is to maximize student learning, and assessments provide a picture of what students know and are able to do. Assessments also have a strong influence on what goes on in classrooms. The United States is now poised to make the most dramatic…
Descriptors: Foreign Countries, Comparative Education, Student Evaluation, Elementary Secondary Education
Perez, Jose A., Jr.; Greer, Sharon – Advances in Health Sciences Education, 2009
The Internal Medicine In-Training Examination (ITE) is administered during residency training in the United States as a self-assessment and program assessment tool. Performance on this exam correlates with outcome on the American Board of Internal Medicine Certifying examination. Internal Medicine Program Directors use the United States Medical…
Descriptors: Internal Medicine, Program Effectiveness, Statistical Significance, Correlation
Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver – Studies in Educational Evaluation, 2009
In recent years there has been an increasing international interest in fine-grained diagnostic inferences on multiple skills for formative purposes. A successful provision of such inferences that support meaningful instructional decision-making requires (a) careful diagnostic assessment design coupled with (b) empirical support for the structure…
Descriptors: Educational Testing, Diagnostic Tests, Multidimensional Scaling, Factor Analysis
Baumert, Jurgen; Ludtke, Oliver; Trautwein, Ulrich; Brunner, Martin – Educational Research Review, 2009
Given the relatively high intercorrelations observed between mathematics achievement, reading achievement, and cognitive ability, it has recently been claimed that student assessment studies (e.g., TIMSS, PISA) and intelligence tests measure a single cognitive ability that is practically identical to general intelligence. The present article uses…
Descriptors: Intelligence, Reading Achievement, Mathematics Achievement, Outcomes of Education
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
Heritage, Margaret; Kim, Jinok; Vendlinski, Terry; Herman, Joan – Educational Measurement: Issues and Practice, 2009
Based on the results of a generalizability study of measures of teacher knowledge for teaching mathematics developed at the National Center for Research on Evaluation, Standards, and Student Testing at the University of California, Los Angeles, this article provides evidence that teachers are better at drawing reasonable inferences about student…
Descriptors: Formative Evaluation, Educational Testing, Inferences, Mathematics Instruction
Tennessee Department of Education, 2012
In the summer of 2011, the Tennessee Department of Education contracted with the National Institute for Excellence in Teaching (NIET) to provide a four-day training for all evaluators across the state. NIET trained more than 5,000 evaluators intensively in the state model (districts using alternative instruments delivered their own training).…
Descriptors: Video Technology, Feedback (Response), Evaluators, Interrater Reliability
Eckes, Suzanne; Swando, Julie – Teachers College Record, 2009
Background/Context: There are few empirical studies exploring the alleged conflict between the No Child Left Behind Act (NCLB) and the Individuals with Disabilities Education Act (IDEA). Objective: The purpose of this study was to examine what impact the No Child Left Behind Act has had on students with disabilities. Research Design: Specifically,…
Descriptors: General Education, Federal Legislation, Educational Improvement, Federal Programs
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
Nichols, Paul D.; Meyers, Jason L.; Burling, Kelly S. – Educational Measurement: Issues and Practice, 2009
Assessments labeled as formative have been offered as a means to improve student achievement. But labels can be a powerful way to miscommunicate. For an assessment use to be appropriately labeled "formative," both empirical evidence and reasoned arguments must be offered to support the claim that improvements in student achievement can be linked…
Descriptors: Academic Achievement, Tutoring, Student Evaluation, Evaluation Methods
Tatsuoka, Curtis – Measurement: Interdisciplinary Research and Perspectives, 2009
In this commentary, the author addresses what is referred to as the deterministic input, noisy "and" gate (DINA) model. The author mentions concerns with how this model has been formulated and presented. In particular, the author points out that there is a lack of recognition of the confounding of profiles that generally arises and then discusses…
Descriptors: Test Items, Classification, Psychometrics, Item Response Theory
Maris, Gunter; Bechger, Timo – Measurement: Interdisciplinary Research and Perspectives, 2009
Rupp and Templin (2008) do a good job of describing the ever-expanding landscape of Diagnostic Classification Models (DCM). In many ways, their review article clearly points to some of the questions that need to be answered before DCMs can become part of the psychometric practitioner's toolkit. Apart from the issues mentioned in this article that…
Descriptors: Factor Analysis, Classification, Psychometrics, Item Response Theory

