| Publication Date | Results |
|---|---|
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 7 |
| Since 2017 (last 10 years) | 16 |
| Since 2007 (last 20 years) | 70 |

| Audience | Results |
|---|---|
| Administrators | 2 |
| Policymakers | 1 |
| Practitioners | 1 |
| Teachers | 1 |

| Location | Results |
|---|---|
| Canada | 4 |
| Florida | 2 |
| United Kingdom (England) | 2 |
| Arizona | 1 |
| Australia | 1 |
| California | 1 |
| Finland | 1 |
| Hong Kong | 1 |
| Maryland | 1 |
| North Carolina | 1 |
| Portugal | 1 |

| Laws, Policies, & Programs | Results |
|---|---|
| No Child Left Behind Act 2001 | 2 |
| Race to the Top | 2 |
| Individuals with Disabilities… | 1 |

| Assessments and Surveys | Results |
|---|---|
| Program for International… | 2 |
| Stanford Achievement Tests | 2 |
| Classroom Assessment Scoring… | 1 |
| Florida Comprehensive… | 1 |
| National Assessment of… | 1 |
| Pediatric Evaluation of… | 1 |

Arielle Boguslav; Julie Cohen – Journal of Teacher Education, 2024
Teacher preparation programs are increasingly expected to use data on preservice teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs' instructional skills, including rater standards.…
Descriptors: Preservice Teachers, Measures (Individuals), Evaluation Problems, Teaching Skills
Mark White; Matt Ronfeldt – Educational Assessment, 2024
Standardized observation systems seek to reliably measure a specific conceptualization of teaching quality, managing rater error through mechanisms such as certification, calibration, validation, and double-scoring. These mechanisms both support high quality scoring and generate the empirical evidence used to support the scoring inference (i.e.,…
Descriptors: Interrater Reliability, Quality Control, Teacher Effectiveness, Error Patterns
Abdulrahman Alshammari – ProQuest LLC, 2024
A critical component of modern software development practices, particularly continuous integration (CI), is the halt of development activities in response to test failures which requires further investigation and debugging. As software changes, regression testing becomes vital to verify that new code does not affect existing functionality.…
Descriptors: Computer Software, Programming, Coding, Test Reliability
Michelle Herridge – ProQuest LLC, 2021
Evaluation of student written work during summative assessments is an important and critical task for instructors at all educational levels. Nevertheless, few research studies exist that provide insights into how different instructors approach this task. Chemistry faculty (FIs) and graduate student instructors (GSIs) regularly engage in the…
Descriptors: Science Instruction, Chemistry, College Faculty, Teaching Assistants
Scott F. Marion, Editor; James W. Pellegrino, Editor; Amy I. Berman, Editor – National Academy of Education, 2024
High-quality assessments are crucial to many aspects of the educational process. They can help policymakers monitor long-term educational trends, assist state educational agencies (SEAs) and local educational agencies (LEAs) in allocating resources and professional development opportunities, provide insights to teachers about how well students…
Descriptors: Educational Assessment, Educational Policy, Equal Education, Test Validity
Kinarsky, Alana R.; Christie, Christina A. – American Journal of Evaluation, 2022
Since 2007, two taxonomies have been proposed to identify the components of evaluation practice that may be specified in an evaluation policy. Little is known, however, about how these taxonomies align with evaluation policies developed by philanthropic foundations. Through thematic analysis, this article first compares 12 foundation evaluation…
Descriptors: Taxonomy, Evaluation Methods, Philanthropic Foundations, Educational Policy
Mojgan Rashtchi; SeyyedeFateme Ghazi Mir Saeed – Sage Research Methods Cases, 2023
The reason for conducting the present case study was the problems the researchers encountered during data collection for another research project (Primary Study) entitled "The effects of virtual versus traditional flipped classes on EFL learners' grammar knowledge, self-regulation, and autonomy." Two online questionnaires were…
Descriptors: Data Collection, Questionnaires, Barriers, Research Methodology
Mücahit Öztürk – Open Praxis, 2024
This study examined the problems that pre-service teachers face in the online assessment process and their suggestions for solutions to these problems. The participants were 136 pre-service teachers who have been experiencing online assessment for a long time and who took the Foundations of Open and Distance Learning course. This research is a…
Descriptors: Foreign Countries, Preservice Teacher Education, Preservice Teachers, Distance Education
Joseph, Gail; Soderberg, Janet S.; Stull, Sara; Cummings, Kevin; McCutchen, Deborah; Han, Rachel J. – Early Education and Development, 2020
Research Findings: This study explores the inter-rater reliability of WaKIDS, Washington State's kindergarten entry assessment (KEA). Specifically, we analyze (1) the extent to which teachers' assessments are in agreement with a master code, (2) how often inaccurate assessment decisions lead to misidentification of school readiness, and (3)…
Descriptors: Interrater Reliability, School Readiness, Kindergarten, Evaluation Problems
Szafran, Robert F. – Practical Assessment, Research & Evaluation, 2017
Institutional assessment of student learning objectives has become a fact-of-life in American higher education and the Association of American Colleges and Universities' (AAC&U) VALUE Rubrics have become a widely adopted evaluation and scoring tool for student work. As faculty from a variety of disciplines, some less familiar with the…
Descriptors: Interrater Reliability, Case Studies, Scoring Rubrics, Behavioral Objectives
Walker, Paul – Composition Forum, 2017
This article describes and theorizes a failed writing program assessment study to question the influence of "the rhetoric of agreement," or reliability, on writing assessment practice and its prevalence in validating institutional mandated assessments. Offering the phrase "dwelling in disagreement" as a queer perspective, the…
Descriptors: Rhetoric, Writing Tests, Test Reliability, Program Validation
Gansemer-Topf, Ann M.; Downey, Jillian; Genschel, Ulrike – Research & Practice in Assessment, 2017
Effective assessment practice requires clearly defining and operationalizing terminology. We illustrate the importance of this practice by focusing on academic "undermatching"--when students enroll in colleges that are less academically selective than those for which they are academically prepared. Undermatching has been viewed as a…
Descriptors: Differences, Definitions, Vocabulary, Comparative Analysis
Lau, Ken – Innovations in Education and Teaching International, 2018
Self-directed learning, despite its growing popularity in education, has challenged conventional assessment practice which often foregrounds the presentation of identical conditions to ensure reliability. This article discusses the results of a case study of university academic English teachers' perceptions and reported practices of assessing…
Descriptors: Independent Study, Teacher Attitudes, Case Studies, Educational Practices
Hartley, James – Psychology Teaching Review, 2017
In this article, Hartley notes the difficulties of using questionnaires to assess the efficiency of new instructional methods and highlights nine issues that researchers must consider. Hartley continues the discussion about the use of questionnaires and suggests that psychology teachers can help improve the teaching of psychology by drawing…
Descriptors: Questionnaires, Instructional Innovation, Instructional Effectiveness, Teaching Methods
Parker, Richard I.; Vannest, Kimberly J.; Davis, John L. – Journal of School Psychology, 2013
The use of multi-category scales is increasing for the monitoring of IEP goals, classroom and school rules, and Behavior Improvement Plans (BIPs). Although they require greater inference than traditional data counting, little is known about the inter-rater reliability of these scales. This simulation study examined the performance of nine…
Descriptors: Rating Scales, Scaling, Interrater Reliability, Test Reliability