Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Yesildag Hasancebi, Funda; Yuksel, Busra Tuncay; Mesci, Gunkut – International Journal of Assessment Tools in Education, 2022
The purpose of this study was to develop a reliable and valid rating scale for the use of the assessment and evaluation of lesson plans and teaching practices that are based on argumentation-based inquiry (ABI). The study covered two academic years (four academic semesters). Qualitative and quantitative methods were utilized throughout the…
Descriptors: Foreign Countries, Rating Scales, Test Construction, Test Validity
Marzieh Pashmdarfard; Afsoon Hassani Mehraban; Narges Shafaroodi; Kamran Soltani Arabshahi; Soroor Parvizy; Akram Azad; Samaneh Karamali Esmaeili – Journal of Occupational Therapy Education, 2022
Fieldwork education is an integral part of the educational process in occupational therapy and assessing student competency at the end of fieldwork is important. The aim of this study was to design and conduct an Objective Structured Clinical Examination (OSCE) based on the Occupational Therapy Practice Framework (OTPF) for occupational therapy…
Descriptors: Occupational Therapy, Allied Health Occupations Education, Test Construction, Test Validity
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication
Grisham, Jennifer; Waddell, Misti; Crawford, Rebecca; Toland, Michael – Journal of Early Intervention, 2021
The purpose of this article is to provide evidence of the technical adequacy of the Assessment, Evaluation, and Programming System--Third Edition (AEPS-3). The AEPS has long been identified as one of the most psychometrically sound early childhood curriculum-based assessments. In this article, results of three studies of technical adequacy are…
Descriptors: Infants, Young Children, Curriculum Based Assessment, Psychometrics
McDonald, Margarethe; Kwon, Taeahn; Kim, Hyunji; Lee, Youngki; Ko, Eon-Suk – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The algorithm of the Language ENvironment Analysis (LENA) system for calculating language environment measures was trained on American English; thus, its validity with other languages cannot be assumed. This article evaluates the accuracy of the LENA system applied to Korean. Method: We sampled sixty 5-min recording clips involving 38 key…
Descriptors: Computational Linguistics, Korean, Audio Equipment, Accuracy
Goldhaber, Dan; Grout, Cyrus; Wolff, Malcolm; Martinková, Patrícia – Grantee Submission, 2021
There is growing interest in using measures of teacher applicant quality to improve hiring decisions, but the statistical properties of such measures are not well understood. We use unique data on structured ratings solicited from the references of teacher applicants to explore the dimensionality of measures of teacher applicant quality and the…
Descriptors: Teacher Selection, Job Applicants, Teacher Qualifications, Letters (Correspondence)
Baraldi Cunha, Andrea; Babik, Iryna; Koziol, Natalie A.; Hsu, Lin-Ya; Nord, Jayden; Harbourne, Regina T.; Westcott-McCoy, Sarah; Dusing, Stacey C.; Bovaird, James A.; Lobo, Michele A. – Grantee Submission, 2021
Purpose: To evaluate the validity, reliability, and sensitivity of the novel Means-End Problem-Solving Assessment Tool (MEPSAT). Methods: Children with typical development and those with motor delay were assessed throughout the first 2 years of life using the MEPSAT. MEPSAT scores were validated against the cognitive and motor subscales of the…
Descriptors: Problem Solving, Early Intervention, Evaluation Methods, Motor Development
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020
Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…
Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation
Beck, Klaus – Frontline Learning Research, 2020
Many test developers try to ensure the content validity of their tests by having external experts review the items, e.g., in terms of relevance, difficulty, or clarity. Although this approach is widely accepted, a closer look reveals several pitfalls that need to be avoided if experts' advice is to be truly helpful. The purpose of this paper is to…
Descriptors: Content Validity, Psychological Testing, Educational Testing, Student Evaluation
Kinnear, George; Bennett, Max; Binnie, Rachel; Bolt, Róisín; Zheng, Yinglan – Teaching Mathematics and Its Applications, 2020
The MATH taxonomy classifies questions according to the mathematical skills required to answer them. It was created to aid the development of more balanced assessments in undergraduate mathematics and has since been used to compare different assessment regimes across school and university. To date, there has been no systematic investigation of the…
Descriptors: Taxonomy, Mathematics Instruction, Teaching Methods, Reliability
Johnson, Evelyn S.; Zheng, Yuzhu; Moylan, Laura A.; Crawford, Angela – Grantee Submission, 2020
In this study, we investigated factors that influence raters' application of the scoring criteria of an Explicit Instruction (EI) observation protocol using many-faceted Rasch measurement (MFRM) and think aloud analysis. Specifically, we investigated the extent to which raters are able to consistently represent the scoring criteria in the EI…
Descriptors: Interrater Reliability, Teacher Evaluation, Special Education Teachers, Item Response Theory
Olivarez, Joseph D.; Bales, Stephen; Sare, Laura; vanDuinkerken, Wyoma – College & Research Libraries, 2018
Jeffrey Beall's blog listing of potential predatory journals and publishers, as well as his "Criteria for Determining Predatory Open-Access (OA) Publishers" are often looked at as tools to help researchers avoid publishing in predatory journals. While these "Criteria" have brought a greater awareness of OA predatory journals,…
Descriptors: Information Science, Library Science, Periodicals, Evaluation Criteria
Looney, Marilyn A. – Measurement in Physical Education and Exercise Science, 2018
The purpose of this article was twofold: (1) to provide an overview of the commonly reported and under-reported absolute agreement indices in the kinesiology literature for continuous data; and (2) to present examples of these indices for hypothetical data along with recommendations for future use. It is recommended that three types of information be…
Descriptors: Interrater Reliability, Evaluation Methods, Kinetics, Indexes
Wronowski, Meredith; VanGronigen, Bryan A.; Henry, Wesley; Olive, James L. – School Community Journal, 2022
School improvement plans (SIPs) have become a central feature of schooling. Educational leaders experience tension between balancing compliance with accountability demands and continuous improvement, and neither of these lenses is centered in the social justice necessary for closing opportunity gaps. We propose a new rubric for assessing the…
Descriptors: Educational Improvement, Educational Planning, Accountability, Evaluation Methods

