Publication Date
| In 2026 | 0 |
| Since 2025 | 5 |
| Since 2022 (last 5 years) | 11 |
| Since 2017 (last 10 years) | 19 |
| Since 2007 (last 20 years) | 51 |
Descriptor
| Foreign Countries | 54 |
| Interrater Reliability | 54 |
| Reliability | 54 |
| Validity | 24 |
| Statistical Analysis | 12 |
| Correlation | 11 |
| Scores | 11 |
| Comparative Analysis | 8 |
| Rating Scales | 8 |
| Scoring Rubrics | 8 |
| Elementary School Students | 7 |
Publication Type
| Journal Articles | 51 |
| Reports - Research | 46 |
| Reports - Evaluative | 4 |
| Tests/Questionnaires | 3 |
| Dissertations/Theses -… | 2 |
| Information Analyses | 2 |
| Numerical/Quantitative Data | 1 |
| Reports - Descriptive | 1 |
Education Level
| Higher Education | 17 |
| Postsecondary Education | 16 |
| Elementary Education | 10 |
| Secondary Education | 7 |
| Early Childhood Education | 5 |
| High Schools | 3 |
| Preschool Education | 3 |
| Grade 1 | 2 |
| Grade 2 | 2 |
| Grade 4 | 2 |
| Primary Education | 2 |
Audience
| Researchers | 1 |
Location
| Canada | 6 |
| Turkey | 6 |
| Australia | 5 |
| Netherlands | 4 |
| Taiwan | 4 |
| United States | 4 |
| China | 3 |
| Italy | 3 |
| Norway | 3 |
| Spain | 3 |
| United Kingdom (England) | 3 |
Marcus Messer; Neil C. C. Brown; Michael Kölling; Miaojing Shi – ACM Transactions on Computing Education, 2025
Providing consistent summative assessment to students is important, as the grades they are awarded affect their progression through university and future career prospects. While small cohorts are typically assessed by a single assessor, such as the module/class leader, larger cohorts are often assessed by multiple assessors, typically teaching…
Descriptors: Foreign Countries, Grading, Interrater Reliability, Teaching Assistants
Hulteen, Ryan M.; True, Larissa; Kroc, Edward – Measurement in Physical Education and Exercise Science, 2023
The typical process for assessing inter-rater reliability is facilitated by training raters within a research team. Lacking is an understanding if inter-rater reliability scores "between" research teams demonstrate adequate reliability. This study examined inter-rater reliability between 16 researchers who assessed fundamental motor…
Descriptors: Psychomotor Skills, Scores, Reliability, Interrater Reliability
Sarah Stopforth; Roxanne Connelly; Vernon Gayle – Cambridge Journal of Education, 2025
Data on educational qualifications is essential in many research domains. The UK Millennium Cohort Study collected self-reported General Certificate of Secondary Education (GCSE) data in sweep 7 (cohort members aged 17). GCSE data from the National Pupil Database (NPD) has been linked to the MCS. This study investigates the consistency of these…
Descriptors: Foreign Countries, Adolescents, Case Studies, Secondary Education
Wahyu Nanda Eka Saputra; Trikinasih Handayani; Prima Suci Rohmadheny; Rohmatus Naini; Dody Hartanto; Hardi Santosa; Dewi Afra Khairunnisa; Risma Risansyah; Hanan Riati; Faturrahman – Journal of Education and Learning (EduLearn), 2025
The students are urged to do something without expecting anything in return and only in the name of God. Every Islamic student becomes something ideal if they can internalize and implement sincerity. Many people are willing to do something because of an ulterior motive. The importance of sincerity in humans is the background for developing a…
Descriptors: Islam, Interrater Reliability, Prosocial Behavior, Muslims
Shasha Chen; Shaohui Chi; Zuhao Wang – Journal of Baltic Science Education, 2025
Interdisciplinary thinking is critical for equipping students to apply scientific knowledge and tackle societal challenges across various disciplines, which has been recognized as a key objective of twenty-first century science education. However, research on effective interdisciplinary assessment in secondary school science education is still…
Descriptors: Thinking Skills, Interdisciplinary Approach, Science Instruction, Grade 7
Øydis Hide; Dagrun Slettebø Daltveit; Åse Sivertsen; Anne Katherine Hvistendahl; Randi Lovise Kjerstad; Marit Berntsen Kvinnsland; Nina Helen Pedersen; Christina Sørensen – International Journal of Language & Communication Disorders, 2025
Background: Cleft lip and palate (CLP) treatment in Norway is centralized and multidisciplinary, with long-term follow-up from birth to adulthood. The Norwegian Registry of Cleft Lip and Palate was established to ensure high-quality care and enable systematic data collection. Speech data are a key component, assessed by speech-language therapists…
Descriptors: Foreign Countries, Validity, Reliability, Data Collection
Sas, Marlies; Snaphaan, Thom; Pauwels, Lieven J. R.; Ponnet, Koen; Hardyns, Wim – Field Methods, 2023
This study focuses on the use of systematic social observations (SSO) to measure crime prevention through environmental design (CPTED) and disorder. To improve knowledge about measurement issues in small area research, SSO is conducted by means of three different methods: in-situ, photographs, and Google Street View (GSV) imagery. By evaluating…
Descriptors: Crime Prevention, Measurement Techniques, Photography, Observation
Brogan L. Barr; Virginia V. W. McIntosh; Eileen F. Britt; Jennifer Jordan; Janet D. Carter – Measurement: Interdisciplinary Research and Perspectives, 2024
Even when raters demonstrate agreement in the use of a measure, limited score variability or violation of often-ignored statistical assumptions can result in lower reliability estimates than intuitively expected. This article uses data drawn from two randomized controlled trials of schema therapy and cognitive behavioral therapy for the treatment…
Descriptors: Evaluators, Interrater Reliability, Reliability, Measurement Techniques
Dankiw, Kylie A.; Baldock, Katherine L.; Kumar, Saravana; Tsiros, Margarita D. – Australasian Journal of Early Childhood, 2021
Identifying and describing children's play behaviours is an important component of evaluating child development. The Behaviour Mapping Schedule is a direct observational tool which aims to describe and quantify children's play behaviours but is yet to undergo reliability testing. This study aimed to determine the intra- and inter-rater reliability…
Descriptors: Interrater Reliability, Classification, Child Behavior, Play
Nuñez-Polo, Mercedes H. – Journal of Mental Health Research in Intellectual Disabilities, 2022
Introduction: The aim of this study is to validate a Spanish version of the Impact of Event Scale on People with ID (IES-ID). Methods: IES-ID was administered to adults with ID (n = 120), analyzing internal consistency, inter-rater and test-retest reliability, criterion validity, construct validity and feasibility. Results: Good internal…
Descriptors: Spanish, Translation, Construct Validity, Factor Analysis
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Alkhanani, Badriah – International Journal of Language Education, 2022
The purpose of this study was to find the effect of English Language Teachers' Methodology (ELTM) on the Career Growth (CG) of the Saudi students. In order to provide a solid basis for this research study, a cross-sectional-descriptive research design was employed. For scale development and tool standardization, inter-class correlation…
Descriptors: Career Development, English (Second Language), Second Language Learning, Second Language Instruction
Hardin, Belinda J.; Bergen, Doris; Busio, Dionne Sills; Boone, William – Early Childhood Education Journal, 2017
The Third Edition of the ACEI Global Guidelines Assessment (GGA) was evaluated for its effectiveness as an international assessment tool for use by early childhood educators to develop, assess, and improve program quality worldwide. This expanded study was conducted in nine countries [People's Republic of China (2 sites), Guatemala, India, Italy,…
Descriptors: Foreign Countries, International Assessment, Early Childhood Education, Psychometrics
Ramon-Casas, Marta; Nuño, Neus; Pons, Ferran; Cunillera, Toni – Assessment & Evaluation in Higher Education, 2019
This article presents an empirical evaluation of the validity and reliability of a peer-assessment activity to improve academic writing competences. Specifically, we explored a large group of psychology undergraduate students with different initial writing skills. Participants (n = 365) produced two different essays, which were evaluated by their…
Descriptors: Peer Evaluation, Validity, Reliability, Writing Skills
Taylor, Lauren J.; Eapen, Valsamma; Maybery, Murray; Midford, Sue; Paynter, Jessica; Quarmby, Lyndsay; Smith, Timothy; Williams, Katrina; Whitehouse, Andrew J. – Journal of Autism and Developmental Disorders, 2017
Previous research shows inconsistency in clinician-assigned diagnoses of Autism Spectrum Disorder (ASD). We conducted an exploratory study that examined the concordance of diagnoses between a multidisciplinary assessment team and a range of independent clinicians throughout Australia. Nine video-taped Autism Diagnostic Observation Schedule (ADOS)…
Descriptors: Autism, Pervasive Developmental Disorders, Clinical Diagnosis, Foreign Countries

