Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Hidalgo, María Ángeles; Lázaro-Ibarrola, Amparo – Studies in Second Language Learning and Teaching, 2020
Research into the potential of collaborative writing is relatively new. Similarly, task repetition (TR), which has been claimed to be a valuable tool for language learning, has been rarely explored in the context of writing. Therefore, little is known about the potential of combining TR and collaborative writing, and even less if we focus on young…
Descriptors: Task Analysis, Second Language Learning, Second Language Instruction, Accuracy
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Bartanen, Brendan; Kwok, Andrew – Annenberg Institute for School Reform at Brown University, 2020
Using rich longitudinal data from one of the largest teacher education programs in Texas, we examine the measurement of pre-service teacher (PST) quality and its relationship with entry into the K-12 public school teacher workforce. Drawing on rubric-based observations of PSTs during clinical teaching, we find that little of the variation in…
Descriptors: Longitudinal Studies, Preservice Teachers, Teacher Education Programs, Kindergarten
Maxwell, Bruce; Boon, Helen; Tanchuk, Nicolas; Rauwerda, Bryan – Journal of Moral Education, 2021
This article documents the adaptation, piloting and validation of a measure of teachers' ethical sensitivity. To create the test, we modified a measure from dentistry drawing on literature in teacher professional ethics and drew on the expertise of professional ethics scholars and practitioners. Based on the results of Rasch analysis combined with…
Descriptors: Ethics, Moral Values, Scores, Teacher Education Programs
Wang, Yuqi; Ren, Wei – Language Learning Journal, 2022
L2 pragmatics have explored the effects of different factors on different aspects of learners' pragmatic performance, but often not simultaneously. In addition, syntactic complexity is rarely examined in L2 pragmatics. This cross-sectional study aimed to conduct a multidimensional analysis to explore the effects of proficiency and study-abroad…
Descriptors: Pragmatics, Second Language Learning, Second Language Instruction, English (Second Language)
Li, Zijia; Gooden, Caroline; Toland, Michael D. – Journal of Early Intervention, 2019
This study provides preliminary evidence for reliability and validity of the Hawaii Early Learning Profile Strands 0-3 (HELP Strands 0-3), an assessment instrument for young children. First, the degree of interobserver agreement for a sample of representative HELP items was examined; results indicated that HELP scoring was dependable and…
Descriptors: Measures (Individuals), Psychometrics, Early Childhood Education, Test Reliability
Massar, Michelle M.; McIntosh, Kent; Mercer, Sterett H. – Remedial and Special Education, 2019
Assessing fidelity of implementation of school-based interventions is a critical factor in successful implementation and sustainability. The Tiered Fidelity Inventory (TFI) was developed as a comprehensive measure of all three tiers of School-Wide Positive Behavioral Interventions and Supports (SWPBIS) and is intended to measure the extent to…
Descriptors: Fidelity, Intervention, Program Implementation, Positive Behavior Supports
Banerjee, Rashida; Movahedazarhouligh, Sara; Millen, Kaitlyn; Luckner, John L. – Topics in Early Childhood Special Education, 2018
Valid and evidence-informed practices are critical to help young children with disabilities and their families with highly effective interventions and instruction to reach their potentials. Replication research is critical for appraising research and identifying evidence-based practices. The purpose of this study was to replicate the methods used…
Descriptors: Evidence, Early Childhood Education, Special Education, Replication (Evaluation)
Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018
This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…
Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students
van Rijn, Peter; Graf, Edith Aurora; Arieli-Attali, Meirav; Song, Yi – ETS Research Report Series, 2018
In this study, we explored the extent to which teachers agree on the ordering and separation of levels of two different learning progressions (LPs) in English language arts (ELA) and mathematics. In a panel meeting akin to a standard-setting procedure, we asked teachers to link the items and responses of summative educational assessments to LP…
Descriptors: Teacher Attitudes, Student Evaluation, Summative Evaluation, Language Arts
Musselwhite, Dorothy J.; Wesolowski, Brian C. – Journal of Research in Music Education, 2018
The purpose of this study was to evaluate the psychometric quality (i.e., validity and reliability) of a rating scale to assess pre-service teachers' lesson plan development in the context of secondary-level music performance classrooms. The research questions that guided this study include: (1) What items demonstrate acceptable model fit for the…
Descriptors: Psychometrics, Likert Scales, Preservice Teachers, Lesson Plans
Åhsberg, Elizabeth; Fahlström, Gunilla; Rönnbäck, Eva; Granberg, Ann-Kristin; Almborg, Ann-Helene – Research on Social Work Practice, 2017
Objective: To construct a needs assessment instrument for older people using a standardized terminology (International classification of functioning, disability, and health [ICF]) and assess its psychometrical properties. Method: An instrument was developed comprising questions to older people regarding their perceived care needs. The instrument's…
Descriptors: Caseworkers, Social Work, Older Adults, Needs Assessment
Edmunds, Sarah R.; Rozga, Agata; Li, Yin; Karp, Elizabeth A.; Ibanez, Lisa V.; Rehg, James M.; Stone, Wendy L. – Journal of Autism and Developmental Disorders, 2017
Children with autism spectrum disorder (ASD) show reduced gaze to social partners. Eye contact during live interactions is often measured using stationary cameras that capture various views of the child, but determining a child's precise gaze target within another's face is nearly impossible. This study compared eye gaze coding derived from…
Descriptors: Young Children, Autism, Pervasive Developmental Disorders, Eye Movements
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
McGough, David J. – AERA Online Paper Repository, 2017
This paper describes the implementation of an inter-rater reliability measure for assessing portfolio scores in a teacher education program. The reliability coefficient for the portfolio scores from completers of a newly revised program was compared with the reliability coefficient of the scores from a second set of reviewers who discussed the…
Descriptors: Interrater Reliability, Teacher Education Programs, Program Evaluation, Portfolio Assessment

