Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan – Journal of Educational Measurement, 2014
Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Descriptors: Item Response Theory, Interrater Reliability, Models, Correlation
Li, Shuai; Taguchi, Naoko; Xiao, Feng – Language Assessment Quarterly, 2019
Adopting Linacre's guidelines for evaluating rating scale effectiveness, we examined whether and how a six-point rating scale functioned differently across raters, speech acts, and second language (L2) proficiency levels. We developed a 12-item Computerized Oral Discourse Completion Task (CODCT) for assessing the production of requests, refusals,…
Descriptors: Speech Acts, Rating Scales, Guidelines, Evaluators
van der Scheer, Emmelien A.; Bijlsma, Hannah J. E.; Glas, Cees A. W. – School Effectiveness and School Improvement, 2019
A Bayesian IRT-model approach was used to investigate the validity and reliability of student perceptions of teaching quality. Furthermore, the student perceptions were compared with ratings of teaching quality by external observers. Grade 4 students (n = 675) filled out a questionnaire that was used to measure their opinions about the lessons of…
Descriptors: Student Attitudes, Validity, Interrater Reliability, Correlation
Harrison, George M. – Journal of Educational Measurement, 2015
The credibility of standard-setting cut scores depends in part on two sources of consistency evidence: intrajudge and interjudge consistency. Although intrajudge consistency feedback has often been provided to Angoff judges in practice, more evidence is needed to determine whether it achieves its intended effect. In this randomized experiment with…
Descriptors: Interrater Reliability, Standard Setting (Scoring), Cutting Scores, Feedback (Response)
Bruhn, Allison; Barron, Sheila; Fernando, Josephine; Balint-Langel, Kinga – Journal of Positive Behavior Interventions, 2018
Direct behavior ratings have been identified as a practical and feasible alternative to direct observation of behavior for monitoring behavioral progress. Despite the evidence of usability, there have been calls for further examination of direct behavior ratings using different behaviors and scales. To this end, we examined the ratings of…
Descriptors: Positive Behavior Supports, Behavior Rating Scales, Observation, Elementary School Students
Cornell, Heidi R.; Lin, Tiffany Ting; Anderson, Jeffrey Alvin – Journal of Occupational Therapy, Schools & Early Intervention, 2018
The results are presented from a systematic review of the literature that examined findings of published studies about play-based interventions for children and youth with ADHD. Guided by the research question, "What is the current status of evidence for using play-based interventions to improve outcomes for students with ADHD?," this…
Descriptors: Play, Intervention, Occupational Therapy, Attention Deficit Hyperactivity Disorder
From Aggregation to Interpretation: How Assessors Judge Complex Data in a Competency-Based Portfolio
Oudkerk Pool, Andrea; Govaerts, Marjan J. B.; Jaarsma, Debbie A. D. C.; Driessen, Erik W. – Advances in Health Sciences Education, 2018
While portfolios are increasingly used to assess competence, the validity of such portfolio-based assessments has hitherto remained unconfirmed. The purpose of the present research is therefore to further our understanding of how assessors form judgments when interpreting the complex data included in a competency-based portfolio. Eighteen…
Descriptors: Undergraduate Students, Medical Students, Medical Education, Competency Based Education
Lindelauf, Joanne; Reupert, Andrea; Jacobs, Kate E. – Journal of Psychologists and Counsellors in Schools, 2018
This study investigated how teachers who support children with learning difficulties utilise psychologists' reports in their teaching practice. Previous research has examined teachers' preferences for how reports should be written, rather than how they might be used. Semi-structured, qualitative interviews with 12 teachers (seven primary, four…
Descriptors: Learning Disabilities, Psychoeducational Methods, Information Utilization, Educational Practices
Derakhshan, Ali – International Journal of Instruction, 2018
The present study sought to investigate the effects of Summary Writing (SW), Picture Writing (PW), and Topic Writing (TW) tasks on the accuracy and complexity of Iranian intermediate EFL learners' writing performance. To this end, of 61 students majoring in English Literature at Golestan University, Gorgan, Iran, 43 (10 males, 33 females) of them…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Writing Skills
Ianes, D.; Cappello, S.; Demo, H. – European Journal of Special Needs Education, 2017
Student voice has become increasingly important in educational research at an international level. Research in Italy on school integration of students with disabilities has almost entirely left behind student voice. The very few studies based on student voice suggest that there is a mismatch between student and teacher voices when faced with…
Descriptors: Foreign Countries, Teacher Attitudes, Student Attitudes, Comparative Analysis
Kim, Kerry J.; Meir, Eli; Pope, Denise S.; Wendel, Daniel – Journal of Educational Data Mining, 2017
Computerized classification of student answers offers the possibility of instant feedback and improved learning. Open response (OR) questions provide greater insight into student thinking and understanding than more constrained multiple choice (MC) questions, but development of automated classifiers is more difficult, often requiring training a…
Descriptors: Classification, Computer Assisted Testing, Multiple Choice Tests, Test Format
Biasutti, Michele – Technology, Pedagogy and Education, 2017
The current study describes the development of a content analysis coding scheme to examine transcripts of online asynchronous discussion groups in higher education. The theoretical framework comprises the theories regarding knowledge construction in computer-supported collaborative learning (CSCL) based on a sociocultural perspective. The coding…
Descriptors: Asynchronous Communication, Computer Mediated Communication, Content Analysis, Coding
Han, Jiantao; Long, Haiying; Pang, Weiguo – Creativity Research Journal, 2017
This study reported 2 experiments that studied the effect of perspective taking on assessment of creative products by using human raters. Forty responses of 2 alternative uses tasks (AUTs) and 15 alien stories generated by 6th-grade students were used as assessment materials. Undergraduate students as the novice raters assessed the products under…
Descriptors: Perspective Taking, Creativity, Undergraduate Students, Psychology
Ballard, Laura – ProQuest LLC, 2017
Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…
Descriptors: Evaluators, Schemata (Cognition), Eye Movements, Scoring Rubrics

