Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
Domínguez, César; Jaime, Arturo; García-Izquierdo, Francisco José; Olarte, Juan José – ACM Transactions on Computing Education, 2020
A capstone project is an extensive learning experience traditionally developed during a student's final academic year. Assessing such a complex assignment involves several challenges and is usually based upon the evaluations of at least two different people: the capstone project advisor, and one or more other assessors. Quantitative studies…
Descriptors: Computer Science Education, Capstone Experiences, Student Evaluation, Student Projects
Humphry, Stephen Mark; Heldsinger, Sandy – Journal of Educational Measurement, 2019
To capitalize on professional expertise in educational assessment, it is desirable to develop and test methods of rater-mediated assessment that enable classroom teachers to make reliable and informative judgments. Accordingly, this article investigates the reliability of a two-stage method used by classroom teachers to assess primary school…
Descriptors: Essays, Elementary School Students, Writing (Composition), Writing Evaluation
Roduta Roberts, Mary; Alves, Cecilia Brito; Werther, Karin; Bahry, Louise M. – Journal of Psychoeducational Assessment, 2019
The purpose of this study was to examine the reliability and sources of score variation from a performance assessment of practice competencies within an occupational therapy program. Data from 99 students who participated in a practical exam were examined. A generalizability analysis of analytic, total, and overall holistic scores was completed…
Descriptors: Performance Based Assessment, Test Reliability, Scores, Occupational Therapy
Esbensen, Anna J.; Hoffman, Emily K.; Shaffer, Rebecca; Chen, Elizabeth; Patel, Lina; Jacola, Lisa – American Journal on Intellectual and Developmental Disabilities, 2019
The current study evaluates the psychometric properties of the Behavior Rating Inventory of Executive Function (BRIEF) with children with Down syndrome. Caregivers of 84 children with Down syndrome rated their child's behavior with the BRIEF. Teacher ratings were obtained for 57 children. About 40% of children with Down syndrome were reported by…
Descriptors: Executive Function, Children, Down Syndrome, Behavior Rating Scales
van Daal, Tine; Lesterhuis, Marije; Coertjens, Liesje; Donche, Vincent; De Maeyer, Sven – Assessment in Education: Principles, Policy & Practice, 2019
Recently, comparative judgement has been introduced as an alternative method for scoring essays. Although this method is promising in terms of obtaining reliable scores, empirical evidence concerning its validity is lacking. The current study examines implications resulting from two critical assumptions underpinning the use of comparative…
Descriptors: Academic Discourse, Validity, Writing Evaluation, Value Judgment
Stager, Sheila V.; Gupta, Simran; Amdur, Richard; Bielamowicz, Steven A. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The purpose of this study was to use objective measures of glottal gap, bowing, and supraglottic compression from selected images of laryngoscopic examinations from adults over 60 years of age with voice complaints and signs of aging to test current hypotheses on whether degree of severity impacts treatment recommendations and potential…
Descriptors: Older Adults, Patients, Aging (Individuals), Voice Disorders
Saritas Akyol, Seyhan; Karakaya, Ismail – Eurasian Journal of Educational Research, 2021
Purpose: To assess students' problem-solving skills, this study aims to investigate the consistency between self- and peer-ratings in consideration of the teachers' ratings in the process. Method: This study was a descriptive study which examines the mathematical problem-solving skills with the MFRM model concerning self-, peer- and teachers'…
Descriptors: Problem Solving, Item Response Theory, Self Evaluation (Individuals), Peer Evaluation
Doosti, Mehdi; Ahmadi Safa, Mohammad – International Journal of Language Testing, 2021
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the consideration of the examinees' expectations by the examiners have any effect on test-takers' perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian…
Descriptors: Oral Language, Language Tests, Interrater Reliability, Training
Dillon, Emily; Holingue, Calliope; Herman, Dana; Landa, Rebecca J. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Social communication or pragmatic skills are continuously distributed in the general population. Impairment in these skills is associated with two clinical disorders, autism spectrum disorder (ASD) and social (pragmatic) communication disorder. Such impairment can impact a child's peer acceptance, school performance, and current and later…
Descriptors: Psychometrics, Pragmatics, Rating Scales, Elementary School Students
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Al-Salmani, Fatema; Thacker, Beth – Physical Review Physics Education Research, 2021
We designed a rubric to assess free-response exam problems in order to compare thinking skills evidenced in exams in classes taught by different pedagogies. The rubric was designed based on Bloom's taxonomy and then used to code exam problems. We have analyzed historical and recent exam problems in both algebra-based and calculus-based exams. In…
Descriptors: Inquiry, Thinking Skills, Scoring Rubrics, Algebra
Kaharu, Sarintan N.; Mansyur, Jusman – Pegem Journal of Education and Instruction, 2021
This study aims to develop a test that can be used to explore mental models and representation patterns of objects in liquid fluid. The test was developed by adapting Reeves's Development Model in several stages, namely: determining the orientation and test segments; initial survey; preparation of the initial draft; try out;…
Descriptors: Test Construction, Schemata (Cognition), Scientific Concepts, Water
Shin, Sangeun; Park, HyunJu; Hill, Katya – Journal of Speech, Language, and Hearing Research, 2021
Purpose: This study aimed to identify the high-frequency vocabulary (HFV), otherwise termed "core vocabulary," for adults with complex communication needs. Method: Three major characteristics of the HFV--a relatively small number of different words (NDW), a relatively high word frequency, and a high word commonality across…
Descriptors: Word Frequency, Vocabulary Skills, Adults, Age Differences
Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Descriptors: Automation, Scoring, Speech Tests, Language Tests

