Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 8 |
| Since 2017 (last 10 years) | 16 |
| Since 2007 (last 20 years) | 26 |
Author
| Ahmadi Safa, Mohammad | 1 |
| Ahmadi Shirazi, Masoumeh | 1 |
| Ahmadi, Alireza | 1 |
| Ahrari, Ramin | 1 |
| Ann Tai Choe | 1 |
| Archer, Jeff | 1 |
| Arnold, Voiza | 1 |
| Beh-Afarin, Seyed Reza | 1 |
| Berger, Cynthia M. | 1 |
| Bosch, Emma | 1 |
| Boser, Judith A. | 1 |
Publication Type
| Tests/Questionnaires | 38 |
| Reports - Research | 32 |
| Journal Articles | 24 |
| Speeches/Meeting Papers | 6 |
| Reports - Descriptive | 3 |
| Books | 1 |
| Guides - General | 1 |
| Guides - Non-Classroom | 1 |
| Information Analyses | 1 |
| Reports - Evaluative | 1 |
Education Level
| Higher Education | 13 |
| Postsecondary Education | 13 |
| Elementary Education | 5 |
| Secondary Education | 4 |
| Elementary Secondary Education | 2 |
| Early Childhood Education | 1 |
| High Schools | 1 |
| Kindergarten | 1 |
| Primary Education | 1 |
Audience
| Administrators | 2 |
| Practitioners | 2 |
| Researchers | 1 |
| Teachers | 1 |
Location
| Iran | 4 |
| Japan | 2 |
| Australia | 1 |
| California | 1 |
| China | 1 |
| Hawaii | 1 |
| Illinois | 1 |
| Iran (Tehran) | 1 |
| Spain | 1 |
| Taiwan | 1 |
| Tennessee | 1 |
Assessments and Surveys
| Test of English as a Foreign… | 2 |
| Flanders System of… | 1 |
| Flesch Kincaid Grade Level… | 1 |
| International English… | 1 |
| National Assessment of… | 1 |
| Praxis Series | 1 |
| edTPA (Teacher Performance… | 1 |
Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022
Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…
Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021
Evaluating the impact of Academic Literacies teaching (Lea and Street [1998. "Student Writing in Higher Education: An Academic Literacies Approach." "Studies in Higher Education" 23 (2): 157-72. doi:10.1080/03075079812331380364]) is difficult, as it involves gauging whether writers: (1) gain better understanding of what…
Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Sayin, Ayfer; Sata, Mehmet – International Journal of Assessment Tools in Education, 2022
The aim of the present study was to examine Turkish teacher candidates' competency levels in writing different types of test items by utilizing Rasch analysis. In addition, the effect of the expertise of the raters scoring the items written by the teacher candidates was examined within the scope of the study. 84 Turkish teacher candidates…
Descriptors: Foreign Countries, Item Response Theory, Evaluators, Expertise
Doosti, Mehdi; Ahmadi Safa, Mohammad – International Journal of Language Testing, 2021
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the consideration of the examinees' expectations by the examiners have any effect on test-takers' perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian…
Descriptors: Oral Language, Language Tests, Interrater Reliability, Training
Romeo, Marina; Yepes-Baldó, Montserrat; González, Vicenta; Burset, Silvia; Martín, Carolina; Bosch, Emma – International Journal of Instruction, 2022
The assessment process in higher education considers four aspects: assessment agents, procedure, content, and scoring. In this study, we delve into the who. We analyze the role of transversal competence assessment agents in the framework of professional internships in university master's degree programs, comparing the suitability of their…
Descriptors: Internship Programs, Higher Education, Evaluators, Masters Programs
Karusoo-Musumeci, Ava; Pearce, Wendy M.; Donaghy, Michelle – Child Language Teaching and Therapy, 2022
Oral narrative assessments are important for diagnosis of language disorders in school-age children so scoring needs to be reliable and consistent. This study explored the impact of training on the variability of story grammar scores in children's oral narrative assessments scored by multiple raters. Fifty-one speech pathologists and 19 final-year…
Descriptors: Oral Language, Speech Evaluation, Language Impairments, Elementary School Students
Lyness, Scott A.; Peterson, Kent; Yates, Kenneth – Education Sciences, 2021
The Performance Assessment for California Teachers (PACT) is a high-stakes summative assessment that was designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was 0.17 (poor strength of…
Descriptors: High Stakes Tests, Performance Based Assessment, Teacher Effectiveness, Academic Language
Wang, Qiao – Education and Information Technologies, 2022
This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring
Jeong, Heejeong – Language Testing in Asia, 2019
In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study is to find out how different scale…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Ahmadi Shirazi, Masoumeh – SAGE Open, 2019
Threats to construct validity should be reduced to a minimum. If true, sources of bias, namely raters, items, tests as well as gender, age, race, language background, culture, and socio-economic status need to be spotted and removed. This study investigates raters' experience, language background, and the choice of essay prompt as potential…
Descriptors: Foreign Countries, Language Tests, Test Bias, Essay Tests
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both the automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Archer, Jeff; Cantrell, Steve; Holtzman, Steven L.; Joe, Jilliam N.; Tocci, Cynthia M.; Wood, Jess – Bill & Melinda Gates Foundation, 2016
In this book the authors explain how to build, and over time improve, the elements of an observation system that equips all observers to identify and develop effective teaching. It is based on the collective knowledge of key partners in the Measures of Effective Teaching (MET) project--which carried out one of the largest-ever studies of classroom…
Descriptors: Feedback (Response), Teacher Effectiveness, Observation, Teacher Evaluation

