Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 4 |
| Since 2017 (last 10 years) | 9 |
| Since 2007 (last 20 years) | 16 |
Descriptor
| Computer Assisted Testing | 19 |
| Evaluators | 19 |
| Interrater Reliability | 19 |
| Second Language Learning | 12 |
| English (Second Language) | 11 |
| Language Tests | 10 |
| Scoring | 10 |
| Comparative Analysis | 7 |
| Correlation | 7 |
| Foreign Countries | 7 |
| Oral Language | 7 |
| More ▼ | |
Source
Author
| Wolfe, Edward W. | 2 |
| Amanda Huee-Ping Wong | 1 |
| Bejar, Isaac I. | 1 |
| Bell, John F. | 1 |
| Breyer, F. Jay | 1 |
| Casabianca, Jodi M. | 1 |
| Clevinger, Amanda | 1 |
| Coniam, David | 1 |
| Crossley, Scott | 1 |
| Davis, Larry | 1 |
| Doewes, Afrizal | 1 |
| More ▼ | |
Publication Type
| Journal Articles | 16 |
| Reports - Research | 16 |
| Speeches/Meeting Papers | 3 |
| Tests/Questionnaires | 3 |
| Reports - Evaluative | 2 |
| Information Analyses | 1 |
Education Level
| Higher Education | 5 |
| Postsecondary Education | 5 |
| High Schools | 2 |
| Secondary Education | 2 |
| Grade 11 | 1 |
Audience
Laws, Policies, & Programs
Assessments and Surveys
| Test of English as a Foreign… | 8 |
What Works Clearinghouse Rating
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
Georgios Zacharis; Stamatios Papadakis – Educational Process: International Journal, 2025
Background/purpose: Generative artificial intelligence (GenAI) is often promoted as a transformative tool for assessment, yet evidence of its validity compared to human raters remains limited. This study examined whether an AI-based rater could be used interchangeably with trained faculty in scoring complex coursework. Materials/methods:…
Descriptors: Artificial Intelligence, Technology Uses in Education, Computer Assisted Testing, Grading
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Rupp, André A.; Casabianca, Jodi M.; Krüger, Maleika; Keller, Stefan; Köller, Olaf – ETS Research Report Series, 2019
In this research report, we describe the design and empirical findings for a large-scale study of essay writing ability with approximately 2,500 high school students in Germany and Switzerland on the basis of 2 tasks with 2 associated prompts, each from a standardized writing assessment whose scoring involved both human and automated components.…
Descriptors: Automation, Foreign Countries, English (Second Language), Language Tests
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Li, Shuai; Taguchi, Naoko; Xiao, Feng – Language Assessment Quarterly, 2019
Adopting Linacre's guidelines for evaluating rating scale effectiveness, we examined whether and how a six-point rating scale functioned differently across raters, speech acts, and second language (L2) proficiency levels. We developed a 12-item Computerized Oral Discourse Completion Task (CODCT) for assessing the production of requests, refusals,…
Descriptors: Speech Acts, Rating Scales, Guidelines, Evaluators
Kang, Okim; Rubin, Don; Kermad, Alyssa – Language Testing, 2019
As a result of the fact that judgments of non-native speech are closely tied to social biases, oral proficiency ratings are susceptible to error because of rater background and social attitudes. In the present study we seek first to estimate the variance attributable to rater background and attitudinal variables on novice raters' assessments of L2…
Descriptors: Evaluators, Second Language Learning, Language Tests, English (Second Language)
Davis, Larry – Language Testing, 2016
Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…
Descriptors: Evaluators, Oral Language, Scores, Language Tests
Crossley, Scott; Clevinger, Amanda; Kim, YouJin – Language Assessment Quarterly, 2014
There has been a growing interest in the use of integrated tasks in the field of second language testing to enhance the authenticity of language tests. However, the role of text integration in test takers' performance has not been widely investigated. The purpose of the current study is to examine the effects of text-based relational (i.e.,…
Descriptors: Language Proficiency, Connected Discourse, Language Tests, English (Second Language)
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Jamieson, Joan; Poonpon, Kornwipa – ETS Research Report Series, 2013
Research and development of a new type of scoring rubric for the integrated speaking tasks of "TOEFL iBT"® are described. These "analytic rating guides" could be helpful if tasks modeled after those in TOEFL iBT were used for formative assessment, a purpose which is different from TOEFL iBT's primary use for admission…
Descriptors: Oral Language, Language Proficiency, Scaling, Scores
Marking Essays on Screen: An Investigation into the Reliability of Marking Extended Subjective Texts
Johnson, Martin; Nadas, Rita; Bell, John F. – British Journal of Educational Technology, 2010
There is a growing body of research literature that considers how the mode of assessment, either computer-based or paper-based, might affect candidates' performances. Despite this, there is a fairly narrow literature that shifts the focus of attention to those making assessment judgements and which considers issues of assessor consistency when…
Descriptors: English Literature, Examiners, Evaluation Research, Evaluators
Coniam, David – Educational Research and Evaluation, 2009
This paper describes a study comparing paper-based marking (PBM) and onscreen marking (OSM) in Hong Kong utilising English language essay scripts drawn from the live 2007 Hong Kong Certificate of Education Examination (HKCEE) Year 11 English Language Writing Paper. In the study, 30 raters from the 2007 HKCEE Writing Paper marked on paper 100…
Descriptors: Student Attitudes, Foreign Countries, Essays, Comparative Analysis
Previous Page | Next Page »
Pages: 1 | 2
Peer reviewed
Direct link
