Publication Date
| Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 2 |
| Since 2007 (last 20 years) | 11 |
Descriptor
| Term | Count |
| --- | --- |
| Scoring | 21 |
| Computer Software | 18 |
| Computer Assisted Testing | 10 |
| Essays | 6 |
| Foreign Countries | 6 |
| Writing Evaluation | 5 |
| Comparative Analysis | 4 |
| Computer Software Evaluation | 4 |
| Correlation | 4 |
| Essay Tests | 4 |
| Evaluation Methods | 4 |
Author
| Name | Count |
| --- | --- |
| O'Neil, Harold F., Jr. | 2 |
| Schacter, John | 2 |
| Abrash, Victor | 1 |
| Bratt, Harry | 1 |
| Chung, Gregory K. W. K. | 1 |
| Coniam, David | 1 |
| Drasgow, Fritz | 1 |
| Franco, Horacio | 1 |
| Garrigues, Mylene | 1 |
| Gentile, Claudia | 1 |
| Grimes, Douglas | 1 |
Publication Type
| Type | Count |
| --- | --- |
| Reports - Evaluative | 21 |
| Journal Articles | 16 |
| Numerical/Quantitative Data | 2 |
| Reports - Descriptive | 1 |
| Speeches/Meeting Papers | 1 |
Education Level
| Level | Count |
| --- | --- |
| Elementary Secondary Education | 2 |
| Secondary Education | 2 |
| Elementary Education | 1 |
| Grade 3 | 1 |
| Grade 4 | 1 |
| Grade 5 | 1 |
| Higher Education | 1 |
| Middle Schools | 1 |
Location
| Place | Count |
| --- | --- |
| California | 1 |
| Canada | 1 |
| Hong Kong | 1 |
| Japan | 1 |
| Malaysia | 1 |
| South Korea | 1 |
| Turkey | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| National Assessment of Educational Progress | 2 |
| Test of English as a Foreign Language | 2 |
Raykov, Tenko – Measurement: Interdisciplinary Research and Perspectives, 2023
This software review discusses Stata's capabilities for item response theory modeling. The commands for fitting the popular one-, two-, and three-parameter logistic models are discussed first, followed by the procedure for testing discrimination-parameter equality in the one-parameter model. The commands for fitting…
Descriptors: Item Response Theory, Models, Comparative Analysis, Item Analysis
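For reference, the three logistic models this review covers have the following standard textbook forms (standard IRT notation; the formulas are not quoted from the review itself):

```latex
% 1PL (Rasch-type): common discrimination a, item difficulty b_j
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-a(\theta_i - b_j)}}

% 2PL: item-specific discrimination a_j
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}

% 3PL: adds a lower asymptote ("guessing") parameter c_j
P(X_{ij} = 1 \mid \theta_i) = c_j + \frac{1 - c_j}{1 + e^{-a_j(\theta_i - b_j)}}
```

Testing discrimination equality in the 1PL amounts to asking whether the item-specific slopes a_j of the 2PL can be constrained to a single common a.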
Tahereh Firoozi; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2023
The proliferation of large language models represents a paradigm shift in automated essay scoring (AES), substantially improving accuracy and efficacy. This study presents an extensive examination of large language models, with particular emphasis on the transformative influence of transformer-based models, such as…
Descriptors: Turkish, Writing Evaluation, Essays, Accuracy
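The general recipe behind transformer-based AES is to put a regression head on a pretrained encoder and fine-tune it on human-scored essays. A minimal sketch of that recipe, not the study's own system; the model name and all details here are assumptions:

```python
# Sketch of transformer-based essay scoring: a pretrained encoder with a
# single-output regression head. Untrained here; it must be fine-tuned on
# human-scored essays before its outputs mean anything.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed model choice
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"
)

inputs = tok("The essay text to score ...", return_tensors="pt", truncation=True)
predicted_score = model(**inputs).logits.item()  # a scalar score after fine-tuning
```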
McCurry, Doug – Assessing Writing, 2010
This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with one another. Such claims about the reliability of machine scoring of writing are usually based on specific, constrained writing tasks, and there is reason to ask whether machine scoring of writing requires…
Descriptors: Writing Tests, Scoring, Interrater Reliability, Computer Assisted Testing
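Agreement claims of this kind are typically checked by computing the same agreement statistic for a human-human pair and a human-machine pair. A minimal sketch under assumptions (invented scores, and quadratic weighted kappa as the metric; the article is not committed to either):

```python
# Compare human-human vs. human-machine agreement on the same essays.
# Scores are made up for illustration; quadratic weighted kappa is one
# common AES agreement statistic, not necessarily McCurry's.
from sklearn.metrics import cohen_kappa_score

human_1 = [3, 4, 2, 5, 3, 4, 1, 3, 4, 2]  # first human rater
human_2 = [3, 3, 2, 5, 4, 4, 2, 3, 4, 2]  # second human rater
machine = [3, 4, 3, 4, 3, 4, 2, 3, 5, 2]  # machine scores for the same essays

print(f"human-human kappa:   {cohen_kappa_score(human_1, human_2, weights='quadratic'):.2f}")
print(f"human-machine kappa: {cohen_kappa_score(human_1, machine, weights='quadratic'):.2f}")
```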
Nehm, Ross H.; Haertig, Hendrik – Journal of Science Education and Technology, 2012
Our study examines the efficacy of Computer Assisted Scoring (CAS) of open-response text relative to expert human scoring within the complex domain of evolutionary biology. Specifically, we explored whether CAS can diagnose the explanatory elements (or Key Concepts) that comprise undergraduate students' explanatory models of natural selection with…
Descriptors: Evolution, Undergraduate Students, Interrater Reliability, Computers
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – Applied Linguistics, 2010
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multi-trait) rating dimensions and their relationships to holistic scores and "e-rater"[R] essay feature variables in the context of the TOEFL[R] computer-based test (TOEFL CBT) writing assessment. Data analyzed in the study were holistic…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
Nakayama, Minoru; Yamamoto, Hiroh; Santiago, Rowena – Electronic Journal of e-Learning, 2010
e-Learning imposes some restrictions on how learning performance is assessed. Online testing usually takes the form of multiple-choice questions, with no essay-type assessment. Major reasons for employing multiple-choice tasks in e-learning include ease of implementation and ease of managing learners' responses. To address this…
Descriptors: Electronic Learning, Testing, Essay Tests, Online Courses
Lau, Paul Ngee Kiong; Lau, Sie Hoe; Hong, Kian Sam; Usop, Hasbee – Educational Technology & Society, 2011
The number right (NR) method, in which students pick one option as the answer, is the conventional method for scoring multiple-choice tests that is heavily criticized for encouraging students to guess and failing to credit partial knowledge. In addition, computer technology is increasingly used in classroom assessment. This paper investigates the…
Descriptors: Guessing (Tests), Multiple Choice Tests, Computers, Scoring
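A toy contrast between number-right scoring and one partial-credit alternative, elimination scoring, which credits a student for crossing out options known to be wrong. The snippet above does not specify the paper's own scheme, so the elimination rule below is an illustrative textbook variant, not the method studied:

```python
# Number-right (NR) vs. elimination scoring: NR gives all-or-nothing credit,
# while elimination scoring rewards partial knowledge. The penalty rule here
# (-(k-1) for eliminating the key) is one common variant, assumed for
# illustration only.

def score_number_right(chosen: str, key: str) -> int:
    """NR: 1 point if the single chosen option is the key, else 0."""
    return 1 if chosen == key else 0

def score_elimination(eliminated: set[str], options: set[str], key: str) -> int:
    """+1 per distractor correctly eliminated; -(k-1) if the key is eliminated."""
    k = len(options)
    if key in eliminated:
        return -(k - 1)
    return len(eliminated & (options - {key}))

options = {"A", "B", "C", "D"}
print(score_number_right("B", key="C"))                 # 0: no credit at all
print(score_elimination({"A", "D"}, options, key="C"))  # 2: partial knowledge credited
```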
Franco, Horacio; Bratt, Harry; Rossier, Romain; Rao Gadde, Venkata; Shriberg, Elizabeth; Abrash, Victor; Precoda, Kristin – Language Testing, 2010
SRI International's EduSpeak[R] system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to…
Descriptors: Feedback (Response), Sentences, Oral Language, Predictor Variables
Grimes, Douglas; Warschauer, Mark – Journal of Technology, Learning, and Assessment, 2010
Automated writing evaluation (AWE) software uses artificial intelligence (AI) to score student essays and support revision. We studied how an AWE program called MY Access![R] was used in eight middle schools in Southern California over a three-year period. Although many teachers and students considered automated scoring unreliable, and teachers'…
Descriptors: Automation, Writing Evaluation, Essays, Artificial Intelligence
Kim, Seong-in; Hameed, Ibrahim A. – Art Therapy: Journal of the American Art Therapy Association, 2009
For mental health professionals, art assessment is a useful tool for patient evaluation and diagnosis. Consideration of various color-related elements is important in art assessment. This correlational study introduces the concept of variety of color as a new color-related element of an artwork. This term represents a comprehensive use of color,…
Descriptors: Mental Health Workers, Essays, Scoring, Visual Stimuli
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers to rate written scripts has been criticised in some quarters for lacking transparency and for fitting poorly with how human raters work, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
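BETSY (the Bayesian Essay Test Scoring sYstem) is at heart a Bayesian text classifier that assigns essays to score bands. A toy sketch of that general approach, with invented training data and a multinomial naive Bayes over word counts, not BETSY's own features or implementation:

```python
# Bayesian text classification in the spirit of BETSY: treat score bands as
# classes and classify essays by their word counts. Training data is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_essays = [
    "good clear argument with evidence",
    "strong thesis and well organised paragraphs",
    "weak unsupported claims",
    "poor organisation and little evidence",
]
train_bands = ["high", "high", "low", "low"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_essays, train_bands)
print(clf.predict(["clear thesis supported by evidence"]))  # likely ['high']
```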
Harasym, Peter H.; And Others – Journal of Educational Computing Research, 1993
Discusses the use of human markers for write-in questions, focusing on a study that assessed the feasibility of using a computer program to mark write-in responses for the Medical Council of Canada Qualifying Examination. The computer's performance was compared with that of physician markers.
Descriptors: Comparative Analysis, Computer Assisted Testing, Computer Software Development, Computer Software Evaluation
Kaplan, Randy M.; And Others – 1995
The increased use of constructed-response items, such as essays, creates a need for tools that score these responses automatically, in part or as a whole. This study explores one approach to analyzing essay-length natural language constructed responses. A decision model for scoring essays was developed and evaluated. The decision model uses…
Descriptors: Computer Software, Constructed Response, Essay Tests, Grammar
Drasgow, Fritz; And Others – Applied Psychological Measurement, 1995
This study examined how well current software implementations of four polytomous item response theory models fit several multiple-choice tests. The main conclusion is that fitting polytomous item response models to multiple-choice item responses is more complex than fitting the three-parameter logistic model to dichotomously scored responses.
Descriptors: Computer Software, Goodness of Fit, Item Response Theory, Models
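The abstract does not name the four models. As one example of the polytomous family (chosen here for illustration; it may or may not be among the four), Samejima's graded response model replaces the single item response curve with cumulative category curves:

```latex
% Samejima's graded response model for ordered categories k = 0, ..., m_j:
% cumulative probability of responding in category k or higher
P(X_{ij} \ge k \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_{jk})}}

% probability of responding in exactly category k
P(X_{ij} = k \mid \theta_i) = P(X_{ij} \ge k \mid \theta_i) - P(X_{ij} \ge k + 1 \mid \theta_i)
```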
Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays, respectively) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, an encouraging finding for large-scale essay evaluation programs.
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
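Page's approach regressed human ratings on machine-countable surface features of the text and judged the result by cross-validated agreement with human scores. A minimal sketch in that spirit; the features, essays, and scores below are invented placeholders, not Page's actual variables or NAEP data:

```python
# Regression-based essay scoring: predict human ratings from surface text
# features, evaluated by cross-validated machine-human correlation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def surface_features(essay: str) -> list[float]:
    """Crude machine-countable proxies (illustrative feature set)."""
    words = essay.split()
    return [
        float(len(words)),                        # essay length
        float(np.mean([len(w) for w in words])),  # mean word length
        float(essay.count(",")),                  # rough clause-density proxy
    ]

essays = [
    "Short reply.",
    "A longer answer, with one clause, and then another clause added on.",
    "Medium answer with a few plain words only.",
    "An extended, carefully qualified, multi-clause response, which continues.",
    "Tiny note.",
    "Another moderately long response, reasonably detailed, with commas.",
]
human_scores = np.array([1.0, 4.0, 2.0, 5.0, 1.0, 4.0])  # invented ratings

X = np.array([surface_features(e) for e in essays])
preds = cross_val_predict(LinearRegression(), X, human_scores, cv=3)
print(np.corrcoef(preds, human_scores)[0, 1])  # cross-validated correlation
```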
