ERIC - Search Results

Showing all 8 results

Examining the Calibration Process for Raters of the "GRE"® General Test. ETS GRE® Board Research Report. GRE®-19-01. Research Report Series. ETS RR-19-09

Peer reviewed

Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019

One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…

Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability

Does the Time between Scoring Sessions Impact Scoring Accuracy? An Evaluation of Constructed-Response Essay Responses on the "GRE"® General Test. Research Report. ETS RR-18-31

Peer reviewed

Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018

This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the "GRE"® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…

Descriptors: Intervals, Scoring, Accuracy, Essay Tests

The Impact of Sampling Approach on Population Invariance in Automated Scoring of Essays. Research Report. ETS RR-13-18

Peer reviewed

Zhang, Mo – ETS Research Report Series, 2013

Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…

Descriptors: Automation, Scoring, Essay Tests, Sampling

Scoring with the Computer: Alternative Procedures for Improving the Reliability of Holistic Essay Scoring

Peer reviewed

Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013

Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…

Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests

Length of Textual Response as a Construct-Irrelevant Response Strategy: The Case of Shell Language. Research Report. ETS RR-13-07

Peer reviewed

Bejar, Isaac I.; VanWinkle, Waverely; Madnani, Nitin; Lewis, William; Steier, Michael – ETS Research Report Series, 2013

The paper applies a natural language computational tool to study a potential construct-irrelevant response strategy, namely the use of "shell language." Although the study is motivated by the impending increase in the volume of scoring of students' responses from assessments to be developed in response to the Race to the Top initiative,…

Descriptors: Responses, Language Usage, Natural Language Processing, Computational Linguistics

Evaluation of the "e-rater"® Scoring Engine for the "GRE"® Issue and Argument Prompts. Research Report. ETS RR-12-02

Peer reviewed

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012

Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…

Descriptors: Scoring, Test Scoring Machines, Automation, Models

Relationships between Direct and Indirect Measures of Writing Ability.

Carlson, Sybil B.; Camp, Roberta – 1985

This paper reports on Educational Testing Service research studies investigating the parameters critical to reliability and validity in both the direct and indirect writing ability assessment of higher education applicants. The studies involved: (1) formulating an operational definition of writing competence; (2) designing and pretesting writing…

Descriptors: College Entrance Examinations, Computer Assisted Testing, English (Second Language), Essay Tests

Relationship of Admission Test Scores to Writing Performance of Native and Nonnative Speakers of English.

Carlson, Sybil B.; And Others – 1985

Four writing samples were obtained from 638 foreign college applicants who represented three major foreign language groups (Arabic, Chinese, and Spanish), and from 60 native English speakers. All four were scored holistically, two were also scored for sentence-level and discourse-level skills, and some were scored by the Writer's Workbench…

Descriptors: Arabic, Chinese, College Entrance Examinations, Computer Software
