Showing 1 to 15 of 19 results
Peer reviewed
PDF on ERIC Download full text
Wyse, Adam E. – Practical Assessment, Research & Evaluation, 2018
One common modification to the Angoff standard-setting method is to have panelists round their ratings to the nearest 0.05 or 0.10 instead of 0.01. Several reasons have been offered for this practice. In this article, we examine one reason that has been suggested, which is…
Descriptors: Interrater Reliability, Evaluation Criteria, Scoring Formulas, Achievement Rating
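As a rough, hypothetical sketch of the rounding modification the abstract describes (the ratings and helper function below are illustrative, not from the article), the following Python snippet rounds a panel's Angoff ratings to the nearest 0.01, 0.05, and 0.10 and averages each set into a cut score:

```python
# Hypothetical Angoff ratings on the 0-1 probability scale.
ratings = [0.62, 0.58, 0.71, 0.66, 0.64]

def round_to_nearest(rating, increment):
    """Round a rating to the nearest multiple of `increment`."""
    return round(round(rating / increment) * increment, 2)

for inc in (0.01, 0.05, 0.10):
    rounded = [round_to_nearest(r, inc) for r in ratings]
    cut = sum(rounded) / len(rounded)
    print(f"increment {inc:.2f}: {rounded} -> cut score {cut:.3f}")
```

Coarser increments collapse nearby ratings onto the same value, which is one route by which rounding can change both rater agreement and the resulting cut score.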
Peer reviewed
Direct link
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Peer reviewed
Direct link
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
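Two of the internal-consistency estimates listed above can be sketched in a few lines of Python (the item data are hypothetical; this illustrates the standard formulas, not code from the article):

```python
import statistics

# Hypothetical data: 4 items (rows) scored for 6 examinees (columns).
items = [
    [2, 3, 4, 4, 5, 3],
    [1, 3, 3, 4, 5, 2],
    [2, 2, 4, 5, 5, 3],
    [1, 3, 4, 4, 4, 2],
]

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    item_var_sum = sum(statistics.variance(item) for item in items)
    total_var = statistics.variance([sum(person) for person in zip(*items)])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def split_half(items):
    """Odd-even split-half r, stepped up with the Spearman-Brown formula."""
    odd = [sum(person) for person in zip(*items[0::2])]
    even = [sum(person) for person in zip(*items[1::2])]
    r = statistics.correlation(odd, even)  # Python 3.10+
    return 2 * r / (1 + r)

print(f"Cronbach's alpha:            {cronbach_alpha(items):.3f}")
print(f"split-half (Spearman-Brown): {split_half(items):.3f}")
```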
Guskey, Thomas R.; Jung, Lee Ann – Educational Leadership, 2016
Many educators consider grades calculated from statistical algorithms more accurate, objective, and reliable than grades they calculate themselves. But in this research, the authors first asked teachers to use their professional judgment to choose a summary grade for hypothetical students. When the researchers compared the teachers' grade with the…
Descriptors: Grading, Computer Assisted Testing, Interrater Reliability, Grades (Scholastic)
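The contrast the authors draw can be seen in a small hypothetical example: two common summary algorithms applied to the same grade record disagree sharply when one score is an outlier.

```python
# Hypothetical grade record with one missing assignment entered as zero.
scores = [95, 90, 0, 88, 92]

mean = sum(scores) / len(scores)
median = sorted(scores)[len(scores) // 2]

print(f"mean:   {mean:.1f}")  # 73.0 -- a low grade on most scales
print(f"median: {median}")    # 90   -- a high grade on most scales
```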
Peer reviewed
Direct link
Tarricone, Pina; Newhouse, C. Paul – Australian Educational Researcher, 2016
Traditional moderation of student assessments is often carried out with groups of teachers working face-to-face in a specified location making judgements concerning the quality of representations of achievement. This traditional model has relied little on modern information communications technologies and has been logistically challenging. We…
Descriptors: Visual Arts, Art Education, Art Materials, Alternative Assessment
Peer reviewed
PDF on ERIC Download full text
Temel, Gülhan Orekici; Erdogan, Semra; Selvi, Hüseyin; Kaya, Irem Ersöz – Educational Sciences: Theory and Practice, 2016
Studies based on longitudinal data focus on the change and development of the situation being investigated and allow for examining cases regarding education, individual development, cultural change, and socioeconomic improvement over time. However, as these studies require taking repeated measures in different time periods, they may include various…
Descriptors: Investigations, Sample Size, Longitudinal Studies, Interrater Reliability
Peer reviewed
Direct link
Herbert, Ian P.; Joyce, John; Hassall, Trevor – Accounting Education, 2014
The design, delivery and assessment of a complete educational scheme, such as a degree programme or a professional qualification course, is a complex matter. Maintaining alignment between the stated aims of the curriculum and the scoring of student achievement is an overarching concern. The potential for drift across individual aspects of an…
Descriptors: Higher Education, Student Evaluation, Communities of Practice, Interrater Reliability
Peer reviewed
PDF on ERIC Download full text
Beltrán, Jorge – Working Papers in TESOL & Applied Linguistics, 2016
In the assessment of aural skills of second language learners, the study of the inclusion of visual stimuli has been conducted almost exclusively in the context of listening assessment. While the inclusion of contextual information in test input has been advocated by numerous researchers (Ockey, 2010), little has been said regarding the…
Descriptors: Achievement Tests, Speech Skills, Speech Tests, Second Language Learning
Hixson, Nate; Rhudy, Vaughn – West Virginia Department of Education, 2013
Student responses to the West Virginia Educational Standards Test (WESTEST) 2 Online Writing Assessment are scored by a computer-scoring engine. The scoring method is not widely understood among educators, and there exists a misperception that it is not comparable to hand scoring. To address these issues, the West Virginia Department of Education…
Descriptors: Scoring Formulas, Scoring Rubrics, Interrater Reliability, Test Scoring Machines
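Comparability checks of the kind such studies report can be illustrated with a short sketch (hypothetical scores; not the department's actual analysis), computing exact and adjacent agreement between hand and engine scores:

```python
# Hypothetical hand and engine scores for the same eight responses.
hand   = [3, 4, 2, 5, 4, 3, 4, 2]
engine = [3, 4, 3, 5, 3, 3, 4, 2]

n = len(hand)
exact    = sum(h == e for h, e in zip(hand, engine)) / n
adjacent = sum(abs(h - e) <= 1 for h, e in zip(hand, engine)) / n

print(f"exact agreement:    {exact:.0%}")
print(f"adjacent agreement: {adjacent:.0%}")  # within one score point
```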
Peer reviewed
PDF on ERIC Download full text
Samad, Arshad Abd; bt Ahmad, Zamzam – Advances in Language and Literary Studies, 2012
Raimes (1983) has identified nine components necessary to produce a piece of writing that is clear, fluent, and effective. These are also the aspects considered when assessing writing. The common practice is to have raters score the essays using a rating scale provided for this purpose. A training and practice session is also…
Descriptors: Writing Evaluation, Writing Achievement, Interrater Reliability, Scoring Formulas
Peer reviewed
PDF on ERIC Download full text
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…
Descriptors: Scoring, Test Scoring Machines, Automation, Models
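The evaluation statistics the abstract names can be sketched as follows (hypothetical human and engine scores on the 0-6 scale; this illustrates the statistics themselves, not ETS's models or code):

```python
import statistics

human  = [4, 3, 5, 4, 2, 6, 3, 4, 5, 3]
engine = [4, 3, 4, 4, 3, 5, 3, 4, 5, 4]

def quadratic_weighted_kappa(a, b):
    """1 - observed squared disagreement / squared disagreement
    expected if the two score distributions were independent."""
    n = len(a)
    observed = sum((x - y) ** 2 for x, y in zip(a, b))
    expected = sum((x - y) ** 2 for x in a for y in b) / n
    return 1 - observed / expected

qwk = quadratic_weighted_kappa(human, engine)
r = statistics.correlation(human, engine)  # Pearson r; Python 3.10+
# One common convention: mean difference relative to the human-score SD.
std_diff = (statistics.mean(engine) - statistics.mean(human)) / statistics.stdev(human)

print(f"quadratic weighted kappa: {qwk:.3f}")
print(f"Pearson correlation:      {r:.3f}")
print(f"standardized difference:  {std_diff:.3f}")
```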
Peer reviewed
Direct link
Barkaoui, Khaled – Assessment in Education: Principles, Policy & Practice, 2011
This study examined the effects of marking method and rater experience on ESL (English as a Second Language) essay test scores and rater performance. Each of 31 novice and 29 experienced raters rated a sample of ESL essays both holistically and analytically. Essay scores were analysed using a multi-faceted Rasch model to compare test-takers'…
Descriptors: Writing Evaluation, Writing Tests, Essay Tests, Interrater Reliability
Peer reviewed
Hughes, Garry L.; Prien, Erich P. – Personnel Psychology, 1986
Investigated the psychometric properties of three methods of scoring a Mixed Standard Scale performance evaluation: a patterned procedure, a simple nonpatterned scoring procedure, and a procedure assigning differential weights to statements on the basis of scale values provided by subject matter experts. Found no differences in the score distribution…
Descriptors: Evaluation Methods, Interrater Reliability, Scoring, Scoring Formulas
Berger, Peter N. – Teaching and Learning Literature with Children and Young Adults, 1997
Discusses problems with scoring reliability of the Vermont Education Department's writing portfolio test, particularly the difficulties teachers face in agreeing upon scoring criteria. (PA)
Descriptors: Elementary Secondary Education, Interrater Reliability, Portfolio Assessment, Portfolios (Background Materials)
Peer reviewed
Upshur, John A.; Turner, Carolyn E. – ELT Journal, 1995
Reviews the place of rating scales in second-language measurement and summarizes some of the problems associated with them. Standard and alternative scales were studied. High agreement among raters can be achieved even under conditions not favorable to high interrater reliability. The full range of score categories is effectively utilized. (17…
Descriptors: Evaluation Problems, Interrater Reliability, Language Tests, Measurement Techniques