ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	2
Since 2017 (last 10 years)	4
Since 2007 (last 20 years)	7

Descriptor

Interrater Reliability	15
Scoring	15
Testing	15
Test Construction	8
Test Items	6
Test Validity	6
Foreign Countries	5
Student Evaluation	5
Test Reliability	5
Scores	4
Educational Assessment	3
English (Second Language)	3
Evaluation Criteria	3
Evaluation Methods	3
Evaluators	3
Language Tests	3
Test Use	3
Alternative Assessment	2
Engineering Education	2
Item Response Theory	2
Rating Scales	2
Scoring Rubrics	2
Second Language Instruction	2
Test Format	2
Test Results	2
More ▼

Source

Assessment and Evaluation in…	1
ETS Research Report Series	1
International Journal of…	1
International Journal of…	1
Journal of Communication…	1
Journal of Educational…	1
Modern Language Journal	1
New York State Education…	1
Online Submission	1
ProQuest LLC	1

Publication Type

Journal Articles	7
Reports - Research	6
Reports - Evaluative	3
Reports - Descriptive	2
Speeches/Meeting Papers	2
Dissertations/Theses -…	1
Guides - General	1
Guides - Non-Classroom	1
Information Analyses	1
Opinion Papers	1

Education Level

Higher Education	2
Postsecondary Education	2
Early Childhood Education	1
Elementary Education	1
Grade 3	1
Grade 4	1
Grade 5	1
Grade 6	1
Grade 7	1
Grade 8	1
High Schools	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Primary Education	1
Secondary Education	1
More ▼

Audience

Practitioners	1
Teachers	1

Location

Japan	1
New York	1
United Kingdom (England)	1
United Kingdom (London)	1

Laws, Policies, & Programs

Individuals with Disabilities…	1
No Child Left Behind Act 2001	1

Assessments and Surveys

Clinical Evaluation of…	1
Raven Progressive Matrices	1
Strengths and Difficulties…	1

What Works Clearinghouse Rating

Showing all 15 results Save | Export

Inter-Rater Reliability in Comprehensive Examination Scoring: The Case for Consistent and Collaborative Rater Training and Calibration

Download full text

Saenz, David Arron – Online Submission, 2023

There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability as research indicates several factors including frequency and timing play crucial roles towards ensuring inter-rater reliability. Additionally, increasing amounts research indicate possible links in…

Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics

Investigating Constructed-Response Scoring over Time: The Effects of Study Design on Trend Rescore Statistics. Research Report. ETS RR-22-15

Peer reviewed
PDF on ERIC

Download full text

Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022

When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…

Descriptors: Item Response Theory, Test Construction, Scoring, Testing

Scoring Stability in a Large-Scale Assessment Program: A Longitudinal Analysis of Leniency/Severity Effects

Peer reviewed

Direct link

Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019

Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…

Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation

ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations

Peer reviewed

Direct link

International Journal of Testing, 2019

These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…

Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage

Dynamic Assessment of Sentence Structure (DASS): Design and Evaluation of a Novel Procedure for the Assessment of Syntax in Children with Language Impairments

Peer reviewed

Direct link

Hasson, Natalie; Dodd, Barbara; Botting, Nicola – International Journal of Language & Communication Disorders, 2012

Background: Sentence construction and syntactic organization are known to be poor in children with specific language impairments (SLI), but little is known about the way in which children with SLI approach language tasks, and static standardized tests contribute little to the differentiation of skills within the population of children with…

Descriptors: Alternative Assessment, Sentence Structure, Syntax, Language Processing

Development and Testing of Assessment Instruments for Multidisciplinary Engineering Capstone Design Courses

Direct link

Gerlick, Robert Edward – ProQuest LLC, 2010

The research presented in this manuscript was focused on the development of assessments for engineering design outcomes. The primary goal was to support efforts by the Transferrable Integrated Design Engineering Education (TIDEE) consortium in developing assessment instruments for multidisciplinary engineering capstone courses. Research conducted…

Descriptors: Engineering Education, Student Evaluation, Formative Evaluation, Testing

New York State Alternate Assessment Technical Report, 2013-14

Download full text

New York State Education Department, 2014

This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and Stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…

Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation

A Specific Investigation of Relative Performance of Examination Markers.

Peer reviewed

Collier, Michael – Assessment and Evaluation in Higher Education, 1986

A study revealing wide variation in the grading of electronics engineering test items by different evaluators has implications for evaluator and test item selection, analysis and manipulation of grades, and the use of numerical methods of assessment. (MSE)

Descriptors: Electronics, Engineering Education, Evaluation Methods, Evaluators

A Method To Compare Rater Severity across Several Administrations.

Download full text

O'Neill, Thomas R.; Lunz, Mary E. – 1997

This paper illustrates a method to study rater severity across exam administrations. A multi-facet Rasch model defined the ratings as being dominated by four facets: examinee ability, rater severity, project difficulty, and task difficulty. Ten years of data from administrations of a histotechnology performance assessment were pooled and analyzed…

Descriptors: Ability, Comparative Analysis, Equated Scores, Interrater Reliability

Analytic Scales for Assessing Students' Expository and Narrative Writing Skills. CSE Resource Paper No. 5.

Download full text

Quellmalz, Edys S.; Burry, James – 1983

The Center for the Study of Evaluation's (CSE) expository and narrative rating scales have been developed to meet the need for instructionally relevant methods for assessing students' writing competence. Research indicates that large numbers of raters can be trained in the use of these scales and that, during training and independent rating, they…

Descriptors: Evaluation Criteria, Evaluators, Expository Writing, Holistic Evaluation

New Stuff in I/O (In-Baskets and Orals). The Development, Administration and Scoring of In-Baskets and Orals for the New York State Correction Captain Examination.

Download full text

Kaiser, Paul D.; Brull, Harry – 1994

The design, administration, scoring, and results of the 1993 New York State Correctional Captain Examination are described. The examination was administered to 405 candidates. As in previous Sergeant and Lieutenant examinations, candidates also completed latent image written simulation problems and open/closed book multiple choice test components.…

Descriptors: Competitive Selection, Correctional Rehabilitation, Decision Making, Educational Innovation

Reading Proficiency Assessment and the ILR/ACTFL Typology: A Reevaluation.

Peer reviewed

Edwards, Alison L. – Modern Language Journal, 1996

Examined the validity of the pragmatic approach to test difficulty put forward by Child (1987). This study investigated whether the Child discourse-type hierarchy predicts text difficulty for second-language readers. Results suggested that this hierarchy may provide a sound basis for developing foreign-language tests when it is applied by trained…

Descriptors: Adult Students, Analysis of Variance, French, Interrater Reliability

Language Test Construction and Evaluation.

Alderson, J. Charles; And Others – 1995

The guide is intended for teachers who must construct language tests and for other professionals who may need to construct, evaluate, or use the results of language tests. Most examples are drawn from the field of English-as-a-Second-Language instruction in the United Kingdom, but the principles and practices described may be applied to the…

Descriptors: Educational Trends, English (Second Language), Interrater Reliability, Language Tests

A Study of Raters' Scoring Tendency of Speaking Ability through Verbal Report Methods and Questionnaire Analysis.

Download full text

Nakamura, Yuji – Journal of Communication Studies, 1996

To find ways to improve rater reliability of a tape-mediated speaking test for Japanese university students of English as a Second Language, two studies gathered information on: how raters actually made their choices on rating sheets of students' speaking ability; determined what criteria teachers think they use and actually use in rating…

Descriptors: English (Second Language), Evaluation Criteria, Foreign Countries, Interrater Reliability

Towards Communicative Measurement of Writing: Where Are We Now?

Download full text

Salies, Tania Gastao – 1998

A discussion of the evaluation of writing, particularly in English as a Second Language, argues for a communicative approach reflecting the current approach to language teaching and learning. The movement toward more communication-oriented and more valid language testing is examined briefly, and direct assessment is chosen as the preferred format…

Descriptors: Communicative Competence (Languages), English (Second Language), Evaluation Criteria, Foreign Countries

Alderson, J. Charles	1
Botting, Nicola	1
Brull, Harry	1
Bunch, Michael B.	1
Burry, James	1
Collier, Michael	1
Dodd, Barbara	1
Donoghue, John R.	1
Edwards, Alison L.	1
Gerlick, Robert Edward	1
Hasson, Natalie	1
Hess, Melinda R.	1
Kaiser, Paul D.	1
Lunz, Mary E.	1
McClellan, Catherine A.	1
Nakamura, Yuji	1
O'Neill, Thomas R.	1
Palermo, Corey	1
Quellmalz, Edys S.	1
Ridge, Kirk	1
Saenz, David Arron	1
Salies, Tania Gastao	1
More ▼