ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	1
Since 2017 (last 10 years)	3
Since 2007 (last 20 years)	5

Descriptor

Interrater Reliability	29
Test Use	29
Test Validity	14
Scoring	13
Test Reliability	11
Test Construction	8
Educational Assessment	7
Performance Based Assessment	5
Test Format	5
Elementary School Students	4
Evaluators	4
Generalizability Theory	4
Language Tests	4
Rating Scales	4
Scores	4
Test Items	4
Testing	4
Difficulty Level	3
Evaluation Methods	3
High Stakes Tests	3
Higher Education	3
Language Arts	3
Meta Analysis	3
Psychometrics	3
Quality Control	3

Source

Psychological Assessment	3
Educational and Psychological…	2
Academic Medicine	1
Advances in Health Sciences…	1
Applied Measurement in…	1
Assessment	1
Education and Training in…	1
Educational Researcher	1
International Journal of…	1
Journal of Consulting and…	1
Journal of Early Intervention	1
New York State Education…	1
Studies in Second Language…	1

Publication Type

Journal Articles	15
Reports - Research	11
Reports - Evaluative	10
Speeches/Meeting Papers	10
Information Analyses	6
Guides - Non-Classroom	2
Opinion Papers	2
Guides - General	1
Numerical/Quantitative Data	1
Reports - Descriptive	1

Education Level

Early Childhood Education	2
Elementary Education	1
Grade 3	1
Grade 4	1
Grade 5	1
Grade 6	1
Grade 7	1
Grade 8	1
High Schools	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Primary Education	1
Secondary Education	1

Audience

Administrators	1
Practitioners	1
Teachers	1

Location

Australia	1
Kansas	1
Kentucky	1
Nevada	1
New York	1
Ohio	1
Oregon	1
Pennsylvania	1
Tennessee	1
Texas	1
Virginia	1

Laws, Policies, & Programs

Individuals with Disabilities…	1
No Child Left Behind Act 2001	1

Assessments and Surveys

Rorschach Test	3
Early Childhood Environment…	1
Work Keys (ACT)	1

Showing 1 to 15 of 29 results

Measurement Properties of a Standardized Elicited Imitation Test: An Integrative Data Analysis

Peer reviewed

Isbell, Daniel R.; Son, Young-A – Studies in Second Language Acquisition, 2022

Elicited Imitation Tests (EITs) are commonly used in second language acquisition (SLA)/bilingualism research contexts to assess the general oral proficiency of study participants. While previous studies have provided valuable EIT construct-related validity evidence, some key gaps remain. This study uses an integrative data analysis to further…

Descriptors: Bilingualism, Imitation, Language Tests, Second Language Learning

Psychometric Properties of the Assessment, Evaluation, and Programming System for Infants and Children--Third Edition (AEPS-3)

Peer reviewed

Grisham, Jennifer; Waddell, Misti; Crawford, Rebecca; Toland, Michael – Journal of Early Intervention, 2021

The purpose of this article is to provide evidence of the technical adequacy of the Assessment, Evaluation, and Programming System--Third Edition (AEPS-3). The AEPS has long been identified as one of the most psychometrically sound early childhood curriculum-based assessments. In this article, results of three studies of technical adequacy are…

Descriptors: Infants, Young Children, Curriculum Based Assessment, Psychometrics

Constructing a Validity Argument for the Objective Structured Assessment of Technical Skills (OSATS): A Systematic Review of Validity Evidence

Peer reviewed

Hatala, Rose; Cook, David A.; Brydges, Ryan; Hawkins, Richard – Advances in Health Sciences Education, 2015

In order to construct and evaluate the validity argument for the Objective Structured Assessment of Technical Skills (OSATS), based on Kane's framework, we conducted a systematic review. We searched MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science, Scopus, and selected reference lists through February 2013. Working in duplicate, we selected…

Descriptors: Measures (Individuals), Test Validity, Surgery, Skills

ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations

Peer reviewed

International Journal of Testing, 2019

These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…

Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage

New York State Alternate Assessment Technical Report, 2013-14

New York State Education Department, 2014

This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes used to develop and implement the NYSAA program, and stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…

Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation

Assessing Reliability: Critical Corrections for a Critical Examination of the Rorschach Comprehensive System.

Peer reviewed

Meyer, Gregory J. – Psychological Assessment, 1997

In reply to criticism of the Rorschach Comprehensive System (CS) by J. Wood, M. Nezworski, and W. Stejskal (1996), this article presents a meta-analysis of published data indicating that the CS has excellent chance-corrected interrater reliability. It is noted that the erroneous assumptions of Wood et al. make their assertions about validity…

Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity

The Reliability of the Comprehensive System for the Rorschach: A Comment on Meyer (1997).

Peer reviewed

Wood, James M.; Nezworski, M. Teresa; Stejskal, William J. – Psychological Assessment, 1997

G. Meyer (1997) attempts to refute the present authors' criticisms of the interrater reliability of the Rorschach Comprehensive System (CS) but misrepresents their position and offers a flawed meta-analysis in support of his own. Rorschach proponents need to undertake high-quality replicated studies of CS reliability and validity. (SLD)

Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity

Thinking Clearly about Reliability: More Critical Corrections Regarding the Rorschach Comprehensive System.

Peer reviewed

Meyer, Gregory J. – Psychological Assessment, 1997

Replies to Wood et al., documenting limitations of their conclusions about the Rorschach Comprehensive System (CS) and defending Meyer's own meta-analysis, which finds adequate interrater reliability for the CS. (SLD)

Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity
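The Meyer and Wood et al. exchange turns on chance-corrected interrater reliability: agreement beyond what two scorers would reach by chance, given how often each uses each code. The abstracts do not say which coefficient each reviewed study reported, so what follows is only a minimal sketch of one common chance-corrected index, Cohen's kappa for two raters assigning nominal codes. The function name and the ten codings are hypothetical illustrations, not data from either article.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items both raters coded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[code] * counts_b[code] for code in counts_a) / n**2
    if p_chance == 1:
        # Both raters used one identical code throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes assigned by two scorers to the same ten responses.
rater_a = ["A", "A", "B", "C", "A", "B", "A", "C", "A", "B"]
rater_b = ["A", "B", "B", "C", "A", "B", "A", "A", "A", "B"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.672
```

Raw agreement in this example is 0.80, but the raters' code frequencies alone would produce 0.39 agreement by chance, so kappa drops to about 0.67.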

Interrater Reliability of the Ruff Figural Fluency Test.

Peer reviewed

Berning, Lisa C.; Weed, Nathan C.; Aloia, Mark S. – Assessment, 1998

To examine the interrater reliability of the Ruff Figural Fluency Test (RFFT) (R. Ruff, 1988), 124 college students completed the measure and scored RFFT test protocols. Results indicated substantial interscorer reliability on the RFFT, particularly for number of unique designs. Reliability was lower for scoring perseverative errors and error…

Descriptors: College Students, Higher Education, Interrater Reliability, Scoring

Client Verbal Response Category System: Preliminary Data.

Peer reviewed

Meier, Augustine; Boivin, Micheline – Journal of Consulting and Clinical Psychology, 1986

The Client Verbal Response Category System classifies client responses into Temporal, Directional and Experiential categories. The categories with their subcategories are defined, interjudge reliability data are presented, and the instrument's utility in psychotherapy process research is demonstrated. Initial results indicate that the instrument is…

Descriptors: Client Characteristics (Human Services), Interrater Reliability, Psychotherapy, Research Tools

Comments on the Measurement of Halo.

Peer reviewed

Fisicaro, Sebastiano A.; Vance, Robert J. – Educational and Psychological Measurement, 1994

This article presents arguments that the correlation measure "r" of halo is not conceptually more appropriate than the standard deviation (SD) measure. It also describes conditions under which halo effects occur and when the SD and r measures can be used. Neither measure is uniformly superior to the other. (SLD)

Descriptors: Correlation, Evaluation Methods, Interrater Reliability, Measurement Techniques

An Analysis of the Reliability and Stability of the Motivation Assessment Scale in Assessing the Challenging Behaviors of Persons with Developmental Disabilities.

Peer reviewed

Conroy, Maureen A.; And Others – Education and Training in Mental Retardation and Developmental Disabilities, 1996

This study assessed the intra-rater and inter-rater reliability of the Motivation Assessment Scale as used with 20 adults with mental retardation, expanding the results of previous research by evaluating across additional time and administrations. Results from 19 raters indicated variable moderate-to-low intra-rater and inter-rater reliability.…

Descriptors: Adults, Behavior Problems, Interrater Reliability, Measures (Individuals)

Clarifying the Blurred Image: Estimating the Inter-Rater Reliability of Performance Assessments.

Moore, Alan D.; Young, Suzanne – 1997

As schools move toward performance assessment, there is increasing discussion of using these assessments for accountability purposes. When used for making decisions, performance assessments must meet high standards of validity and reliability. One major source of unreliability in performance assessments is interrater disagreement. In this paper,…

Descriptors: Accountability, Correlation, Elementary Secondary Education, Generalizability Theory
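Moore and Young's abstract is truncated before it describes the authors' method, but the descriptors list Generalizability Theory, which treats rater disagreement as a source of error variance. As a hedged sketch under that assumption, the code below estimates variance components for a fully crossed persons x raters design with one score per cell. It then forms the relative (Eρ²) and absolute (Φ) coefficients for a score averaged over a chosen number of raters. Negative variance estimates are truncated to zero, one common convention. The function name and the scores are hypothetical.

```python
from statistics import mean


def g_study(scores, n_raters=None):
    """Persons x raters G study with one score per cell.

    Returns variance components and the relative (E rho^2) and absolute (Phi)
    coefficients for scores averaged over n_raters raters (defaults to the
    number of raters observed).
    """
    n_p, n_r = len(scores), len(scores[0])
    n_raters = n_raters or n_r
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    rater_means = [mean(col) for col in zip(*scores)]

    # Mean squares from a two-way ANOVA without replication.
    ms_p = n_r * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in rater_means) / (n_r - 1)
    ss_res = sum(
        (scores[i][j] - person_means[i] - rater_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; truncate negatives to zero.
    var_pr_e = ms_res  # person-by-rater interaction confounded with error
    var_p = max((ms_p - ms_res) / n_r, 0.0)  # person (universe-score) variance
    var_r = max((ms_r - ms_res) / n_p, 0.0)  # rater severity/leniency variance

    rel_error = var_pr_e / n_raters
    abs_error = (var_r + var_pr_e) / n_raters
    # With no person variance the scores cannot rank people; report 0 here.
    e_rho2 = var_p / (var_p + rel_error) if var_p > 0 else 0.0
    phi = var_p / (var_p + abs_error) if var_p > 0 else 0.0
    return {
        "var_p": var_p,
        "var_r": var_r,
        "var_pr_e": var_pr_e,
        "e_rho2": e_rho2,
        "phi": phi,
    }


# Hypothetical rubric scores: rows are students, columns are raters.
scores = [
    [4, 4, 3],
    [2, 3, 2],
    [3, 3, 3],
    [1, 2, 1],
    [4, 3, 4],
]
for k in (1, 3):
    result = g_study(scores, n_raters=k)
    print(k, round(result["e_rho2"], 3), round(result["phi"], 3))
```

With these made-up scores, the rater-variance estimate comes out slightly negative and is truncated to zero, so Φ equals Eρ² (about 0.90 for three raters and 0.76 for one). Averaging over more raters shrinks the error term, which is the usual lever a decision study examines.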

Statistical Test Specifications for Performance Assessments: Is This an Oxymoron?

Reckase, Mark D. – 1997

This paper argues that special procedures for constructing assessment tools containing performance assessment tasks are unnecessary and that current test methodology can easily be generalized to complex performance assessment tasks without destroying the desirable characteristics of those tasks. Reasonable statistical requirements for sound…

Descriptors: Educational Assessment, Generalizability Theory, High Stakes Tests, Interrater Reliability

Capturing Teachers' Knowledge: Performance Assessment a) and Post-Structuralist Epistemology, b) From a Post-Structuralist Perspective, c) and Post-Structuralism, d) None of the Above.

Peer reviewed

Delandshere, Ginette; Petrosky, Anthony R. – Educational Researcher, 1994

Discusses the role and consistency of judges' interpretations of teacher performance as part of an evaluative scheme for complex performance, with reference to the ideological framework of professional standards. The tension between assessment decisions and the recognition that assessment involves interpretation is explored. (SLD)

Descriptors: Decision Making, Educational Assessment, Epistemology, Evaluators

Author

Meyer, Gregory J.	2
Alderson, J. Charles	1
Aloia, Mark S.	1
Berning, Lisa C.	1
Boivin, Micheline	1
Brennan, Robert L.	1
Brydges, Ryan	1
Conroy, Maureen A.	1
Cook, David A.	1
Crawford, Rebecca	1
Crehan, Kevin D.	1
Delandshere, Ginette	1
Dunbar, Stephen B.	1
Ferroli, Lou	1
Fiene, Richard	1
Fisicaro, Sebastiano A.	1
Gearhart, Maryl	1
Grisham, Jennifer	1
Hatala, Rose	1
Hawkins, Richard	1
Herman, Joan L.	1
Howard, Edward H.	1
Ingram, D. E.	1
Isbell, Daniel R.	1