Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 3 |
| Since 2007 (last 20 years) | 7 |
Descriptor
| Evaluation Methods | 17 |
| Interrater Reliability | 17 |
| Test Items | 17 |
| Scoring | 8 |
| Foreign Countries | 5 |
| Item Response Theory | 5 |
| Psychometrics | 4 |
| Scores | 4 |
| Test Reliability | 4 |
| Measures (Individuals) | 3 |
| Reading Tests | 3 |
Author
| Friedman, Greg | 2 |
| McGinty, Dixie | 2 |
| Michaels, Hillary | 2 |
| Neel, John H. | 2 |
| Ochieng, Charles | 2 |
| Yen, Shu Jing | 2 |
| Avery, Marybell | 1 |
| Boccaccini, Marcus T. | 1 |
| Chen, Ching-I | 1 |
| Clifford, Jantina R. | 1 |
| Desstya, Anatri | 1 |
Publication Type
| Reports - Research | 10 |
| Speeches/Meeting Papers | 9 |
| Journal Articles | 8 |
| Reports - Evaluative | 4 |
| Guides - General | 1 |
| Information Analyses | 1 |
| Reports - Descriptive | 1 |
Education Level
| Elementary Education | 3 |
| Grade 4 | 3 |
| Grade 6 | 2 |
| Grade 8 | 2 |
| Elementary Secondary Education | 1 |
| Higher Education | 1 |
| Intermediate Grades | 1 |
Audience
| Researchers | 1 |
Location
| Canada | 1 |
| India | 1 |
| Indonesia | 1 |
| Netherlands | 1 |
| Oregon | 1 |
| United States | 1 |
| Washington | 1 |
Assessments and Surveys
| National Assessment of… | 1 |
Koçak, Duygu – International Electronic Journal of Elementary Education, 2020
One of the most commonly used methods for measuring higher-order thinking skills such as problem-solving or written expression is open-ended items. Three main approaches are used to evaluate responses to open-ended items: general evaluation, rating scales, and rubrics. To measure and improve students' problem-solving skills, first an…
Descriptors: Interrater Reliability, Item Response Theory, Test Items, Rating Scales
Desstya, Anatri; Prasetyo, Zuhdan Kun; Suyanta; Susila, Ihwan; Irwanto – International Journal of Instruction, 2019
This study reports the development of a standardized instrument (reviewed for validity, reliability, and difficulty index) to detect science misconceptions in elementary school teachers. The study used a 4-D model: defining, designing, developing, and disseminating. First, the instrument was prepared with 47 open-ended questions, and then it…
Descriptors: Elementary School Teachers, Misconceptions, Evaluation Methods, Teacher Evaluation
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Rufino, Katrina A.; Boccaccini, Marcus T.; Guy, Laura S. – Assessment, 2011
Although reliability is essential to validity, most research on violence risk assessment tools has paid little attention to strategies for improving rater agreement. The authors evaluated the degree to which perceived subjectivity in scoring guidelines for items from two measures--the Psychopathy Checklist-Revised (PCL-R) and the Historical,…
Descriptors: Risk Management, Predictive Validity, Interrater Reliability, Scoring
Sood, Vishal – Journal on Educational Psychology, 2013
To identify children with four major kinds of verbal learning disabilities, viz. reading disability, speech and language comprehension disability, writing disability, and mathematics disability, the present study was undertaken to construct and standardize a verbal learning disabilities checklist. This checklist was developed by keeping in view the…
Descriptors: Verbal Learning, Learning Disabilities, Children, Disability Identification
Squires, Jane K.; Waddell, Misti L.; Clifford, Jantina R.; Funk, Kristin; Hoselton, Robert M.; Chen, Ching-I – Topics in Early Childhood Special Education, 2013
Psychometric and utility studies were conducted on the Social Emotional Assessment Measure (SEAM), an innovative tool for assessing and monitoring social-emotional and behavioral development in infants and toddlers with disabilities. The Infant and Toddler SEAM intervals were the study focus, using mixed methods, including item response theory…
Descriptors: Psychometrics, Evaluation Methods, Social Development, Emotional Development
Zhu, Weimo; Rink, Judy; Placek, Judith H.; Graber, Kim C.; Fox, Connie; Fisette, Jennifer L.; Dyson, Ben; Park, Youngsik; Avery, Marybell; Franck, Marian; Raynes, De – Measurement in Physical Education and Exercise Science, 2011
New testing theories, concepts, and psychometric methods (e.g., item response theory, test equating, and item bank) developed during the past several decades have many advantages over previous theories and methods. In spite of their introduction to the field, they have not been fully accepted by physical educators. Further, the manner in which…
Descriptors: Physical Education, Quality Control, Psychometrics, Item Response Theory
Micceri, Theodore; And Others – 1987
Several issues relating to agreement estimates for different types of data from performance evaluations are considered. New indices of agreement are presented for ordinal level items and for summative scores produced by nominal or ordinal level items. Two sets of empirical data illustrate the performance of the two formulas derived to estimate…
Descriptors: Correlation, Data Analysis, Educational Research, Estimation (Mathematics)
Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg – Online Submission, 2005
The main purpose of this study was to illustrate a polytomous IRT-based linking procedure that adjusts for rater variations. Test scores from two administrations of a statewide reading assessment were used. An anchor set of Year 1 students' constructed responses were rescored by Year 2 raters. To adjust for year-to-year rater variation in IRT…
Descriptors: Test Items, Measures (Individuals), Grade 8, Item Response Theory
McGinty, Dixie; Neel, John H. – 1996
A new standard setting approach is introduced, called the cognitive components approach. Like the Angoff method, the cognitive components method generates minimum pass levels (MPLs) for each item. In both approaches, the item MPLs are summed for each judge, then averaged across judges to yield the standard. In the cognitive components approach,…
Descriptors: Cognitive Processes, Criterion Referenced Tests, Evaluation Methods, Grade 3
Takala, Sauli – 1998
This paper discusses recent developments in language testing. It begins with a review of the traditional criteria that are applied to all measurement and outlines recent emphases that derive from the expanding range of stakeholders. Drawing on Alderson's seminal work, criteria are presented for evaluating communicative language tests. Developments…
Descriptors: Alternative Assessment, Communicative Competence (Languages), Comparative Analysis, Evaluation Criteria
McGinty, Dixie; Neel, John H.; Hsu, Yu-Sheng – 1996
The cognitive components standard setting method, recently introduced by D. McGinty and J. Neel (1996), asks judges to specify minimum levels of performance not for the test items, but for smaller portions of items, the component skills and concepts required to answer each item correctly. Items are decomposed into these components before judges…
Descriptors: Cognitive Processes, Criterion Referenced Tests, Elementary Education, Evaluation Methods
Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg – Online Submission, 2005
Year-to-year rater variation may result in constructed response (CR) parameter changes, making CR items inappropriate to use in anchor sets for linking or equating. This study demonstrates how rater severity affected the writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology…
Descriptors: Test Items, Writing Tests, Reading Tests, Measures (Individuals)
Reckase, Mark D.; And Others – 1995
The research reported in this paper was conducted to gain information to guide the selection of standard-setting procedures for use with polytomous items to set achievement levels on the National Assessment of Educational Progress (NAEP) assessments in U.S. history and geography. Standard-setting procedures were evaluated to determine the relative…
Descriptors: Academic Achievement, Educational Assessment, Elementary Secondary Education, Evaluation Methods
McGhee, Debbie E.; Lowell, Nana – New Directions for Teaching and Learning, 2003
This study compares mean ratings, inter-rater reliabilities, and the factor structure of items for online and paper student-rating forms from the University of Washington's Instructional Assessment System. (Contains 3 figures and 2 tables.)
Descriptors: Psychometrics, Factor Structure, Student Evaluation of Teacher Performance, Test Items