Showing all 15 results
Peer reviewed
Petit, Nicolas; Mengarelli, Flavia; Geoffray Cassar, Marie-Maude; Arcara, Giorgio; Bambini, Valentina – Journal of Speech, Language, and Hearing Research, 2025
Purpose: This study aims (a) to assess the psychometric properties of a French adaptation of the Assessment of Pragmatic Abilities and Cognitive Substrates (APACS-Fr), a comprehensive test of pragmatic abilities for French-speaking adolescents and adults, and (b) to use it to study lifespan variations in pragmatic abilities, to determine when…
Descriptors: Pragmatics, Cognitive Ability, Language Skills, Cognitive Measurement
Peer reviewed
Jones, Nathan; Bell, Courtney; Qi, Yi; Lewis, Jennifer; Kirui, David; Stickler, Leslie; Redash, Amanda – ETS Research Report Series, 2021
The observation systems used in all 50 states require administrators to learn to score their teachers' instruction accurately and reliably. Although the literature on observation systems is growing, relatively few studies have examined the outcomes of trainings focused on developing administrators'…
Descriptors: Observation, Standardized Tests, Teacher Evaluation, Test Reliability
Peer reviewed
D'Agostino, Jerome V.; Rodgers, Emily; Winkler, Christa; Johnson, Tracy; Berenbon, Rebecca – Reading Psychology, 2021
Running Records provide a standardized method for recording and assessing students' oral reading behaviors and are excellent formative assessment tools to guide instructional decision-making. This study expands on prior Running Record reliability work by evaluating the extent to which external raters and teachers consistently assessed students'…
Descriptors: Accuracy, Oral Reading, Generalizability Theory, Error Correction
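As context for the scoring conventions this abstract relies on, the following is a minimal Python sketch of the standard Running Record formulas (accuracy rate and self-correction ratio). The function name and figures are illustrative assumptions, not materials from the study.

```python
# Minimal sketch of conventional Running Record scoring formulas;
# names and numbers are illustrative, not taken from the study above.

def running_record_scores(running_words: int, errors: int, self_corrections: int):
    """Return accuracy rate (%) and self-correction ratio for one record."""
    accuracy = 100.0 * (running_words - errors) / running_words
    # Conventionally reported as 1:n, i.e. one self-correction for every
    # n error-plus-self-correction opportunities.
    sc_ratio = (errors + self_corrections) / self_corrections if self_corrections else float("inf")
    return accuracy, sc_ratio

acc, sc = running_record_scores(running_words=120, errors=6, self_corrections=3)
print(f"accuracy = {acc:.1f}%, self-correction ratio = 1:{sc:.0f}")
# accuracy = 95.0%, self-correction ratio = 1:3
```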
Guo, Wenjing – ProQuest LLC, 2021
Constructed response (CR) items are widely used in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district and state-level assessments in the United States. One unique feature of CR items is that they depend on human raters to assess the quality of examinees' work. The judgment of human…
Descriptors: National Competency Tests, Responses, Interrater Reliability, Error of Measurement
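Since this entry centers on interrater reliability for human-scored constructed responses, a small sketch of two common agreement indices, exact agreement and Cohen's kappa, may help. The scores below are made up and do not come from the dissertation.

```python
# Sketch: two common interrater agreement indices for a pair of raters
# scoring the same constructed responses on a 0-3 scale. Illustrative data.
from collections import Counter

rater_a = [0, 1, 2, 2, 3, 1, 0, 2, 3, 1]
rater_b = [0, 1, 2, 3, 3, 1, 1, 2, 3, 1]

n = len(rater_a)
exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement

# Cohen's kappa corrects observed agreement for chance agreement.
pa, pb = Counter(rater_a), Counter(rater_b)
p_chance = sum((pa[k] / n) * (pb[k] / n) for k in set(pa) | set(pb))
kappa = (exact - p_chance) / (1 - p_chance)

print(f"exact agreement = {exact:.2f}, Cohen's kappa = {kappa:.2f}")
# exact agreement = 0.80, Cohen's kappa = 0.73
```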
Peer reviewed
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
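Calibration as described here lends itself to a simple illustration: compare each rater's scores on pre-scored calibration essays with the reference scores and flag raters who deviate too far. The threshold, data, and function below are hypothetical assumptions, not ETS's procedure.

```python
# Sketch of a calibration check against rater drift: each scoring window,
# raters score a few pre-scored "calibration" essays, and a rater whose
# mean deviation from the reference scores exceeds a tolerance is flagged
# for recalibration. All values are hypothetical.
from statistics import mean

reference = {"essay1": 4, "essay2": 2, "essay3": 5}

def drift_check(rater_scores: dict, tolerance: float = 0.5) -> bool:
    """Return True if the rater has drifted beyond tolerance."""
    deviations = [rater_scores[e] - ref for e, ref in reference.items()]
    return abs(mean(deviations)) > tolerance

print(drift_check({"essay1": 5, "essay2": 3, "essay3": 5}))  # True: scoring ~0.67 high
print(drift_check({"essay1": 4, "essay2": 2, "essay3": 4}))  # False: within tolerance
```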
Peer reviewed
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess the gross motor skills of elementary school children, but little is known about its reliability. This study therefore determined its test-retest and inter-rater reliability; 624 and 557 Dutch 6- to 12-year-old children, respectively, were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
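A minimal sketch of the test-retest estimate mentioned above: the correlation between two administrations of the same score. The data are invented and unrelated to the 4-Skills Scan.

```python
# Sketch: test-retest reliability as the Pearson correlation between two
# administrations of the same motor-skill score. Hypothetical data.
from statistics import correlation  # Python 3.10+

test = [12, 15, 9, 14, 11, 16, 10]
retest = [13, 14, 10, 15, 10, 16, 11]

print(f"test-retest r = {correlation(test, retest):.2f}")
```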
Peer reviewed
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
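Of the estimates the article names, coefficient alpha is the easiest to show compactly. Below is a sketch of Cronbach's alpha over a small item-score matrix; KR-20 is the special case of alpha for dichotomous (0/1) items. All numbers are illustrative.

```python
# Sketch: Cronbach's alpha for a small item-score matrix (made-up data).
from statistics import pvariance

# rows = examinees, columns = items
scores = [
    [2, 3, 3, 2],
    [1, 1, 2, 1],
    [3, 3, 3, 3],
    [2, 2, 3, 2],
    [0, 1, 1, 1],
]
k = len(scores[0])
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
total_var = pvariance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # 0.96 for this matrix
```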
Peer reviewed
van Batenburg, Eline S. L.; Oostdam, Ron J.; van Gelderen, Amos J. S.; de Jong, Nivja H. – Language Testing, 2018
This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n…
Descriptors: Second Language Learning, Speech Tests, Interaction, Test Reliability
Peer reviewed
Thawabieh, Ahmad M. – Journal of Curriculum and Teaching, 2017
This study aimed to compare students' self-assessment with teachers' assessment. The study sample consisted of 71 students at Tafila Technical University enrolled in an Introduction to Psychology course. The researcher used 2 students' self-assessment tools and 2 tests. The results indicated that students can assess themselves accurately if…
Descriptors: Comparative Analysis, Self Evaluation (Individuals), Student Evaluation, Psychology
Guskey, Thomas R.; Jung, Lee Ann – Educational Leadership, 2016
Many educators consider grades calculated from statistical algorithms more accurate, objective, and reliable than grades they calculate themselves. But in this research, the authors first asked teachers to use their professional judgment to choose a summary grade for hypothetical students. When the researchers compared the teachers' grade with the…
Descriptors: Grading, Computer Assisted Testing, Interrater Reliability, Grades (Scholastic)
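A one-line illustration of why algorithmic grade summaries can diverge from professional judgment (not the authors' study materials): a single zero drags the mean well below the median.

```python
# Sketch: the mean is pulled down by one early zero; the median is not.
from statistics import mean, median

scores = [0, 85, 88, 90, 92]  # one missing assignment scored as zero
print(f"mean = {mean(scores):.0f}, median = {median(scores):.0f}")
# mean = 71, median = 88
```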
Peer reviewed
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. A problem remains, however: reliability depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
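The rater-parameter models referenced here can be illustrated with the simplest case, a many-facet Rasch-style formulation in which the probability of a positive rating depends on examinee ability, task difficulty, and rater severity. This sketch is a generic illustration, not the specific model the authors propose, and all parameter values are invented.

```python
# Sketch: the simplest IRT formulation with a rater parameter
# (many-facet Rasch style, dichotomous case). Illustrative values only.
import math

def p_positive(theta: float, difficulty: float, severity: float) -> float:
    """P(rating = 1 | examinee ability, task difficulty, rater severity)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty - severity)))

# The same examinee and task, rated by a lenient vs. a severe rater:
print(f"lenient rater: {p_positive(0.5, 0.0, -1.0):.2f}")  # ~0.82
print(f"severe rater:  {p_positive(0.5, 0.0, +1.0):.2f}")  # ~0.38
```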
Castle, Courtney – ProQuest LLC, 2018
The Next Generation Science Standards propose a multidimensional model of science learning, comprised of Core Disciplinary Ideas, Science and Engineering Practices, and Crosscutting Concepts (NGSS Lead States, 2013). Accordingly, there is a need for student assessment aligned with the new standards. Creating assessments that validly and reliably…
Descriptors: Science Education, Student Evaluation, Science Tests, Test Construction
Peer reviewed
Engelmann, Jeanine E. – Athletic Training Education Journal, 2016
Context: Peer assessment is widely used in medical education as a formative evaluation and preparatory tool for students. Athletic training students learn knowledge, skills, and affective traits similar to those of medical students. Peer assessment has been widely studied with beneficial results in medical education, yet athletic training education has…
Descriptors: Peer Evaluation, Undergraduate Students, College Athletics, Professional Education
Haberman, Shelby J. – Educational Testing Service, 2011
Alternative approaches are discussed for using e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Descriptors: Writing Tests, Scoring, Essays, Language Tests
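Both approaches described predict an expected rater score from essay features. As a generic illustration only (e-rater's actual features and weights are proprietary and not shown here), a toy one-feature linear fit:

```python
# Sketch: fit a linear model predicting the expected human rater score
# from an essay feature. Feature and criterion values are hypothetical.
from statistics import linear_regression  # Python 3.10+

word_counts = [120, 250, 310, 400, 520]        # hypothetical feature
mean_rater_scores = [2.0, 3.0, 3.5, 4.0, 4.5]  # hypothetical criterion

fit = linear_regression(word_counts, mean_rater_scores)
predicted = fit.slope * 350 + fit.intercept
print(f"predicted expected rater score at 350 words: {predicted:.2f}")
```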
Peer reviewed
Feldman, Moshe; Lazzara, Elizabeth H.; Vanderbilt, Allison A.; DiazGranados, Deborah – Journal of Continuing Education in the Health Professions, 2012
Competency-based assessment and an emphasis on obtaining higher-level outcomes that reflect physicians' ability to demonstrate their skills have created a need for more advanced assessment practices. Simulation-based assessments provide medical education planners with tools to better evaluate the 6 Accreditation Council for Graduate Medical…
Descriptors: Performance Based Assessment, Physicians, Accuracy, High Stakes Tests