ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	1
Since 2007 (last 20 years)	2

Descriptor

Interrater Reliability	4
Performance Based Assessment	4
Reliability	4
Error of Measurement	3
Generalizability Theory	2
Language Tests	2
Scores	2
Scoring	2
Accuracy	1
Certification	1
Chinese	1
Data Analysis	1
Decision Making	1
Elementary School Students	1
Elementary Secondary Education	1
English	1
Essay Tests	1
Evaluation Methods	1
Evaluators	1
Foreign Countries	1
Grade 8	1
Graduate Students	1
Hands on Science	1
Handwriting	1
High School Students	1

Source

Applied Measurement in…	1
Educational Assessment	1
Language Assessment Quarterly	1
Language Testing	1

Author

Almond, Patricia	1
Bell, Robert M.	1
Comfort, Kathy	1
Han, Chao	1
Hollenbeck, Keith	1
Klein, Stephen P.	1
Lin, Chih-Kai	1
McCaffrey, Daniel	1
Ormseth, Tor	1
Othman, Abdul R.	1
Shavelson, Richard J.	1
Stecher, Brian M.	1
Tindal, Gerald	1

Publication Type

Journal Articles	4
Reports - Research	4
Tests/Questionnaires	1

Education Level

Higher Education	1
Postsecondary Education	1

Location

China (Beijing)

Showing all 4 results

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach

Peer reviewed

Han, Chao – Language Assessment Quarterly, 2016

As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…

Descriptors: Foreign Countries, Scores, English, Chinese

Analytic versus Holistic Scoring of Science Performance Tasks.

Peer reviewed

Klein, Stephen P.; Stecher, Brian M.; Shavelson, Richard J.; McCaffrey, Daniel; Ormseth, Tor; Bell, Robert M.; Comfort, Kathy; Othman, Abdul R. – Applied Measurement in Education, 1998

Two studies involving 368 elementary and high school students and 29 readers were conducted to investigate reader consistency, score reliability, and reader time requirements of three hands-on science performance tasks. Holistic scores were as reliable as analytic scores, and there was a high correlation between them after they were disattenuated…

Descriptors: Elementary School Students, Elementary Secondary Education, Hands on Science, High School Students

Reliability and Decision Consistency: An Analysis of Writing Mode at Two Times on a Statewide Test.

Peer reviewed

Hollenbeck, Keith; Tindal, Gerald; Almond, Patricia – Educational Assessment, 1999

Studied the amount of measurement error in a state's performance-based writing task as it relates to the reproducibility of high-stakes decisions. Using 175 eighth-grade writing samples, the study finds moderate correlations between the two raters' scores, with significant differences between raters for the handwritten, but not the typed, essays.

Descriptors: Decision Making, Error of Measurement, Essay Tests, Grade 8