Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Klein, Stephen P.; Stecher, Brian M.; Shavelson, Richard J.; McCaffrey, Daniel; Ormseth, Tor; Bell, Robert M.; Comfort, Kathy; Othman, Abdul R. – Applied Measurement in Education, 1998 (peer reviewed)
Two studies involving 368 elementary and high school students and 29 readers were conducted to investigate reader consistency, score reliability, and reader time requirements of three hands-on science performance tasks. Holistic scores were as reliable as analytic scores, and there was a high correlation between them after they were disattenuated…
Descriptors: Elementary School Students, Elementary Secondary Education, Hands on Science, High School Students
Hollenbeck, Keith; Tindal, Gerald; Almond, Patricia – Educational Assessment, 1999 (peer reviewed)
Studied the amount of measurement error in a state's performance-based writing task as it relates to high-stakes decision reproducibility. Using 175 eighth-grade writing samples, the study finds moderate correlations between the two raters' scores, with significant differences for the rates for the handwritten, but not the typed, essays. (SLD)
Descriptors: Decision Making, Error of Measurement, Essay Tests, Grade 8

