Showing all 15 results
Peer reviewed
Lu, Chia-Chen; Luh, Ding-Bang – Creativity Research Journal, 2012
Although previous studies have used raters with different levels of experience to rate product creativity under the Consensual Assessment Technique (CAT), the validity of replacing CAT with another measurement tool has not been adequately tested. This study aimed to compare raters with different levels of experience (expert vs.…
Descriptors: Creativity, Interrater Reliability, Construct Validity, Comparative Analysis
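In CAT studies of this kind, agreement among raters is often summarized with Cronbach's alpha computed across raters. The following is a minimal Python sketch of that calculation (illustrative only, not drawn from the article); the rating matrix and scale are hypothetical.

import numpy as np

def cronbach_alpha(scores):
    # scores: 2-D array, rows = products rated, columns = raters
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of raters
    rater_var = scores.var(axis=0, ddof=1)       # each rater's score variance
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return (k / (k - 1)) * (1 - rater_var.sum() / total_var)

# Hypothetical data: five products scored by three raters on a 1-7 scale
ratings = [[5, 6, 5], [3, 3, 4], [6, 7, 6], [2, 2, 3], [4, 5, 4]]
print(round(cronbach_alpha(ratings), 3))

Values near 1 indicate that raters rank the products consistently, which is the kind of evidence a comparison of expert and novice panels relies on.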
Peer reviewed
Baker, Beverly A. – Assessing Writing, 2010
In high-stakes writing assessments, rater training in the use of a rating scale does not eliminate variability in grade attribution. This realisation has been accompanied by research that explores possible sources of rater variability, such as rater background or rating scale type. However, there has been little consideration thus far of…
Descriptors: Foreign Countries, Writing Evaluation, Writing Tests, Testing
Peer reviewed
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively it would be appealing to allow students the choice of handwriting or typing, but this raises a concern that perhaps this might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Michaelides, Michalis P.; Haertel, Edward H. – Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2004
There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…
Descriptors: Test Items, Testing, Error Patterns, Interrater Reliability
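The sampling variability described here can be illustrated with a resampling sketch: draw bootstrap samples of examinees, re-estimate the equating constant each time, and take the standard deviation of the estimates. The Python sketch below is illustrative only (simple mean equating on simulated data), not the procedure used in the report.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical common-item scores for the two examinee groups
group_x = rng.normal(20, 4, size=300)   # group taking form X
group_y = rng.normal(22, 4, size=300)   # group taking form Y

def mean_equating_constant(x, y):
    # Shift applied to form-X scores so the two groups match on the common items
    return y.mean() - x.mean()

boot_estimates = []
for _ in range(2000):
    bx = rng.choice(group_x, size=group_x.size, replace=True)
    by = rng.choice(group_y, size=group_y.size, replace=True)
    boot_estimates.append(mean_equating_constant(bx, by))

print("bootstrap SE of the equating constant:", round(np.std(boot_estimates, ddof=1), 3))

Doubling the examinee samples shrinks this standard error by roughly the square root of two, which is the relationship the abstract points to.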
O'Neill, Thomas R.; Lunz, Mary E. – 1997
This paper illustrates a method to study rater severity across exam administrations. A multi-facet Rasch model defined the ratings as being dominated by four facets: examinee ability, rater severity, project difficulty, and task difficulty. Ten years of data from administrations of a histotechnology performance assessment were pooled and analyzed…
Descriptors: Ability, Comparative Analysis, Equated Scores, Interrater Reliability
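The four facets named in the abstract fit the standard many-facet Rasch specification; in LaTeX (notation assumed here, not taken from the paper):

\ln \frac{P_{nrptk}}{P_{nrpt(k-1)}} = B_n - C_r - D_p - E_t - F_k

where B_n is the ability of examinee n, C_r the severity of rater r, D_p the difficulty of project p, E_t the difficulty of task t, and F_k the threshold for rating category k. Because all parameters sit on the same logit scale, rater severities estimated from pooled administrations can be compared directly, which is what makes an analysis across ten years of data possible.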
Peer reviewed
Tarico, Valerie S.; And Others – Journal of Counseling Psychology, 1986
Compared three methods of rating thoughts: self-rating by subjects, rating by experts with thoughts presented randomly, and rating by experts with thoughts presented in context, among 107 students who listed their thoughts prior to giving a speech. Results indicated that all three methods were equally predictive of speech anxiety and performance.…
Descriptors: Anxiety, Cognitive Measurement, Cognitive Processes, Comparative Analysis
Kenyon, Dorry; Stansfield, Charles W. – 1993
This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. The rater self-training materials are described…
Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics
Takala, Sauli – 1998
This paper discusses recent developments in language testing. It begins with a review of the traditional criteria that are applied to all measurement and outlines recent emphases that derive from the expanding range of stakeholders. Drawing on Alderson's seminal work, criteria are presented for evaluating communicative language tests. Developments…
Descriptors: Alternative Assessment, Communicative Competence (Languages), Comparative Analysis, Evaluation Criteria
McNamara, T. F.; Adams, R. J. – 1991
A preliminary study is reported of the use of new multifaceted Rasch measurement mechanisms for investigating rater characteristics in language testing. Ratings from four judges of scripts from 50 candidates taking the International English Language Testing System test, a test of English for Academic Purposes, are analyzed. The analysis…
Descriptors: Comparative Analysis, English (Second Language), Foreign Countries, Interrater Reliability
Hori, Utako; Ito, Tokumi; Kitazawa, Mieko; Masuda, Masako; Ogiwara, Chikako; Saito, Mariko; Yoneda, Yukiyo – 1996
A group of seven Japanese-language Oral Proficiency Interview (OPI) testers licensed by the American Council on the Teaching of Foreign Languages (ACTFL) conducted research related to ACTFL-OPI criteria. They first examined 24 audiotaped interview tests to see what kind of consistency there would be when individual testers applied general criteria…
Descriptors: Audiotape Recordings, Comparative Analysis, Foreign Countries, Grammar
Peer reviewed
Wigglesworth, Gillian – Language Testing, 1997
In this study, planning time was manipulated as a variable in a trial administration of a semi-direct oral interaction test. Discourse analytic techniques were used to determine the nature and/or significance of difference in the elicited discourse across two conditions in terms of complexity and accuracy. Findings suggest that planning time may…
Descriptors: Cognitive Development, Communicative Competence (Languages), Comparative Analysis, Discourse Analysis
Nakamura, Yuji – Journal of Communication Studies, 1997
This study investigated the effects of three aspects of language testing (test task, familiarity with an interviewer, and test method) on both tester and tested. Data were drawn from several previous studies by the researcher. Concerning test task, data were analyzed for the type of topic students wanted most to talk about or preferred not to talk…
Descriptors: Behavior Patterns, Comparative Analysis, English (Second Language), Interrater Reliability
Peer reviewed
Lee, H. K. – Assessing Writing, 2004
This study aimed to comprehensively investigate the impact of a word-processor on an ESL writing assessment, covering comparison of inter-rater reliability, the quality of written products, the writing process across different testing occasions using different writing media, and students' perception of a computer-delivered test. Writing samples of…
Descriptors: Writing Evaluation, Student Attitudes, Writing Tests, Testing
Russikoff, Karen A. – 1994
Problems inherent in the holistic scoring of essay examinations written by limited-English-speakers are examined, particularly in the context of one California state college in which English writing skills, holistically assessed, are required for graduation. These problems include lack of interrater reliability, raters' perceptions of their role,…
Descriptors: Case Studies, College Faculty, College Instruction, Comparative Analysis
Chalhoub-Deville, Micheline – 1993
This study investigated whether different groups of native speakers assess second language learners' language skills differently for three elicitation techniques. Subjects were six learners of college-level Arabic as a second language, tape-recorded performing three tasks: participating in a modified oral proficiency interview, narrating a picture…
Descriptors: Arabic, College Students, Comparative Analysis, Higher Education