NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 7 results Save | Export
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca – Practical Assessment, Research & Evaluation, 2018
Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…
Descriptors: Interrater Reliability, Measurement, Observation, Educational Research
Peer reviewed Peer reviewed
Direct linkDirect link
Tavares, Walter; Brydges, Ryan; Myre, Paul; Prpic, Jason; Turner, Linda; Yelle, Richard; Huiskamp, Maud – Advances in Health Sciences Education, 2018
Assessment of clinical competence is complex and inference based. Trustworthy and defensible assessment processes must have favourable evidence of validity, particularly where decisions are considered high stakes. We aimed to organize, collect and interpret validity evidence for a high stakes simulation based assessment strategy for certifying…
Descriptors: Competence, Simulation, Allied Health Personnel, Certification
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Kaiser, Paul D.; Brull, Harry – 1994
The design, administration, scoring, and results of the 1993 New York State Correctional Captain Examination are described. The examination was administered to 405 candidates. As in previous Sergeant and Lieutenant examinations, candidates also completed latent image written simulation problems and open/closed book multiple choice test components.…
Descriptors: Competitive Selection, Correctional Rehabilitation, Decision Making, Educational Innovation
Joines, Richard C. – 1991
The development and validation of the General Management In-Basket (GMIB) is described. The GMIB is a theory-based generic in-basket simulation, designed to assess supervisory and management skills independent of any job classification. Three of the 15 in-basket items in the GMIB are critical and are scored on a 0-5 scale. The remaining 12 items…
Descriptors: Administrator Evaluation, Concurrent Validity, Factor Analysis, Interrater Reliability
Conley, Patrick; Jegerski, Jane – 1991
Construction of a work sample test, the Investigator Planning Exercise (IPE), for the job of detective in the Chicago (Illinois) Police Department is described. Simulated crime scenarios, a mock crime scene, and five checklists of necessary skills (i.e., ability to summarize and communicate facts, identify inconsistencies, and determine the next…
Descriptors: Check Lists, Deduction, Evaluators, Interrater Reliability