Showing 1 to 15 of 18 results
Peer reviewed
Brent J. Goertzen; Kaley Klaus – Research & Practice in Assessment, 2023
When evaluating student learning, educators often employ scoring rubrics, whose quality can be judged by evaluating their validity and reliability. This article discusses the norming process utilized in a graduate organizational leadership program for a capstone scoring rubric. Concepts of validity and reliability are discussed, as is the…
Descriptors: Graduate Students, Graduate Study, Graduate School Faculty, Scoring Rubrics
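Rubric norming of the kind described above is typically summarized with an inter-rater agreement statistic such as Cohen's kappa. A minimal sketch, assuming two raters scoring the same ten capstone papers on a 4-point rubric (all scores hypothetical), is:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: two-rater agreement corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Proportion of exact agreements actually observed.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores (1-4) from two raters on ten papers.
a = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
b = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]
print(round(cohens_kappa(a, b), 3))  # → 0.722
```

Values near 1 indicate agreement well beyond chance; values near 0 suggest the rubric or norming process needs revision.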
Peer reviewed
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Peer reviewed
Research & Practice in Assessment, 2022
Meta-assessment is a useful strategy to document assessment practices and guide efforts to improve the culture of assessment at an institution. In this study, a meta-assessment of undergraduate and graduate academic program assessment reports evaluated the maturity of assessment work. Assessment reports submitted in the first year (75…
Descriptors: Program Evaluation, Educational Assessment, Meta Analysis, Undergraduate Study
Peer reviewed
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the "GRE"® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
Peer reviewed
McCaffrey, Daniel F.; Oliveri, Maria Elena; Holtzman, Steven – ETS Research Report Series, 2018
Scores from noncognitive measures are increasingly valued for their utility in helping to inform postsecondary admissions decisions. However, their use has presented challenges because of faking, response biases, or subjectivity, which standardized third-party evaluations (TPEs) can help minimize. Analysts and researchers using TPEs, however, need…
Descriptors: Generalizability Theory, Scores, College Admission, Admission Criteria
Peer reviewed
Burch, Vanessa C. – Journal of Applied Testing Technology, 2019
Health professions education has undergone radical changes over the past 100 years. This has necessitated a shift away from education programmes largely focused on testing knowledge and skills using predominantly written examinations. There has been a shift towards programmes which are intentionally designed with the end product in mind, a…
Descriptors: Workplace Learning, Clinical Experience, Situated Learning, Health Personnel
Peer reviewed
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Peer reviewed
Bejar, Isaac I.; VanWinkle, Waverely; Madnani, Nitin; Lewis, William; Steier, Michael – ETS Research Report Series, 2013
The paper applies a natural language computational tool to study a potential construct-irrelevant response strategy, namely the use of "shell language." Although the study is motivated by the impending increase in the volume of scoring of students' responses from assessments to be developed in response to the Race to the Top initiative,…
Descriptors: Responses, Language Usage, Natural Language Processing, Computational Linguistics
Peer reviewed
Henderson, Sheila J.; Horton, Ruth A.; Saito, Paul K.; Shorter-Gooden, Kumea – Multicultural Learning and Teaching, 2016
The purpose of this research was to develop a new tool for assessing multicultural and international competency in faculty teaching through vignette scenarios of university classroom critical incidents--across disciplines of clinical and forensics psychology, business, and education. Construct and content validity of the initial draft vignettes…
Descriptors: Scoring Rubrics, Critical Incidents Method, Construct Validity, Content Validity
Hahna, Nicole D. – ProQuest LLC, 2011
Four music therapy educators participated in semi-structured, in-depth interviews as part of a qualitative study. The purpose of this study was to explore the phenomena of feminist pedagogy as experienced by music therapy educators using phenomenological inquiry. The study examined the following research questions: (a) do music therapy educators…
Descriptors: Music Therapy, Teaching Methods, Feminism, Interrater Reliability
Peer reviewed
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…
Descriptors: Scoring, Test Scoring Machines, Automation, Models
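Evaluation statistics like the weighted kappa mentioned in the entry above can be computed directly from paired human and machine scores. A minimal sketch of quadratically weighted kappa, using hypothetical essay scores on a 1-6 scale (the scores and variable names are illustrative, not from the study), is:

```python
def quadratic_weighted_kappa(y1, y2, min_s=1, max_s=6):
    """Quadratically weighted kappa between two integer score vectors."""
    n_cats = max_s - min_s + 1
    n = len(y1)
    # Observed matrix of score pairs.
    O = [[0.0] * n_cats for _ in range(n_cats)]
    for a, b in zip(y1, y2):
        O[a - min_s][b - min_s] += 1
    # Marginal histograms for each scorer.
    hist1 = [sum(row) for row in O]
    hist2 = [sum(O[i][j] for i in range(n_cats)) for j in range(n_cats)]
    num = den = 0.0
    for i in range(n_cats):
        for j in range(n_cats):
            w = (i - j) ** 2 / (n_cats - 1) ** 2  # quadratic penalty
            expected = hist1[i] * hist2[j] / n     # chance pairing
            num += w * O[i][j]
            den += w * expected
    return 1 - num / den

human  = [3, 4, 5, 2, 4, 3, 6, 2, 5, 4]
erater = [3, 4, 4, 2, 5, 3, 6, 3, 5, 4]
print(round(quadratic_weighted_kappa(human, erater), 3))  # → 0.895
```

Unlike unweighted kappa, the quadratic weighting penalizes a human-machine discrepancy of two score points four times as heavily as a discrepancy of one, which is why it is a standard agreement metric for automated essay scoring.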
Peer reviewed
Gorsky, Paul; Caspi, Avner; Blau, Ina; Vine, Yodfat; Billet, Amit – International Review of Research in Open and Distance Learning, 2012
The goal of this study is to further corroborate a hypothesized population parameter for the frequencies of social presence versus the sum of teaching presence and cognitive presence as defined by the community of inquiry model in higher education asynchronous course forums. This parameter has been found across five variables: academic institution…
Descriptors: Foreign Countries, Open Universities, Inquiry, Communities of Practice
Peer reviewed
Sadler, Philip M.; Hammerman, James K. – College and University, 1999
A quantitative study modeled the inherently subjective admissions process for 592 graduate school candidates and 72 raters. Logistic regression models were well-fitting and parsimonious, allowing analysis of each stage of the process. Extended committee discussion/deliberation phases were of limited productivity when inter-rater agreement was…
Descriptors: Admission Criteria, Bias, College Admission, Committees
Schwarz, Julie A.; Collins, Michelle L. – 1995
Behaviorally Anchored Rating Scales (BARS) were developed to score responses from a previously designed police written communication test that lacked reliability. Rating scales for each of the 9 dimensions of the test consisted of the scale definition and a 5-point continuum, with the scores of 5, 3, and 1 defined by specified behavioral…
Descriptors: Graduate Students, Graduate Study, Higher Education, Interrater Reliability
Peer reviewed
Feletti, Grahame; Ryan, Greg – Assessment & Evaluation in Higher Education, 1994
The Triple Jump, a procedure for assessing students' problem-based learning, is applied to assessment of inquiry-based learning in a graduate course. Results suggest the need for more research into interrater reliability and other characteristics of the exercise. Some simple strategies for making the instrument cost effective are offered. (MSE)
Descriptors: Evaluation Methods, Graduate Study, Higher Education, Independent Study