Showing 1 to 15 of 18 results
Peer reviewed
Brent J. Goertzen; Kaley Klaus – Research & Practice in Assessment, 2023
When evaluating student learning, educators often employ scoring rubrics, whose quality can be judged by evaluating their validity and reliability. This article discusses the norming process utilized in a graduate organizational leadership program for a capstone scoring rubric. Concepts of validity and reliability are discussed, as is the…
Descriptors: Graduate Students, Graduate Study, Graduate School Faculty, Scoring Rubrics
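Rubric norming of the kind described above is typically summarized with an inter-rater agreement statistic such as Cohen's kappa. A minimal sketch, assuming two raters scoring the same ten capstone papers on a 4-point rubric (all scores hypothetical), is:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: two-rater agreement corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Proportion of exact agreements actually observed.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores (1-4) from two raters on ten papers.
a = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
b = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]
print(round(cohens_kappa(a, b), 3))  # → 0.722
```

Values near 1 indicate agreement well beyond chance; values near 0 suggest the rubric or norming process needs revision.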
Peer reviewed
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Peer reviewed
Research & Practice in Assessment, 2022
Meta-assessment is a useful strategy to document assessment practices and guide efforts to improve the culture of assessment at an institution. In this study, a meta-assessment of undergraduate and graduate academic program assessment reports evaluated the maturity of assessment work. Assessment reports submitted in the first year (75…
Descriptors: Program Evaluation, Educational Assessment, Meta Analysis, Undergraduate Study
Peer reviewed
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the "GRE"® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
Peer reviewed
McCaffrey, Daniel F.; Oliveri, Maria Elena; Holtzman, Steven – ETS Research Report Series, 2018
Scores from noncognitive measures are increasingly valued for their utility in helping to inform postsecondary admissions decisions. However, their use has presented challenges because of faking, response biases, or subjectivity, which standardized third-party evaluations (TPEs) can help minimize. Analysts and researchers using TPEs, however, need…
Descriptors: Generalizability Theory, Scores, College Admission, Admission Criteria
Peer reviewed
Burch, Vanessa C. – Journal of Applied Testing Technology, 2019
Health professions education has undergone radical changes over the past 100 years. This has necessitated a shift away from education programmes largely focused on testing knowledge and skills using predominantly written examinations. There has been a shift towards programmes which are intentionally designed with the end product in mind, a…
Descriptors: Workplace Learning, Clinical Experience, Situated Learning, Health Personnel
Peer reviewed
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Peer reviewed
Bejar, Isaac I.; VanWinkle, Waverely; Madnani, Nitin; Lewis, William; Steier, Michael – ETS Research Report Series, 2013
The paper applies a natural language computational tool to study a potential construct-irrelevant response strategy, namely the use of "shell language." Although the study is motivated by the impending increase in the volume of scoring of students' responses from assessments to be developed in response to the Race to the Top initiative,…
Descriptors: Responses, Language Usage, Natural Language Processing, Computational Linguistics
Peer reviewed
Henderson, Sheila J.; Horton, Ruth A.; Saito, Paul K.; Shorter-Gooden, Kumea – Multicultural Learning and Teaching, 2016
The purpose of this research was to develop a new tool for assessing multicultural and international competency in faculty teaching through vignette scenarios of university classroom critical incidents--across disciplines of clinical and forensics psychology, business, and education. Construct and content validity of the initial draft vignettes…
Descriptors: Scoring Rubrics, Critical Incidents Method, Construct Validity, Content Validity
Hahna, Nicole D. – ProQuest LLC, 2011
Four music therapy educators participated in semi-structured, in-depth interviews as part of a qualitative study. The purpose of this study was to explore the phenomena of feminist pedagogy as experienced by music therapy educators using phenomenological inquiry. The study examined the following research questions: (a) do music therapy educators…
Descriptors: Music Therapy, Teaching Methods, Feminism, Interrater Reliability
Peer reviewed
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…
Descriptors: Scoring, Test Scoring Machines, Automation, Models
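Evaluation statistics like the weighted kappa mentioned in the entry above can be computed directly from paired human and machine scores. A minimal sketch of quadratically weighted kappa, using hypothetical essay scores on a 1-6 scale (the scores and variable names are illustrative, not from the study), is:

```python
def quadratic_weighted_kappa(y1, y2, min_s=1, max_s=6):
    """Quadratically weighted kappa between two integer score vectors."""
    n_cats = max_s - min_s + 1
    n = len(y1)
    # Observed matrix of score pairs.
    O = [[0.0] * n_cats for _ in range(n_cats)]
    for a, b in zip(y1, y2):
        O[a - min_s][b - min_s] += 1
    # Marginal histograms for each scorer.
    hist1 = [sum(row) for row in O]
    hist2 = [sum(O[i][j] for i in range(n_cats)) for j in range(n_cats)]
    num = den = 0.0
    for i in range(n_cats):
        for j in range(n_cats):
            w = (i - j) ** 2 / (n_cats - 1) ** 2  # quadratic penalty
            expected = hist1[i] * hist2[j] / n     # chance pairing
            num += w * O[i][j]
            den += w * expected
    return 1 - num / den

human  = [3, 4, 5, 2, 4, 3, 6, 2, 5, 4]
erater = [3, 4, 4, 2, 5, 3, 6, 3, 5, 4]
print(round(quadratic_weighted_kappa(human, erater), 3))  # → 0.895
```

Unlike unweighted kappa, the quadratic weighting penalizes a human-machine discrepancy of two score points four times as heavily as a discrepancy of one, which is why it is a standard agreement metric for automated essay scoring.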
Peer reviewed
Gorsky, Paul; Caspi, Avner; Blau, Ina; Vine, Yodfat; Billet, Amit – International Review of Research in Open and Distance Learning, 2012
The goal of this study is to further corroborate a hypothesized population parameter for the frequencies of social presence versus the sum of teaching presence and cognitive presence as defined by the community of inquiry model in higher education asynchronous course forums. This parameter has been found across five variables: academic institution…
Descriptors: Foreign Countries, Open Universities, Inquiry, Communities of Practice
Peer reviewed
Sadler, Philip M.; Hammerman, James K. – College and University, 1999
A quantitative study modeled the inherently subjective admissions process for 592 graduate school candidates and 72 raters. Logistic regression models were well-fitting and parsimonious, allowing analysis of each stage of the process. Extended committee discussion/deliberation phases were of limited productivity when inter-rater agreement was…
Descriptors: Admission Criteria, Bias, College Admission, Committees
Schwarz, Julie A.; Collins, Michelle L. – 1995
Behaviorally Anchored Rating Scales (BARS) were developed to score responses from a previously designed police written communication test that lacked reliability. Rating scales for each of the 9 dimensions of the test consisted of the scale definition and a 5-point continuum, with the scores of 5, 3, and 1 defined by specified behavioral…
Descriptors: Graduate Students, Graduate Study, Higher Education, Interrater Reliability
Peer reviewed
Feletti, Grahame; Ryan, Greg – Assessment & Evaluation in Higher Education, 1994
The Triple Jump, a procedure for assessing students' problem-based learning, is applied to assessment of inquiry-based learning in a graduate course. Results suggest the need for more research into interrater reliability and other characteristics of the exercise. Some simple strategies for making the instrument cost effective are offered. (MSE)
Descriptors: Evaluation Methods, Graduate Study, Higher Education, Independent Study