Showing 1 to 15 of 16 results
Peer reviewed
PDF on ERIC Download full text
Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018
Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…
Descriptors: Grading, Models, Reliability, Validity
Peer reviewed
PDF on ERIC Download full text
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
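The metrics named in the abstract above can be computed by hand. The sketch below is purely illustrative (the predicted/observed labels are invented, not drawn from the study) and shows Cohen's kappa, precision, recall, and F1 for a toy set of binary student-state labels.

```python
# Toy data: observed vs. predicted binary student states (e.g. off-task yes/no).
# Values are invented for illustration only.
from collections import Counter

observed  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

n = len(observed)
tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Cohen's kappa: raw agreement corrected for agreement expected by chance,
# estimated from the two marginal label distributions.
po = sum(1 for o, p in zip(observed, predicted) if o == p) / n
obs_counts, pred_counts = Counter(observed), Counter(predicted)
pe = sum(obs_counts[c] * pred_counts[c] for c in set(observed)) / n ** 2
kappa = (po - pe) / (1 - pe)
```

As the study's framing suggests, the metrics can disagree: here raw agreement is 0.8 while kappa is only 0.6, because half the chance-corrected credit is absorbed by the balanced label distribution.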
Peer reviewed
PDF on ERIC Download full text
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed
PDF on ERIC Download full text
Moeller, Julia; Viljaranta, Jaana; Kracke, Bärbel; Dietrich, Julia – Frontline Learning Research, 2020
This article proposes a study design developed to disentangle the objective characteristics of a learning situation from individuals' subjective perceptions of that situation. The term objective characteristics refers to the agreement across students, whereas subjective perceptions refers to inter-individual heterogeneity. We describe a novel…
Descriptors: Student Attitudes, College Students, Lecture Method, Student Interests
Peer reviewed
PDF on ERIC Download full text
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
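Quadratic weighted kappa, the headline statistic in the report above, extends Cohen's kappa to ordinal scales by penalizing disagreements in proportion to their squared distance. The sketch below is a generic implementation with invented toy scores on a 1–6 essay scale; it is not the e-rater evaluation procedure itself.

```python
# Quadratic weighted kappa between two ordinal ratings (e.g. human vs. automated
# essay scores). Scores and scale are invented for illustration.
def quadratic_weighted_kappa(human, machine, min_s, max_s):
    k = max_s - min_s + 1
    n = len(human)
    # Observed joint distribution over (human, machine) score pairs.
    O = [[0.0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        O[h - min_s][m - min_s] += 1 / n
    row = [sum(O[i]) for i in range(k)]                      # human marginal
    col = [sum(O[i][j] for i in range(k)) for j in range(k)] # machine marginal
    # Quadratic disagreement weights: w_ij = ((i - j) / (k - 1))^2.
    w = [[((i - j) / (k - 1)) ** 2 for j in range(k)] for i in range(k)]
    observed = sum(w[i][j] * O[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected

human   = [1, 2, 3, 4, 5, 6, 3, 4]
machine = [1, 2, 3, 4, 5, 6, 4, 4]
qwk = quadratic_weighted_kappa(human, machine, 1, 6)
```

A single one-point disagreement on this small sample still yields a kappa above 0.97, which is why operational programs typically pair it with other evidence, such as standardized mean-score differences.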
Peer reviewed
Direct link
Sadaf, Ayesha; Olesova, Larisa – American Journal of Distance Education, 2017
The researchers in this study examined the influence of questions designed with the Practical Inquiry Model (PIM), compared with the regular (playground) questions, on students' levels of cognitive presence in online discussions. Students' discussion postings were collected and categorized according to the four levels of cognitive presence:…
Descriptors: Graduate Students, Masters Programs, Cognitive Processes, Web Based Instruction
Peer reviewed
Direct link
Jang, Hyewon – Journal of Science Education and Technology, 2016
Gaps between science, technology, engineering, and mathematics (STEM) education and required workplace skills have been identified in industry, academia, and government. Educators acknowledge the need to reform STEM education to better prepare students for their future careers. We pursue this growing interest in the skills needed for STEM…
Descriptors: STEM Education, Work Environment, Interrater Reliability, Engineering Education
Peer reviewed
Direct link
Granfeldt, Jonas; Ågren, Malin – Language Testing, 2014
One core area of research in Second Language Acquisition is the identification and definition of developmental stages in different L2s. For L2 French, Bartning and Schlyter (2004) presented a model of six morphosyntactic stages of development in the shape of grammatical profiles. The model formed the basis for the computer program Direkt Profil…
Descriptors: Second Language Learning, Language Tests, French, Language Teachers
Peer reviewed
PDF on ERIC Download full text
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Peer reviewed
Direct link
Yiu, Edwin M.-L.; Chan, Karen M. K.; Mok, Rosa S.-M. – Clinical Linguistics & Phonetics, 2007
One of the ways to improve the reliability of perceptual voice quality rating is to provide listeners with external anchors. A paired comparison matching paradigm using synthesized Cantonese voice stimuli that covered a range of rough and breathy qualities was used to investigate rating reliability. Twenty-five speech pathology students rated…
Descriptors: Data Analysis, Measures (Individuals), Stimuli, Models
Peer reviewed
Baird, Christopher; Wagner, Dennis; Healy, Theresa; Johnson, Kristen – Child Welfare, 1999
Compared reliability of three widely used child protective service risk-assessment models (one actuarial, two consensus based). Found that, although no system approached 100% interrater reliability, raters employing the actuarial model made consistent estimates of risk for a high percentage of cases they assessed. Interrater reliability for the…
Descriptors: At Risk Persons, Child Welfare, Children, Comparative Analysis
Kenyon, Dorry; Stansfield, Charles W. – 1993
This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. Rater self-training materials are described…
Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics
De Ayala, R. J.; And Others – 1989
The graded response (GR) model of Samejima (1969) and the partial credit (PC) model of Masters (1982) were fitted to identical writing samples that were holistically scored. The performance and relative benefits of each model were then evaluated. Writing samples were both expository and narrative. Data were from statewide assessments of secondary…
Descriptors: Comparative Analysis, Essay Tests, Holistic Evaluation, Interrater Reliability
McNamara, T. F.; Adams, R. J. – 1991
A preliminary study is reported of the use of new multifaceted Rasch measurement mechanisms for investigating rater characteristics in language testing. Ratings from four judges of scripts from 50 candidates taking the International English Language Testing System test, a test of English for Academic Purposes, are analyzed. The analysis…
Descriptors: Comparative Analysis, English (Second Language), Foreign Countries, Interrater Reliability
Peer reviewed
Fischer, Jan Lockwood; Krause Eheart, Brenda – Early Childhood Research Quarterly, 1991
Providers' demographic characteristics, training, support networks, business practices, and stability of services were examined relative to their caregiving practices. Results from a schematic model approach suggest correlations between some of these factors and variances in ratings of caregiver practices. (LB)
Descriptors: Behavior Rating Scales, Child Caregivers, Comparative Analysis, Data Analysis