ERIC - Search Results

Publication Date

In 2026	0
Since 2025	1
Since 2022 (last 5 years)	20
Since 2017 (last 10 years)	71
Since 2007 (last 20 years)	230

Descriptor

Comparative Analysis	327
Interrater Reliability	327
Foreign Countries	84
Correlation	65
Evaluation Methods	53
Statistical Analysis	53
Evaluators	47
Scores	44
Second Language Learning	42
Scoring	41
Student Evaluation	41
English (Second Language)	39
Higher Education	34
Teaching Methods	34
Validity	32
Language Tests	31
Writing Evaluation	31
Second Language Instruction	30
College Students	29
Measures (Individuals)	29
Rating Scales	29
Reliability	27
Elementary School Students	25
Evaluation Criteria	24
Interviews	24

Publication Type

Journal Articles	262
Reports - Research	248
Reports - Evaluative	53
Speeches/Meeting Papers	35
Tests/Questionnaires	23
Information Analyses	11
Dissertations/Theses -…	10
Reports - Descriptive	8
Numerical/Quantitative Data	4
Book/Product Reviews	1
Collected Works - Proceedings	1
Collected Works - Serials	1
Guides - Non-Classroom	1
Opinion Papers	1

Education Level

Higher Education	77
Postsecondary Education	64
Elementary Education	28
Secondary Education	27
Elementary Secondary Education	17
High Schools	11
Middle Schools	8
Adult Education	6
Early Childhood Education	6
Grade 4	6
Grade 1	5
Preschool Education	5
Grade 2	4
Grade 3	4
Grade 5	4
Intermediate Grades	4
Junior High Schools	4
Grade 11	3
Grade 6	3
Grade 8	3
Grade 10	2
Grade 7	2
Kindergarten	2
Primary Education	2
Grade 12	1

Audience

Practitioners	4
Researchers	4
Teachers	2

Location

China	8
Netherlands	7
United Kingdom	7
Australia	6
Turkey	6
United States	6
Florida	5
Iran	5
Taiwan	5
United Kingdom (England)	5
Washington	5
Germany	4
Greece	4
Pennsylvania	4
Arizona	3
Belgium	3
California	3
Canada	3
Finland	3
Georgia	3
Philippines	3
Saudi Arabia	3
Singapore	3
Sweden	3
Tennessee	3

Laws, Policies, & Programs

Improving Americas Schools…	1
Individuals with Disabilities…	1
No Child Left Behind Act 2001	1
Temporary Assistance for…	1

What Works Clearinghouse Rating

Does not meet standards

Comparative Analysis

Showing 16 to 30 of 327 results

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Reliable Application of the MATH Taxonomy Sheds Light on Assessment Practices

Peer reviewed

Kinnear, George; Bennett, Max; Binnie, Rachel; Bolt, Róisín; Zheng, Yinglan – Teaching Mathematics and Its Applications, 2020

The MATH taxonomy classifies questions according to the mathematical skills required to answer them. It was created to aid the development of more balanced assessments in undergraduate mathematics and has since been used to compare different assessment regimes across school and university. To date, there has been no systematic investigation of the…

Descriptors: Taxonomy, Mathematics Instruction, Teaching Methods, Reliability

Impacts of ChatGPT-Assisted Writing for EFL English Majors: Feasibility and Challenges

Peer reviewed

Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024

To determine the impacts of using ChatGPT to assist English as a foreign language (EFL) English college majors in revising essays and the possibility of leading to higher scores and potentially causing unfairness. A prospective, double-blinded, paired-comparison study was conducted in Feb. 2023. A total of 44 students provided 44 original essays…

Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)

A Comparative Analysis of the "Early Childhood Environment Rating Scale--Revised" and "Early Childhood Environment Rating Scale, Third Edition"

Peer reviewed

Neitzel, Jennifer; Early, Diane; Sideris, John; LaForrett, Doré; Abel, Michael B.; Soli, Margaret; Davidson, Dawn L.; Haboush-Deloye, Amanda; Hestenes, Linda L.; Jenson, Denise; Johnson, Cindy; Kalas, Jennifer; Mamrak, Angela; Masterson, Marie L.; Mims, Sharon U.; Oya, Patti; Philson, Bobbi; Showalter, Megan; Warner-Richter, Mallory; Kortright Wood, Jill – Journal of Early Childhood Research, 2019

The Early Childhood Environment Rating Scales, including the "Early Childhood Environment Rating Scale--Revised" (Harms et al., 2005) and the "Early Childhood Environment Rating Scale, Third Edition" (Harms et al., 2015) are the most widely used observational assessments in early childhood learning environments. The most recent…

Descriptors: Rating Scales, Early Childhood Education, Educational Quality, Scoring

Analytic or Holistic: A Study of Agreement between Different Grading Models

Peer reviewed

Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018

Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…

Descriptors: Grading, Models, Reliability, Validity

Applying Generalizability Theory in Language Testing: Comparing Nested and Crossed Scoring Designs in the Assessment of Speaking Skills

Peer reviewed

Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021

Scoring language learners' speaking skills is open to a number of measurement errors since raters' personal judgements could involve in the process. Different grading designs in which raters score a student's whole speaking skills or a specific dimension of the speaking performance could be settled to control and minimize the amount of the error…

Descriptors: Language Tests, Scoring, Speech Communication, State Universities

Examining the Interrater Reliability between Self- and Teacher Assessment of Students' Oral Performances

Peer reviewed

Manzano, Dexter L. – International Journal of Language Testing, 2022

The increasing popularity of self-assessment prompted several scholars to investigate its effectiveness and accuracy in relation to teacher assessment. However, most of these studies focused only on the consistency estimate perspective. Thus, the current study investigated the interrater reliability between self- and teacher assessment of…

Descriptors: Oral Language, Self Evaluation (Individuals), College Students, Interrater Reliability

Comparing Machine and Human Reviewers to Evaluate the Risk of Bias in Randomized Controlled Trials

Peer reviewed

Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…

Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials

Rubric for Assessing Thinking Skills in Free-Response Exam Problems

Peer reviewed

Al-Salmani, Fatema; Thacker, Beth – Physical Review Physics Education Research, 2021

We designed a rubric to assess free-response exam problems in order to compare thinking skills evidenced in exams in classes taught by different pedagogies. The rubric was designed based on Bloom's taxonomy and then used to code exam problems. We have analyzed historical and recent exam problems in both algebra-based and calculus-based exams. In…

Descriptors: Inquiry, Thinking Skills, Scoring Rubrics, Algebra

Monitoring the Performance of Human and Automated Scores for Spoken Responses

Peer reviewed

Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018

As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…

Descriptors: Automation, Scoring, Speech Tests, Language Tests

Metrics for Discrete Student Models: Chance Levels, Comparisons, and Use Cases

Peer reviewed

Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018

Metrics including Cohen's kappa, precision, recall, and F[subscript 1] are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…

Descriptors: Models, Comparative Analysis, Prediction, Probability

Students' Use of Formalisations for Improved Logical Reasoning

Peer reviewed

Bronkhorst, Hugo; Roorda, Gerrit; Suhre, Cor; Goedhart, Martin – Research in Mathematics Education, 2022

Logical reasoning as part of critical thinking is becoming more and more important to prepare students for their future life in society, work, and study. This article presents the results of a quasi-experimental study with a pre-test-post-test control group design focusing on the effective use of formalisations to support logical reasoning. The…

Descriptors: Mathematics Instruction, Teaching Methods, Logical Thinking, Critical Thinking

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Disentangling Objective Characteristics of Learning Situations from Subjective Perceptions Thereof, Using an Experience Sampling Method Design

Peer reviewed

Moeller, Julia; Viljaranta, Jaana; Kracke, Bärbel; Dietrich, Julia – Frontline Learning Research, 2020

This article proposes a study design developed to disentangle the objective characteristics of a learning situation from individuals' subjective perceptions of that situation. The term objective characteristics refers to the agreement across students, whereas subjective perceptions refers to inter-individual heterogeneity. We describe a novel…

Descriptors: Student Attitudes, College Students, Lecture Method, Student Interests

Comparative Judgement: Assess Student Production without Absolute Judgements

Peer reviewed

Sumner, Josh – Research-publishing.net, 2021

Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…

Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making

Source

Journal of Speech, Language,…	11
ProQuest LLC	10
Language Testing	6
Journal of Autism and…	5
Assessment & Evaluation in…	4
Educational and Psychological…	4
English Language Teaching	4
Language Assessment Quarterly	4
Advances in Health Sciences…	3
Behavior Modification	3
Creativity Research Journal	3
ETS Research Report Series	3
Educational Sciences: Theory…	3
Journal of Applied Behavior…	3
Online Submission	3
Research Synthesis Methods	3
Academic Medicine	2
American Journal of…	2
Applied Measurement in…	2
Assessing Writing	2
Autism: The International…	2
Clinical Linguistics &…	2
Developmental Psychology	2
Early Child Development and…	2
Education and Training in…	2

Author

Coniam, David	3
Lunz, Mary E.	3
Attali, Yigal	2
Beach, Kristen D.	2
Bocian, Kathleen M.	2
Bothe, Anne K.	2
Chavez, Oscar	2
Derby, K. Mark	2
Gillan, Nicola	2
Grouws, Douglas A.	2
Hestenes, Linda L.	2
Incikabi, Lutfi	2
Jones, Ian	2
Kokkinaki, Theano	2
McLaughlin, T. F.	2
Mims, Sharon U.	2
Myford, Carol M.	2
Nakamura, Yuji	2
O'Connor, Rollanda E.	2
O'Neill, Thomas R.	2
Papick, Ira	2
Wind, Stefanie A.	2
Zayac, Ryan M.	2
Abbott, Robert	1

Assessments and Surveys

Test of English as a Foreign…	5
Autism Diagnostic Observation…	4
Woodcock Johnson Tests of…	4
Dynamic Indicators of Basic…	3
Early Childhood Environment…	2
National Assessment of…	2
Peabody Picture Vocabulary…	2
ACT Assessment	1
Adaptive Behavior Scale	1
Expressive One Word Picture…	1
Georgia Criterion Referenced…	1
Graduate Management Admission…	1
Kaufman Brief Intelligence…	1
MacArthur Bates Communicative…	1
Mean Length of Utterance	1
Multifactor Leadership…	1
NEO Personality Inventory	1
Neale Analysis of Reading…	1
Obsessive Compulsive Scale	1
Pediatric Evaluation of…	1
Praxis Series	1
Raven Progressive Matrices	1
SAT (College Admission Test)	1
Vineland Adaptive Behavior…	1
Wechsler Adult Intelligence…	1