ERIC - Search Results

Showing 1 to 15 of 223 results

Decision-Making Efficiency with Aided Information: The Impact of Automation Reliability and Task Difficulty

Peer reviewed

Hanshu Zhang; Ran Zhou; Cheng-You Cheng; Sheng-Hsu Huang; Ming-Hui Cheng; Cheng-Ta Yang – Cognitive Research: Principles and Implications, 2025

Although it is commonly believed that automation aids human decision-making, conflicting evidence raises questions about whether individuals would gain greater advantages from automation in difficult tasks. Our study examines the combined influence of task difficulty and automation reliability on aided decision-making. We assessed decision…

Descriptors: Task Analysis, Difficulty Level, Decision Making, Automation

How Reliable Is Assessment of Children's Sentence Comprehension Using a Self-Directed App? A Comparison of Supported versus Independent Use

Peer reviewed

Pauline Frizelle; Ana Buckley; Tricia Biancone; Anna Ceroni; Darren Dahly; Paul Fletcher; Dorothy V. M. Bishop; Cristina McKean – Journal of Child Language, 2024

This study reports on the feasibility of using the Test of Complex Syntax-Electronic (TECS-E), as a self-directed app, to measure sentence comprehension in children aged 4 to 5 ½ years old; how testing apps might be adapted for effective independent use; and agreement levels between face-to-face supported computerized and independent computerized…

Descriptors: Language Processing, Computer Software, Language Tests, Syntax

Modeling Directional Testlet Effects on Multiple Open-Ended Questions

Peer reviewed

Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025

Educational tests often have a cluster of items linked by a common stimulus ("testlet"). In such a design, the dependencies caused between items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…

Descriptors: Models, Test Items, Educational Assessment, Scores

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

A Closed-Form Alternative for Estimating ω Reliability under Unidimensionality

Peer reviewed

Hancock, Gregory R.; An, Ji – Measurement: Interdisciplinary Research and Perspectives, 2020

As an alternative to Cronbach's α for estimating scale reliability, McDonald's ω has attracted increased attention within the methodological community for its less stringent measurement assumptions. Notwithstanding, ω is still seldom used by practitioners, likely due to its unavailability in popular software packages (e.g., SPSS)…

Descriptors: Evaluation, Alternative Assessment, Reliability, Test Reliability

Combined Logistic and Confined Exponential Growth Models: Estimation Using SEM Software

Peer reviewed

Phillip K. Wood – Structural Equation Modeling: A Multidisciplinary Journal, 2024

The logistic and confined exponential curves are frequently used in studies of growth and learning. These models, which are nonlinear in their parameters, can be estimated using structural equation modeling software. This paper proposes a single combined model, a weighted combination of both models. Mplus, Proc Calis, and lavaan code for the model…

Descriptors: Structural Equation Models, Computation, Computer Software, Weighted Scores

Application of Model Averaging for Measurement in the Presence of Unknown Familiarization Phase or Fatigue Phase

Peer reviewed

Steven Kim; Stephanie Lara-Sotelo; Eric Martin – Measurement in Physical Education and Exercise Science, 2024

A number of familiarization trials are needed for reliable measurement, particularly for inexperienced subjects. Researchers have studied and developed familiarization protocols that vary by exercise and study population. The pace of familiarization and fatigue may be an individual-level characteristic, so a population-level protocol may not fit…

Descriptors: Familiarity, Physical Education, Fatigue (Biology), Reliability

Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4's Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages

Peer reviewed

Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024

Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…

Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software

Simple Techniques to Bypass GenAI Text Detectors: Implications for Inclusive Education

Peer reviewed

Mike Perkins; Jasper Roe; Binh H. Vu; Darius Postma; Don Hickerson; James McGaughran; Huy Q. Khuat – International Journal of Educational Technology in Higher Education, 2024

This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content modified to evade detection (n = 805). We compare these detectors to assess their reliability in identifying AI-generated text in educational settings, where they are increasingly used to address academic integrity…

Descriptors: Artificial Intelligence, Inclusion, Computer Software, Word Processing

Coherence-Based Automatic Short Answer Scoring Using Sentence Embedding

Peer reviewed

Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024

Automatic essay scoring (AES) is an essential educational application in natural language processing. This automated process will alleviate the burden by increasing the reliability and consistency of the assessment. With the advances in text embedding libraries and neural network models, AES systems achieved good results in terms of accuracy.…

Descriptors: Scoring, Essays, Writing Evaluation, Memory

Perceptual and Acoustic Assessment of Strain Using Synthetically Modified Voice Samples

Peer reviewed

Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020

Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…

Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed
PDF on ERIC

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Claude, ChatGPT, Copilot, and Gemini Performance versus Students in Different Topics of Neuroscience

Peer reviewed

Volodymyr Mavrych; Ahmed Yaqinuddin; Olena Bolgova – Advances in Physiology Education, 2025

Despite extensive studies on large language models and their capability to respond to questions from various licensed exams, there has been limited focus on employing chatbots for specific subjects within the medical curriculum, specifically medical neuroscience. This research compared the performances of Claude 3.5 Sonnet (Anthropic), GPT-3.5 and…

Descriptors: Artificial Intelligence, Computer Software, Neurosciences, Medical Education

Analysis of UX Elements in Educational Applications for Young Children and Implementation of ISO/IEC 25010 Quality Standards

Peer reviewed

Hae Sun Jung; Haein Lee; Keon Chul Park – SAGE Open, 2025

This study investigates user experience (UX) priorities in early childhood education applications by analyzing Korean-language user reviews using Bidirectional Encoder Representations from Transformers topic modeling (BERTopic). Eighteen latent topics were extracted and systematically mapped to the eight software quality characteristics defined by…

Descriptors: Early Childhood Education, Computer Uses in Education, Computer Software, Usability

Project Development for Blood Bank Application and Convertor for Software Testing

Peer reviewed
PDF on ERIC

Rosziati Ibrahim; Mizani Mohamad Madon; Zhiang Yue Lee; Piraviendran A/L Rajendran; Jahari Abdul Wahab; Faaizah Shahbodin – International Society for Technology, Education, and Science, 2023

This paper discusses the steps involved in project development for developing the mobile application, namely Blood Bank Application, and developing the convertor for software testing. The project development is important for Computer Science students for them to learn the important steps in developing the application and testing the reliability of…

Descriptors: Program Administration, Educational Technology, Computer Software, Testing

