NotesFAQContact Us
Collection
Advanced
Search Tips
Showing 76 to 90 of 1,943 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Matthews, Evan L.; Horvat, Fiona M.; Phillips, David A. – Measurement in Physical Education and Exercise Science, 2022
The YMCA step test uses a prescribed step height which is difficult in a telehealth setting. Examine a modification of the YMCA step test allowing for the use of preexisting in-home objects of variable height as the "step" in a virtual environment. Young healthy participants (n = 40) performed step tests with a small and large object of…
Descriptors: Physical Fitness, Metabolism, Tests, Measurement
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Jane Batamuliza; Gonzague Habinshuti; Jean Baptiste Nkurunziza – Journal of Technology and Science Education, 2024
This current study presents the effects of interactive computer simulations on students' performance and concept retention in the unit of chemical reactions. Purposive sampling was used to select four schools with a sample population of 320. The Achievement test on chemical reactions was developed, validated, and checked for reliability. The…
Descriptors: Chemistry, Science Instruction, Teaching Methods, Comparative Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Wang, Faming; Wang, Yehui; Liu, Yaping; Leung, Shing On – Scandinavian Journal of Educational Research, 2023
The importance of the opportunity to learn (OTL) for mathematics achievement has been extensively researched. However, there were still unanswered questions regarding OTL's measurement, analytical level, and relationship with motivational beliefs. To fill in the gaps, we aimed to (1) scrutinize the reliability and validity of OTL, (2) investigate…
Descriptors: International Assessment, Foreign Countries, Achievement Tests, Secondary School Students
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
Peer reviewed Peer reviewed
Direct linkDirect link
De Raadt, Alexandra; Warrens, Matthijs J.; Bosker, Roel J.; Kiers, Henk A. L. – Educational and Psychological Measurement, 2019
Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data…
Descriptors: Interrater Reliability, Data, Statistical Analysis, Statistical Bias
Peer reviewed Peer reviewed
Direct linkDirect link
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Benton, Tom – Research Matters, 2021
Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify the length of tests being shortened whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…
Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level
Peer reviewed Peer reviewed
Direct linkDirect link
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Purwanto; Hidayah, Niswatul; Wagistina, Satti – International Journal of Educational Methodology, 2023
Learning geography in Indonesia philosophically aims to develop spatial literacy. Students must improve spatial literacy to form reasoning skills and apply spatial concepts in real life. Applying Gersmehl's spatial learning can improve students' spatial literacy through syntax arranged based on spatial aspects. The use of google earth helps…
Descriptors: Spatial Ability, Natural Disasters, Geography Instruction, Teaching Methods
Peer reviewed Peer reviewed
PDF on ERIC Download full text
David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023
We compared the influence of open-book extended duration versus closed book time-limited format on reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30xfree-response short answer to a question, SAQ) and end-of-year paper (4xSAQ,…
Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format
Peer reviewed Peer reviewed
Direct linkDirect link
van der Scheer, Emmelien A.; Bijlsma, Hannah J. E.; Glas, Cees A. W. – School Effectiveness and School Improvement, 2019
A Bayesian IRT-model approach was used to investigate the validity and reliability of student perceptions of teaching quality. Furthermore, the student perceptions were compared with ratings of teaching quality by external observers. Grade 4 students (n = 675) filled out a questionnaire that was used to measure their opinions about the lessons of…
Descriptors: Student Attitudes, Validity, Interrater Reliability, Correlation
Jason E. Stone – ProQuest LLC, 2021
Personalized learning tracks within MOOCs remain underdeveloped. Despite MOOCs possessing tremendous potential for personalized learning, little individualization of the MOOC has occurred. Some students look at MOOCs as a textbook, others as a formal course, and others as an opportunity to socialize. Understanding student enrollment needs are a…
Descriptors: MOOCs, Student Motivation, Independent Study, Rating Scales
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Colvin, Kimberly F.; Gorgun, Guher – Practical Assessment, Research & Evaluation, 2020
This study compares a scale, the Rosenberg Self-Esteem Scale, that was administered with four response categories to versions of the same scale that were administered with six and eight response categories. Respondents were randomly assigned to take one of the three versions of RSES. A rating scale utility analysis was conducted on all three…
Descriptors: Psychometrics, Measures (Individuals), Self Concept Measures, Self Esteem
Peer reviewed Peer reviewed
Direct linkDirect link
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10  |  11  |  ...  |  130