Publication Date
In 2025 | 37 |
Since 2024 | 160 |
Since 2021 (last 5 years) | 583 |
Since 2016 (last 10 years) | 1218 |
Since 2006 (last 20 years) | 2724 |
Audience
Researchers | 169 |
Practitioners | 49 |
Teachers | 32 |
Administrators | 8 |
Policymakers | 8 |
Counselors | 4 |
Students | 4 |
Media Staff | 1 |
Location
Turkey | 172 |
Australia | 81 |
Canada | 79 |
China | 68 |
United States | 55 |
Germany | 43 |
Taiwan | 43 |
Japan | 40 |
United Kingdom | 38 |
Iran | 36 |
Spain | 33 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 1 |
Meets WWC Standards with or without Reservations | 1 |
Does not meet standards | 1 |
Hongfei Ye; Jian Xu; Danqing Huang; Meng Xie; Jinming Guo; Junrui Yang; Haiwei Bao; Mingzhi Zhang; Ce Zheng – Discover Education, 2025
This study evaluates the performance of large language models (LLMs) on the Chinese Postgraduate Medical Entrance Examination (CPGMEE), examines the hallucinations they produce, and investigates the implications for medical education. We curated 10 mock CPGMEE trials to evaluate the performance of four LLMs (GPT-4.0, ChatGPT, QWen 2.1, and Ernie 4.0).…
Descriptors: College Entrance Examinations, Foreign Countries, Computational Linguistics, Graduate Medical Education
Dimitrov, Dimiter M.; Atanasov, Dimitar V. – Measurement: Interdisciplinary Research and Perspectives, 2021
This study offers an approach to test equating under the latent D-scoring method (DSM-L) using the nonequivalent groups with anchor test (NEAT) design. The accuracy of the equating was examined in a simulation study with a 3 × 3 design crossing two factors: group ability (three levels) and test difficulty (three levels). The results for…
Descriptors: Equated Scores, Scoring, Test Items, Accuracy
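The DSM-L equating procedure itself is not described in the excerpt above. As background on the NEAT design it relies on, here is a minimal sketch of chained linear equating, a standard classical NEAT method and explicitly not the authors' DSM-L approach: form X is linked to the anchor V in group P, then V is linked to form Y in group Q. The function names and toy scores are hypothetical.

```python
from statistics import mean, pstdev

def linear_map(from_scores, to_scores):
    """Linear function matching the mean and SD of from_scores to to_scores.

    Both lists must come from the same group of examinees."""
    mu_f, sd_f = mean(from_scores), pstdev(from_scores)
    mu_t, sd_t = mean(to_scores), pstdev(to_scores)
    if sd_f == 0:
        raise ValueError("source scores have zero variance")
    return lambda s: mu_t + (sd_t / sd_f) * (s - mu_f)

def chained_linear_equate(x_p, v_p, y_q, v_q):
    """Chained linear equating of form X onto the scale of form Y (NEAT design).

    x_p, v_p: total-test and anchor scores for group P (took X and anchor V)
    y_q, v_q: total-test and anchor scores for group Q (took Y and anchor V)
    Hypothetical helper: classical chained linear equating, not DSM-L."""
    x_to_v = linear_map(x_p, v_p)  # X -> V, estimated in group P
    v_to_y = linear_map(v_q, y_q)  # V -> Y, estimated in group Q
    return lambda x: v_to_y(x_to_v(x))

if __name__ == "__main__":
    # Invented data: group Q scores one point higher on the anchor than group P.
    equate = chained_linear_equate([10, 20, 30], [5, 10, 15],
                                   [20, 30, 40], [6, 11, 16])
    print(round(equate(20), 2))  # 28.0
```

With these toy data, an X score of 20 (group P's mean) equates to 28 on form Y rather than 30, because part of group Q's higher form Y mean is attributed to its higher anchor performance.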
von Davier, Matthias; Bezirhan, Ummugul – Educational and Psychological Measurement, 2023
Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population…
Descriptors: Robustness (Statistics), Test Items, Item Analysis, Goodness of Fit
Gorney, Kylie; Wollack, James A.; Sinharay, Sandip; Eckerly, Carol – Journal of Educational and Behavioral Statistics, 2023
Any time examinees have had access to items and/or answers prior to taking a test, the fairness of the test and validity of test score interpretations are threatened. Therefore, there is a high demand for procedures to detect both compromised items (CI) and examinees with preknowledge (EWP). In this article, we develop a procedure that uses item…
Descriptors: Scores, Test Validity, Test Items, Prior Learning
Huang, Ke; Conroy, Maureen A.; Snyder, Patricia A.; Miller, David; Sutherland, Kevin S. – Assessment for Effective Intervention, 2023
The Social Skills Improvement System-Teacher Rating Scale (SSIS-TRS) has been widely used to measure the social skills and challenging behaviors of children and adolescents. Studies examining the psychometric properties of the SSIS-TRS have been conducted, but the dimensional structure and item properties of the SSIS-TRS have not been…
Descriptors: Psychometrics, Integrity, Interpersonal Competence, Rating Scales
Huelmann, Thorben; Debelak, Rudolf; Strobl, Carolin – Journal of Educational Measurement, 2020
This study addresses the topic of how anchoring methods for differential item functioning (DIF) analysis can be used in multigroup scenarios. The direct approach would be to combine anchoring methods developed for two-group scenarios with multigroup DIF-detection methods. Alternatively, multiple tests could be carried out. The results of these…
Descriptors: Test Items, Test Bias, Equated Scores, Item Analysis
Clara Margaça; José Carlos Sánchez-García; Brizeida Hernández Sánchez; Susana Lucas Mangas – International Journal of Sustainability in Higher Education, 2024
Purpose: With the aim of protecting the environment and society, research on responsible behavior and personal values has increased. Values have been identified as important for understanding and predicting environmental preservation behaviors. The purpose of this study is to analyze the validity and reliability of the Environmental Portrait Value Questionnaire…
Descriptors: Universities, Conservation (Environment), Altruism, Self Concept
Seyda Aydin-Karaca; Mustafa Serdar Köksal; Bilkay Bi – Journal of Psychoeducational Assessment, 2024
This study aimed to develop a parent rating scale (PRSG) to screen children for the further giftedness identification process. The participants were 255 parents of gifted and non-gifted students. The 30-item PRSG was created by consulting parents and reviewing existing instruments in the literature. As…
Descriptors: Rating Scales, Parent Attitudes, Scores, Comparative Analysis
Achmad Rante Suparman; Eli Rohaeti; Sri Wening – Journal on Efficiency and Responsibility in Education and Science, 2024
This study focuses on developing a computer-based five-tier chemistry diagnostic test with 11 assessment categories, each scored from 0 to 10. The 20 items produced were validated by education, subject-matter, measurement, and media experts, and an average Aiken index > 0.70 was…
Descriptors: Chemistry, Diagnostic Tests, Computer Assisted Testing, Credits
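The abstract reports an average Aiken index above 0.70. As a worked illustration, Aiken's V for one item is V = S / (n(c − 1)), where S sums each rating's distance above the lowest category, n is the number of raters, and c is the number of categories. The sketch below assumes a 1–5 rating scale and invented ratings; the study's actual scale and data are not given in the excerpt, and the function names are hypothetical.

```python
def aiken_v(ratings, lo, hi):
    """Aiken's V for one item: V = S / (n * (c - 1)).

    ratings: integer expert ratings on a scale lo..hi
    S sums each rating's distance above lo; n is the number of raters and
    c = hi - lo + 1 is the number of categories. V lies in [0, 1]."""
    n = len(ratings)
    c = hi - lo + 1
    if n == 0 or c < 2:
        raise ValueError("need at least one rating and two categories")
    if any(r < lo or r > hi for r in ratings):
        raise ValueError("rating outside the scale")
    s = sum(r - lo for r in ratings)
    return s / (n * (c - 1))

def mean_aiken_v(items, lo, hi):
    """Average V across items, e.g. to compare against a cutoff such as 0.70."""
    return sum(aiken_v(r, lo, hi) for r in items) / len(items)

if __name__ == "__main__":
    # Invented ratings from four experts on an assumed 1-5 scale, two items.
    items = [[4, 5, 4, 5], [3, 4, 4, 5]]
    print(mean_aiken_v(items, 1, 5))  # 0.8125
```

Here the first item scores 14 / 16 = 0.875 and the second 12 / 16 = 0.75, so the average clears a 0.70 cutoff.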
Clemens Draxler; Andreas Kurz; Can Gürer; Jan Philipp Nolte – Journal of Educational and Behavioral Statistics, 2024
A modified and improved inductive inferential approach to evaluate item discriminations in a conditional maximum likelihood and Rasch modeling framework is suggested. The new approach involves the derivation of four hypothesis tests. It implies a linear restriction of the assumed set of probability distributions in the classical approach that…
Descriptors: Inferences, Test Items, Item Analysis, Maximum Likelihood Statistics
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
E. Damiano D'Urso; Jesper Tijmstra; Jeroen K. Vermunt; Kim De Roover – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Measurement invariance (MI) is required for validly comparing latent constructs measured by multiple ordinal self-report items. Non-invariances may occur when disregarding (group differences in) an acquiescence response style (ARS; an agreeing tendency regardless of item content). If non-invariance results solely from neglecting ARS, one should…
Descriptors: Error of Measurement, Structural Equation Models, Construct Validity, Measurement Techniques
Jessica M. Schwartzman; Marissa C. Roth; Ann V. Paterson; Alexandra X. Jacobs; Zachary J. Williams – Autism: The International Journal of Research and Practice, 2024
This study examined the preliminary feasibility, acceptability, and efficacy of CBT-DAY, an autism-adapted cognitive behavioral therapy for depression in autistic youth. Twenty-four autistic youth aged 11–17 years (5 cisgender females, 14 cisgender males, and 5 non-binary youth) participated in the non-randomized pilot trial. Youth…
Descriptors: Autism Spectrum Disorders, Youth, Depression (Psychology), Cognitive Restructuring
Hadler, Patricia; Neuert, Cornelia E.; Ortmanns, Verena; Stiegler, Angelika – Field Methods, 2022
Respondents' sex is one of the standard sociodemographic characteristics collected in surveys. Until now, the question typically consisted of a simple item (e.g., "Are you…?") with two answer categories ("male" and "female"). In 2019, Germany implemented the additional sex designation divers for…
Descriptors: Foreign Countries, Gender Differences, Sex, Surveys
Finch, Holmes – Applied Measurement in Education, 2022
Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…
Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation
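The Finch abstract defines DIF as group differences in item responses after conditioning on the latent trait. One widely used dichotomous DIF statistic, the Mantel–Haenszel common odds ratio α_MH, approximates that conditioning by stratifying examinees on observed total score. The excerpt does not say which methods the study compares, so this is a generic sketch rather than its procedure; the input layout and function name are assumptions. The ETS delta-scale value is −2.35 ln(α_MH), where negative values indicate an item favoring the reference group. DSF applies the same idea to the individual steps of polytomous items, which this sketch does not cover.

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    strata: one (ref_correct, ref_wrong, focal_correct, focal_wrong) tuple per
    matched total-score level. Returns (alpha_MH, MH D-DIF on the ETS delta scale).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t == 0:
            continue  # empty score level contributes nothing
        num += a * d / t  # reference-correct x focal-wrong
        den += b * c / t  # reference-wrong x focal-correct
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is 0 or undefined for these counts")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

if __name__ == "__main__":
    # Invented counts: at each score level the reference group has higher odds.
    alpha, delta = mantel_haenszel_dif([(30, 10, 20, 20), (40, 10, 30, 20)])
    print(round(alpha, 3), round(delta, 3))
```

Equal odds in both groups at every score level give α_MH = 1 and a delta value of 0, i.e. no DIF.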