Publication Date
| Date Range | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 59 |
| Since 2022 (last 5 years) | 385 |
| Since 2017 (last 10 years) | 828 |
| Since 2007 (last 20 years) | 1342 |
Audience
| Audience | Results |
| --- | --- |
| Practitioners | 195 |
| Teachers | 161 |
| Researchers | 93 |
| Administrators | 50 |
| Students | 34 |
| Policymakers | 15 |
| Parents | 12 |
| Counselors | 2 |
| Community | 1 |
| Media Staff | 1 |
| Support Staff | 1 |
Location
| Location | Results |
| --- | --- |
| Canada | 62 |
| Turkey | 59 |
| Germany | 40 |
| Australia | 36 |
| United Kingdom | 36 |
| Japan | 35 |
| China | 33 |
| United States | 32 |
| California | 25 |
| Iran | 25 |
| United Kingdom (England) | 25 |
Emma Pritchard-Rowe; Carmen de Lemos; Katie Howard; Jenny Gibson – Autism: The International Journal of Research and Practice, 2025
Play is often included in autism diagnostic assessments. Such assessments tend to focus on 'deficits' and on non-autistic interpretations of observable behaviours. In contrast, a neurodiversity-affirmative assessment approach centres autistic perspectives and focuses on strengths, differences and needs. Accordingly, this study was designed to focus on…
Descriptors: Foreign Countries, Adults, Autism Spectrum Disorders, Play
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
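The residual-correlation method named in the abstract is commonly operationalized as Yen's Q3: correlate each pair of items' residuals after subtracting model-expected performance. A minimal numpy sketch, assuming Rasch-model probabilities have already been estimated elsewhere (all data and names below are illustrative, not the article's):

```python
import numpy as np

def q3_matrix(responses, expected):
    """Yen's Q3: item-by-item correlations of Rasch residuals.

    responses, expected: (n_persons, n_items) arrays of 0/1 scores and
    model-implied success probabilities.
    """
    residuals = responses - expected              # raw person-item residuals
    return np.corrcoef(residuals, rowvar=False)   # correlate item columns

# Toy data: 5 persons x 3 items; placeholder model probabilities.
resp = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]])
prob = np.full(resp.shape, 0.6)
print(q3_matrix(resp, prob))  # off-diagonal values well above the average
                              # off-diagonal Q3 are a common LID flag
```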
Jacqueline E. McLaughlin; Kathryn Morbitzer; Margaux Meilhac; Natalie Poupart; Rebekah L. Layton; Michael B. Jarstfer – Studies in Graduate and Postdoctoral Education, 2024
Purpose: While known by many names, qualifying exams function as gatekeepers to graduate student advancement to PhD candidacy, yet there has been little formal study of best qualifying exam practices, particularly in biomedical and related STEM PhD programs. The purpose of this study is to examine the current state of qualifying exams through an…
Descriptors: Doctoral Programs, Best Practices, STEM Education, Biological Sciences
Ulrike Padó; Yunus Eryilmaz; Larissa Kirschner – International Journal of Artificial Intelligence in Education, 2024
Short-Answer Grading (SAG) is a time-consuming task for teachers that automated SAG models have long promised to make easier. However, there are three challenges for their broad-scale adoption: A technical challenge regarding the need for high-quality models, which is exacerbated for languages with fewer resources than English; a usability…
Descriptors: Grading, Automation, Test Format, Computer Assisted Testing
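For orientation, an SAG model maps free-text answers to scores; the sketch below is a generic TF-IDF-plus-logistic-regression baseline (not the authors' models, and the German answers and labels are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical graded short answers (1 = correct, 0 = incorrect).
answers = ["Die Hauptstadt ist Berlin", "Berlin", "Ich weiss es nicht", "Paris"]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(answers, labels)
print(model.predict(["Die Antwort ist Berlin"]))  # likely [1], given the
                                                  # shared vocabulary
```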
Jing Miao; Yi Cao; Michael E. Walker – ETS Research Report Series, 2024
Studies of test score comparability have been conducted at different stages in the history of testing to ensure that test results carry the same meaning regardless of test conditions. The expansion of at-home testing via remote proctoring sparked another round of interest. This study uses data from three licensure tests to assess potential mode…
Descriptors: Testing, Test Format, Computer Assisted Testing, Home Study
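The basic mode-comparability check such studies report can be sketched as a standardized mean difference between administrations; the scores below are invented, not the report's data:

```python
import numpy as np

center = np.array([70.0, 74, 68, 81, 77, 72])  # test-center scores (toy)
remote = np.array([69.0, 75, 70, 79, 74, 71])  # remote-proctored scores (toy)

pooled_sd = np.sqrt((center.var(ddof=1) + remote.var(ddof=1)) / 2)
mode_effect = (remote.mean() - center.mean()) / pooled_sd
print(f"standardized mode effect: {mode_effect:.3f}")
```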
Wim J. van der Linden – Journal of Educational and Behavioral Statistics, 2022
The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. This definition contrasts with Lord's foundational paper, which viewed equating as the process required to obtain comparability of the measurement scale between forms. The distinction between the notions…
Descriptors: Equated Scores, Test Items, Scores, Probability
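The score-comparability notion the article starts from is what traditional linear (mean-sigma) equating delivers; a toy sketch with invented form scores:

```python
import numpy as np

x = np.array([10.0, 12, 15, 18, 20, 22])  # observed scores on form X (toy)
y = np.array([11.0, 14, 16, 19, 23, 25])  # observed scores on form Y (toy)

def linear_equate(score_x):
    """Map a form-X score onto the form-Y scale (mean-sigma equating)."""
    slope = y.std(ddof=1) / x.std(ddof=1)
    return slope * (score_x - x.mean()) + y.mean()

print(linear_equate(16))  # a form-X score of 16 expressed on the Y scale
```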
Rebecca Napolitano; Ryan Solnosky; Wesley Reinhart – Journal of Civil Engineering Education, 2025
This study examines the impact of changes in exam modalities on the performance and experiences of architectural engineering students in a domain-specific data science class. Specifically, the number and duration of the exams (and thereby the amount of content on each) and the setting in which students took them changed across the three years…
Descriptors: Data Science, Engineering Education, Architectural Education, Student Evaluation
Yusuf Oc; Hela Hassen – Marketing Education Review, 2025
Driven by technological innovations, continuous digital expansion has fundamentally transformed the landscape of modern higher education, leading to discussions about evaluation techniques. The emergence of generative artificial intelligence raises questions about reliability and academic honesty regarding multiple-choice assessments in online…
Descriptors: Higher Education, Multiple Choice Tests, Computer Assisted Testing, Electronic Learning
Huiming Ding; Matt Homer – Advances in Health Sciences Education, 2025
Summative assessments are often underused for feedback, despite being rich in data on students' applied knowledge and clinical and professional skills. To better inform teaching and student support, this study aims to gain insights from summative assessments by profiling students' performance patterns and identifying those students…
Descriptors: Summative Evaluation, Profiles, Statistical Analysis, Outcomes of Education
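The snippet does not name the profiling method, so as one plausible reading, "profiling performance patterns" can be sketched as clustering standardized sub-scores (all data simulated):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy matrix: 30 students x 3 assumed domains
# (applied knowledge, clinical skills, professional skills).
scores = rng.normal(60, 10, size=(30, 3))

z = StandardScaler().fit_transform(scores)
profiles = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(z)
print(profiles)  # each label groups students with a similar pattern across
                 # domains, e.g. to flag profiles needing targeted support
```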
Ben Backes; James Cowan – Grantee Submission, 2024
We investigate two research questions using a recent statewide transition from paper to computer-based testing: first, the extent to which test mode effects found in prior studies can be eliminated in large-scale administration; and second, the degree to which online and paper assessments offer different information about underlying student…
Descriptors: Computer Assisted Testing, Test Format, Differences, Academic Achievement
Ben Backes; James Cowan – Applied Measurement in Education, 2024
We investigate two research questions using a recent statewide transition from paper to computer-based testing: first, the extent to which test mode effects found in prior studies can be eliminated; and second, the degree to which online and paper assessments offer different information about underlying student ability. We first find very small…
Descriptors: Computer Assisted Testing, Test Format, Differences, Academic Achievement
Chunyan Liu; Raja Subhiyah; Richard A. Feinberg – Applied Measurement in Education, 2024
Mixed-format tests that include both multiple-choice (MC) and constructed-response (CR) items have become widely used in many large-scale assessments. When an item response theory (IRT) model is used to score a mixed-format test, the unidimensionality assumption may be violated if the CR items measure a different construct from that measured by MC…
Descriptors: Test Format, Response Style (Tests), Multiple Choice Tests, Item Response Theory
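A quick, generic screen for the unidimensionality assumption the abstract raises is to inspect the eigenvalues of the inter-item correlation matrix; a simulated sketch (not the authors' IRT analysis) in which MC and CR items share one ability:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=500)                 # common latent ability
# 4 dichotomous MC items and 2 continuous-ish CR items, all driven by theta.
mc = (theta[:, None] + rng.normal(size=(500, 4)) > 0).astype(float)
cr = theta[:, None] + rng.normal(size=(500, 2))

items = np.column_stack([mc, cr])
eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
print(eigvals)  # one dominant eigenvalue is consistent with a single
                # construct; a strong second factor would suggest the CR
                # items measure something the MC items do not
```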
Mustafa Ilhan; Nese Güler; Gülsen Tasdelen Teker; Ömer Ergenekon – International Journal of Assessment Tools in Education, 2024
This study aimed to examine the effects of reverse items created with different strategies on psychometric properties and respondents' scale scores. To this end, three versions of a 10-item scale were developed: 10 positive items were integrated into the first form (Form-P), and five positive and five reverse items into the other two…
Descriptors: Test Items, Psychometrics, Scores, Measures (Individuals)
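For context, reverse items are typically rescored as (max + min) - raw before scale scores are computed; a minimal sketch on a 5-point scale with invented responses:

```python
import numpy as np

raw = np.array([[5, 1, 4], [4, 2, 4], [2, 4, 1]])  # respondents x items (toy)
reverse_items = [1]        # index of the negatively worded item (assumed)

scored = raw.copy()
scored[:, reverse_items] = 6 - scored[:, reverse_items]  # (5 + 1) - raw
print(scored.sum(axis=1))  # scale scores with the reverse item realigned
```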
Natalja Menold; Vera Toepoel – Sociological Methods & Research, 2024
Research on mixed devices in web surveys is in its infancy. Using a randomized experiment, we investigated device effects (desktop PC, tablet and mobile phone) for six response formats and four different numbers of scale points. N = 5,077 members of an online access panel participated in the experiment. An exact test of measurement invariance and…
Descriptors: Online Surveys, Handheld Devices, Telecommunications, Test Reliability
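A full invariance analysis needs a multi-group measurement model, but a crude structural comparison across device groups can be sketched with numpy alone (simulated data, illustrative only; not the article's exact test):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_group(n=500):
    """Toy 3-item scale driven by one latent trait per respondent."""
    trait = rng.normal(size=n)
    noise = rng.normal(size=(n, 3)) * [0.5, 0.7, 0.9]
    return trait[:, None] + noise

def first_loadings(data):
    """Sign-free loadings on the dominant principal component."""
    _, vecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    return np.abs(vecs[:, -1])

desktop, phone = simulate_group(), simulate_group()
print(first_loadings(desktop))
print(first_loadings(phone))  # similar loading patterns suggest comparable
                              # measurement structure across devices
```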
Jianbin Fu; Patrick C. Kyllonen; Xuan Tan – Measurement: Interdisciplinary Research and Perspectives, 2024
Users of forced-choice questionnaires (FCQs) to measure personality commonly assume statement parameter invariance across contexts -- between Likert and forced-choice (FC) items and between different FC items that share a common statement. In this paper, an empirical study was designed to check these two assumptions for an FCQ assessment measuring…
Descriptors: Measurement Techniques, Questionnaires, Personality Measures, Interpersonal Competence
