Showing 1 to 15 of 9,520 results
Peer reviewed
Direct link
Haeju Lee; Kyung Yong Kim – Journal of Educational Measurement, 2025
When no prior information about differential item functioning (DIF) exists for items in a test, either the rank-based or the iterative purification procedure might be preferred. Rank-based purification selects anchor items based on a preliminary DIF test. For a preliminary DIF test, likelihood ratio test (LRT)-based approaches (e.g.,…
Descriptors: Test Items, Equated Scores, Test Bias, Accuracy
Peer reviewed
PDF on ERIC
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
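For context on the Benton entry above: the baseline being extended is classical linear equating, which places form-X scores on the form-Y scale by matching means and standard deviations. A minimal Python sketch of that baseline, with hypothetical score arrays (the paper's extension itself is not reproduced here):

    import numpy as np

    def linear_equate(x_scores, y_scores):
        """Map form-X scores onto the form-Y scale by matching mean and SD."""
        x = np.asarray(x_scores, dtype=float)
        mu_x, sd_x = x.mean(), x.std(ddof=1)
        mu_y, sd_y = np.mean(y_scores), np.std(y_scores, ddof=1)
        # y(x) = mu_Y + (sd_Y / sd_X) * (x - mu_X)
        return mu_y + (sd_y / sd_x) * (x - mu_x)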
Peer reviewed
Direct link
Kelsey Nason; Christine DeMars – Journal of Educational Measurement, 2025
This study examined the widely used threshold of 0.2 for Yen's Q3, an index for violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off…
Descriptors: Item Response Theory, Statistical Bias, Test Reliability, Test Items
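For context on the Nason and DeMars entry above: Yen's Q3 for an item pair is the correlation, across examinees, of the two items' residuals once a fitted IRT model has removed the ability dimension. A minimal sketch assuming a fitted 2PL model; the response matrix and item parameters here are hypothetical inputs, not the study's data:

    import numpy as np

    def q3_matrix(resp, theta, a, b):
        """Yen's Q3: correlate model residuals for every item pair."""
        # 2PL probability of a correct response for each person-item pair
        p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
        resid = resp - p  # residuals after removing ability
        return np.corrcoef(resid, rowvar=False)  # J x J matrix of Q3 values

    # Flag pairs exceeding the conventional 0.2 cut-off the study examines:
    # q3 = q3_matrix(resp, theta, a, b)
    # flagged = np.argwhere(np.triu(np.abs(q3), k=1) > 0.2)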
Peer reviewed
Direct link
Daniel P. Jurich; Matthew J. Madison – Educational Assessment, 2023
Diagnostic classification models (DCMs) are psychometric models that provide probabilistic classifications of examinees on a set of discrete latent attributes. When analyzing or constructing assessments scored by DCMs, understanding how each item influences attribute classifications can clarify the meaning of the measured constructs, facilitate…
Descriptors: Test Items, Models, Classification, Influences
Peer reviewed
PDF on ERIC
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Peer reviewed
Direct link
Joshua B. Gilbert; Zachary Himmelsbach; James Soland; Mridul Joshi; Benjamin W. Domingue – Journal of Policy Analysis and Management, 2025
Analyses of heterogeneous treatment effects (HTE) are common in applied causal inference research. However, when outcomes are latent variables assessed via psychometric instruments such as educational tests, standard methods ignore the potential HTE that may exist among the individual items of the outcome measure. Failing to account for…
Descriptors: Item Response Theory, Test Items, Error of Measurement, Scores
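To make the item-level heterogeneity point in the Gilbert et al. entry concrete: in an explanatory IRT formulation the treatment effect can carry an item-specific component, for example (notation is illustrative, not necessarily the paper's):

    \operatorname{logit} P(Y_{pi} = 1) = \theta_p - b_i + (\delta + \delta_i) T_p

where T_p indicates treatment, \delta is the average effect, and \delta_i is item i's deviation from it. Methods that score the test before estimating effects implicitly force every \delta_i to zero, which is the omission the abstract describes.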
Peer reviewed
Direct link
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy
Peer reviewed
PDF on ERIC
Leonidas Zotos; Hedderik van Rijn; Malvina Nissim – International Educational Data Mining Society, 2025
In an educational setting, estimates of the difficulty of Multiple-Choice Questions (MCQs), a format commonly used to assess learning progress, are very useful information for both teachers and students. Since human assessment is costly from multiple points of view, automatic approaches to MCQ item difficulty estimation are…
Descriptors: Multiple Choice Tests, Test Items, Difficulty Level, Artificial Intelligence
Peer reviewed
Direct link
Yanyan Fu – Educational Measurement: Issues and Practice, 2024
The template-based automated item-generation (TAIG) approach, which involves template creation, item generation, item selection, field-testing, and evaluation, has more steps than the traditional item development method. Consequently, there is more margin for error in this process, and any template errors can cascade to the generated items.…
Descriptors: Error Correction, Automation, Test Items, Test Construction
Peer reviewed
Direct link
Stella Y. Kim; Sungyeun Kim – Educational Measurement: Issues and Practice, 2025
This study presents several multivariate generalizability theory designs for analyzing test forms based on automated item generation (AIG). The study used real data to illustrate the analysis procedure and discuss practical considerations. We collected the data from two groups of students, each group receiving a different form generated by AIG. A…
Descriptors: Generalizability Theory, Automation, Test Items, Students
Peer reviewed
PDF on ERIC
Neset Demirci – Turkish Online Journal of Educational Technology - TOJET, 2025
In this study, the performance of artificial intelligence chatbots--OpenAI's ChatGPT, Google's Gemini, and Microsoft's Copilot--was evaluated and compared based on their responses to questions from the Turkish Higher Education Entrance Physics Examination over the past three years. Analysis of the chatbots' responses to TYT Physics questions showed…
Descriptors: Artificial Intelligence, College Entrance Examinations, Physics, Science Tests
Peer reviewed
Direct link
Changiz Mohiyeddini – Anatomical Sciences Education, 2025
This article presents a step-by-step guide to using R and SPSS to bootstrap exam questions. Bootstrapping, a versatile nonparametric analytical technique, can help to improve the psychometric qualities of exam questions in the process of quality assurance. Bootstrapping is particularly useful in disciplines such as medical education, where student…
Descriptors: Test Items, Sampling, Statistical Inference, Nonparametric Statistics
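The Mohiyeddini guide works in R and SPSS; the core resampling idea is language-agnostic. A minimal Python sketch that bootstraps a 95% confidence interval for one item's difficulty (proportion correct), using simulated stand-in responses rather than real exam data:

    import numpy as np

    rng = np.random.default_rng(0)
    item_scores = rng.integers(0, 2, size=200)  # hypothetical 0/1 responses

    # Resample examinees with replacement and recompute the difficulty
    boot = [rng.choice(item_scores, size=item_scores.size, replace=True).mean()
            for _ in range(2000)]
    lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% percentile interval
    print(f"difficulty = {item_scores.mean():.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")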
Peer reviewed
Direct link
Daibao Guo; Katherine Landau Wright; Lianne Josbacher; Eun Hye Son – Elementary School Journal, 2025
Limited research has explored the use of visual displays (ViDis) in science tests, making it challenging to know how these tests align with classroom instruction and what skills students need to be successful on these tests. Therefore, the current study aims to describe the use of ViDis in upper elementary grade standardized science tests. We…
Descriptors: Standardized Tests, Science Tests, Elementary Education, Science Education
Peer reviewed
PDF on ERIC
Yumou Wei; Paulo Carvalho; John Stamper – International Educational Data Mining Society, 2025
Educators evaluate student knowledge using knowledge component (KC) models that map assessment questions to KCs. Still, designing KC models for large question banks remains an insurmountable challenge for instructors who need to analyze each question by hand. The growing use of Generative AI in education is expected only to aggravate this chronic…
Descriptors: Artificial Intelligence, Cluster Grouping, Student Evaluation, Test Items
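For context on the Wei, Carvalho, and Stamper entry above: a KC model is commonly represented as a Q-matrix, with one row per question and one column per knowledge component. A toy sketch with hypothetical KC names, not the study's data:

    import numpy as np

    kcs = ["fractions", "decimals", "ratios"]
    # Q-matrix: rows are questions, columns are KCs;
    # a 1 means the question assesses that knowledge component.
    q_matrix = np.array([
        [1, 0, 0],  # Q1: fractions only
        [1, 1, 0],  # Q2: fractions and decimals
        [0, 0, 1],  # Q3: ratios
    ])

    # KCs assessed by question 2:
    print([kc for kc, hit in zip(kcs, q_matrix[1]) if hit])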
Peer reviewed
Direct link
Katerina M. Marcoulides – Measurement: Interdisciplinary Research and Perspectives, 2023
Integrative data analyses have recently been shown to be an effective tool for researchers interested in synthesizing datasets from multiple studies in order to draw statistical or substantive conclusions. The actual process of integrating the different datasets depends on the availability of some common measures or items reflecting the same…
Descriptors: Data Analysis, Synthesis, Test Items, Simulation