Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 34 |
| Since 2017 (last 10 years) | 801 |
| Since 2007 (last 20 years) | 2609 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Statistical Analysis | 2935 |
| Feedback (Response) | 1143 |
| Foreign Countries | 1107 |
| Item Response Theory | 663 |
| Questionnaires | 604 |
| Student Attitudes | 510 |
| Comparative Analysis | 498 |
| Correlation | 476 |
| Emotional Response | 458 |
| Teaching Methods | 446 |
| College Students | 375 |
Author
| Author | Records |
| --- | --- |
| Sinharay, Sandip | 17 |
| Smolkowski, Keith | 13 |
| Fien, Hank | 12 |
| Clarke, Ben | 11 |
| Doabler, Christian T. | 11 |
| Alonzo, Julie | 10 |
| Tindal, Gerald | 10 |
| Baker, Scott K. | 9 |
| Haberman, Shelby J. | 9 |
| Cho, Sun-Joo | 7 |
| Lai, Cheng-Fei | 7 |
Location
| Location | Records |
| --- | --- |
| Australia | 97 |
| Turkey | 71 |
| Germany | 65 |
| Taiwan | 57 |
| Canada | 55 |
| Iran | 54 |
| United Kingdom | 54 |
| China | 51 |
| Netherlands | 46 |
| California | 33 |
| Japan | 33 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 7 |
| Meets WWC Standards with or without Reservations | 9 |
| Does Not Meet WWC Standards | 6 |
Rios, Joseph A. – Educational and Psychological Measurement, 2021
Low test-taking effort as a validity threat is common when examinees perceive an assessment context to have minimal personal value. Prior research has shown that in such contexts, subgroups may differ in their effort, which raises two concerns when making subgroup mean comparisons. First, it is unclear how differential effort could influence…
Descriptors: Response Style (Tests), Statistical Analysis, Measurement, Comparative Analysis
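A common way to operationalize low test-taking effort (one option among several, not necessarily the procedure Rios uses) is response-time-based rapid-guessing flagging. The sketch below assumes hypothetical response-time data and an arbitrary 10%-of-median threshold rule.

```python
import numpy as np

# Hypothetical data: response times (seconds) for 200 examinees x 20 items.
rng = np.random.default_rng(1)
rt = rng.lognormal(mean=3.0, sigma=0.5, size=(200, 20))

# Flag rapid guesses with a simple item-level threshold (here, 10% of the
# item's median time; the threshold rule is an assumption, not from Rios).
thresholds = 0.10 * np.median(rt, axis=0)
rapid = rt < thresholds  # True where the response looks effortless

# Response-time effort (RTE): proportion of solution-behavior responses.
rte = 1.0 - rapid.mean(axis=1)

# Examinees below a common RTE cutoff (e.g., .90) could be filtered
# before subgroup mean comparisons.
motivated = rte >= 0.90
print(f"{(~motivated).sum()} of {len(rte)} examinees flagged")
```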
Wind, Stefanie A. – Language Testing, 2019
Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…
Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests
Joshua B. Gilbert – Annenberg Institute for School Reform at Brown University, 2022
This simulation study examines the characteristics of the Explanatory Item Response Model (EIRM) for estimating treatment effects, compared with classical test theory (CTT) sum and mean scores and item response theory (IRT)-based theta scores. Results show that the EIRM and IRT theta scores provide generally equivalent bias and false positive…
Descriptors: Item Response Theory, Models, Test Theory, Computation
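For context, the EIRM embeds the treatment indicator as a person-level covariate in the measurement model, logit P(y_pi = 1) = theta_p + gamma*T_p - b_i. Below is a minimal simulation under that data-generating model, contrasted with a CTT sum-score effect estimate; it is illustrative only, and Gilbert's design and conditions are richer.

```python
import numpy as np

rng = np.random.default_rng(7)
P, I = 1000, 25
treat = rng.integers(0, 2, size=P)        # treatment indicator T_p
gamma = 0.30                              # true treatment effect (logit scale)
theta = rng.normal(0, 1, size=P)          # person ability
b = rng.normal(0, 1, size=I)              # item difficulties

# Rasch-type EIRM data-generating model:
# logit P(y_pi = 1) = theta_p + gamma * T_p - b_i
eta = theta[:, None] + gamma * treat[:, None] - b[None, :]
y = rng.random((P, I)) < 1 / (1 + np.exp(-eta))

# CTT contrast: standardized mean difference of sum scores.
sums = y.sum(axis=1)
d = (sums[treat == 1].mean() - sums[treat == 0].mean()) / sums.std(ddof=1)
print(f"sum-score effect size: {d:.3f}")
# The EIRM itself would be fit as a logistic mixed model with person and
# item effects plus the treatment covariate (e.g., lme4::glmer in R).
```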
Sauder, Derek; DeMars, Christine – Applied Measurement in Education, 2020
We used simulation techniques to assess the item-level and familywise Type I error control and power of an IRT item-fit statistic, the S-X². Previous research indicated that the S-X² has good Type I error control and decent power, but no previous research examined familywise Type I error control…
Descriptors: Item Response Theory, Test Items, Sample Size, Test Length
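Familywise control across many per-item S-X² tests is typically achieved with a multiplicity adjustment; the sketch below applies Holm's step-down procedure to placeholder p-values (the specific procedures Sauder and DeMars evaluate are not stated in the snippet).

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm step-down familywise error control.
    Returns a boolean array: True where H0 (item fits) is rejected."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test survives, all larger p-values survive too
    return reject

# Placeholder p-values standing in for per-item S-X2 results.
pvals = [0.001, 0.012, 0.030, 0.200, 0.450]
print(holm(pvals))
```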
Xiao, Jiaying; Bulut, Okan – Educational and Psychological Measurement, 2020
Large amounts of missing data could distort item parameter estimation and lead to biased ability estimates in educational assessments. Therefore, missing responses should be handled properly before estimating any parameters. In this study, two Monte Carlo simulation studies were conducted to compare the performance of four methods in handling…
Descriptors: Data, Computation, Ability, Maximum Likelihood Statistics
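A toy illustration of why the handling method matters: even under completely random missingness, scoring omitted responses as incorrect biases classical item difficulty downward, while available-case analysis does not. The four model-based methods the authors compare are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
P, I = 500, 10
y = (rng.random((P, I)) < 0.6).astype(float)   # true responses
miss = rng.random((P, I)) < 0.15               # 15% MCAR missingness
y_obs = np.where(miss, np.nan, y)

# Method 1: treat omitted responses as incorrect.
p_incorrect = np.nan_to_num(y_obs, nan=0.0).mean(axis=0)

# Method 2: available-case analysis (ignore missing cells).
p_available = np.nanmean(y_obs, axis=0)

print(np.round(p_incorrect - p_available, 3))  # systematic downward bias
```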
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2020
A major challenge in the widespread application of Mokken scale analysis (MSA) to educational performance assessments is the requirement of complete data, where every rater rates every student. In this study, simulated and real data are used to demonstrate a method by which researchers and practitioners can apply MSA to incomplete rating designs.…
Descriptors: Item Response Theory, Scaling, Nonparametric Statistics, Performance Based Assessment
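For reference, MSA rests on scalability coefficients of the form H_ij = 1 - F_ij/E_ij, the ratio of observed to expected Guttman errors. Below is a rough sketch of computing H_ij from pairwise-complete cases in an incomplete rating design; the paper's actual method may differ.

```python
import numpy as np

def pairwise_h(x, i, j):
    """Mokken scalability H_ij for dichotomous items i and j,
    using only rows where both items were rated (NaN = not rated)."""
    both = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
    a, b = x[both, i], x[both, j]
    # Order the pair by difficulty: 'easy' is the item with the higher p-value.
    if a.mean() < b.mean():
        a, b = b, a
    observed = np.mean((a == 0) & (b == 1))        # Guttman errors
    expected = np.mean(a == 0) * np.mean(b == 1)   # under independence
    return 1.0 - observed / expected

rng = np.random.default_rng(5)
theta = rng.normal(size=400)
x = (theta[:, None] + rng.normal(0, 1, (400, 3)) > [-0.5, 0.0, 0.5]).astype(float)
x[rng.random(x.shape) < 0.3] = np.nan              # incomplete design
print(round(pairwise_h(x, 0, 1), 3))
```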
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
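As background, equipercentile equating under the random groups design maps each form X score to the form Y score with the same percentile rank. The sketch below is bare-bones, with no presmoothing or continuization, details a real equating study would add.

```python
import numpy as np

def percentile_ranks(scores, max_score):
    """Percentile ranks PR(x) = 100*(F(x-1) + f(x)/2) for raw scores 0..max."""
    f = np.bincount(scores, minlength=max_score + 1) / len(scores)
    F = np.cumsum(f)
    return 100 * (np.concatenate(([0.0], F[:-1])) + f / 2)

rng = np.random.default_rng(11)
x = rng.binomial(30, 0.55, size=2000)   # random group taking form X
y = rng.binomial(30, 0.60, size=2000)   # random group taking form Y

pr_x = percentile_ranks(x, 30)
pr_y = percentile_ranks(y, 30)
# Equipercentile function: interpolate each X percentile rank into the
# Y score scale (a coarse sketch; ties at empty score points are ignored).
e_x = np.interp(pr_x, pr_y, np.arange(31))
print(np.round(e_x[10:15], 2))
```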
Cho, April E.; Wang, Chun; Zhang, Xue; Xu, Gongjun – Grantee Submission, 2020
Multidimensional Item Response Theory (MIRT) is widely used in the assessment and evaluation of educational and psychological tests. It models individual response patterns by specifying a functional relationship between individuals' multiple latent traits and their responses to test items. One major challenge in parameter estimation in MIRT is that…
Descriptors: Item Response Theory, Mathematics, Statistical Inference, Maximum Likelihood Statistics
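The estimation challenge referred to is the integral over the multidimensional latent trait in the marginal likelihood, P(y) = ∫ Π_i P(y_i | θ) φ(θ) dθ. Below is a sketch of tensor-product Gauss-Hermite quadrature for a hypothetical two-dimensional 2PL item bank.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature for E[f(theta)] with theta ~ N(0, I_2):
nodes, weights = hermegauss(15)            # probabilists' Hermite rule
weights = weights / np.sqrt(2 * np.pi)     # normalize to the N(0,1) density
T1, T2 = np.meshgrid(nodes, nodes)         # tensor-product grid
W = np.outer(weights, weights)

# A hypothetical 2-dimensional 2PL item bank (loadings A, intercepts d).
A = np.array([[1.2, 0.0], [0.0, 0.8], [0.7, 0.6]])
d = np.array([0.2, -0.5, 0.1])
y = np.array([1, 0, 1])                    # one observed response pattern

# P(y_i | theta) at every grid point, multiplied across items.
eta = A[:, 0][:, None, None] * T1 + A[:, 1][:, None, None] * T2 + d[:, None, None]
p = 1 / (1 + np.exp(-eta))
lik = np.where(y[:, None, None] == 1, p, 1 - p).prod(axis=0)

marginal = (W * lik).sum()                 # quadrature approximation
print(f"marginal likelihood of the pattern: {marginal:.4f}")
```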
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2021
This study investigated whether item position effects lead to DIF when different test booklets are used. To do this, Lord's chi-square and Raju's unsigned area methods were applied with the 3PL model, both with and without item purification. When the performance of the methods was compared, it was revealed that…
Descriptors: Item Response Theory, Test Bias, Test Items, Comparative Analysis
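Lord's chi-square compares an item's parameter estimates across groups via χ² = v'(Σ_R + Σ_F)⁻¹v, where v is the vector of between-group differences. The sketch below uses placeholder 3PL estimates, illustrative numbers rather than values from the study.

```python
import numpy as np
from scipy.stats import chi2

def lord_chi2(est_ref, est_foc, cov_ref, cov_foc):
    """Lord's chi-square for DIF: compares one item's parameter
    estimates (e.g., a, b, c of the 3PL) between two groups."""
    v = np.asarray(est_ref) - np.asarray(est_foc)
    pooled = np.asarray(cov_ref) + np.asarray(cov_foc)
    stat = v @ np.linalg.solve(pooled, v)
    return stat, chi2.sf(stat, df=len(v))

# Placeholder calibrated estimates (a, b, c) and covariance matrices.
est_r, est_f = [1.1, 0.2, 0.18], [1.0, 0.6, 0.20]
cov = np.diag([0.02, 0.01, 0.001])
stat, p = lord_chi2(est_r, est_f, cov, cov)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```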
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
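The core move in a tree-based framework is recoding each response into pseudo-items, one per tree node. Below is a minimal two-node version (respond vs. omit, then correct vs. incorrect given a response); it is simpler than the trees in the paper, which also distinguish skipped from not-reached items.

```python
import numpy as np

def irtree_recode(y):
    """Expand an item response matrix (1/0 = correct/incorrect,
    NaN = omitted) into two node matrices for a two-node IRTree."""
    observed = ~np.isnan(y)
    node1 = observed.astype(float)          # 1 = responded, 0 = omitted
    node2 = np.where(observed, y, np.nan)   # correctness, defined only
                                            # when a response was given
    return node1, node2

y = np.array([[1.0, 0.0, np.nan],
              [np.nan, 1.0, 1.0]])
n1, n2 = irtree_recode(y)
print(n1)   # omission indicators, modeled with an omission propensity
print(n2)   # accuracy sub-responses, modeled with proficiency
```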
Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2019
The Mantel-Haenszel delta difference (MH D-DIF) and the standardized proportion difference (STD P-DIF) are two observed-score methods that have been used to assess differential item functioning (DIF) at Educational Testing Service since the early 1990s. Latent-variable approaches to assessing measurement invariance at the item level have been…
Descriptors: Test Bias, Educational Testing, Statistical Analysis, Item Response Theory
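Both statistics have closed forms over strata of a matching score: α̂_MH = Σ_k(R_Rk·W_Fk/N_k) / Σ_k(R_Fk·W_Rk/N_k) with MH D-DIF = -2.35·ln(α̂_MH), and STD P-DIF is a focal-weighted mean of proportion-correct differences. The counts below are hypothetical.

```python
import numpy as np

# Stratified counts by matching score level k (hypothetical numbers):
# R = right, W = wrong; ref = reference group, foc = focal group.
R_ref = np.array([40, 80, 120]); W_ref = np.array([60, 40, 20])
R_foc = np.array([25, 60, 90]);  W_foc = np.array([55, 50, 30])
N = R_ref + W_ref + R_foc + W_foc          # stratum totals

# Mantel-Haenszel common odds ratio and the ETS delta metric:
alpha_mh = np.sum(R_ref * W_foc / N) / np.sum(R_foc * W_ref / N)
mh_d_dif = -2.35 * np.log(alpha_mh)        # negative = against focal group

# Standardized proportion difference, weighted by focal group sizes:
n_foc = R_foc + W_foc
p_foc = R_foc / n_foc
p_ref = R_ref / (R_ref + W_ref)
std_p_dif = np.sum(n_foc * (p_foc - p_ref)) / np.sum(n_foc)

print(f"MH D-DIF = {mh_d_dif:.3f}, STD P-DIF = {std_p_dif:.3f}")
```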
Zhang, Zhonghua – Applied Measurement in Education, 2020
The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…
Descriptors: Error of Measurement, Computation, Equated Scores, True Scores
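The delta method referred to propagates the sampling covariance of estimated coefficients through a differentiable function, Var[g(β̂)] ≈ JΣJ'. Below is a generic sketch with a forward-difference Jacobian and made-up numbers; the GRM-specific derivatives in the paper are not reproduced.

```python
import numpy as np

def delta_se(g, beta_hat, cov, eps=1e-6):
    """Standard error of g(beta_hat) via the delta method,
    with a forward-difference Jacobian."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    base = g(beta_hat)
    jac = np.empty_like(beta_hat)
    for i in range(len(beta_hat)):
        bumped = beta_hat.copy()
        bumped[i] += eps
        jac[i] = (g(bumped) - base) / eps
    return np.sqrt(jac @ cov @ jac)

# Hypothetical equating slope/intercept estimates and their covariance;
# g is some smooth function of them (here, the equated value at x = 1).
est = [1.05, -0.20]
cov = np.asarray([[0.004, 0.001], [0.001, 0.003]])
g = lambda b: b[0] * 1.0 + b[1]
print(f"SE[g] = {delta_se(g, est, cov):.4f}")
```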
Zehner, Fabian; Harrison, Scott; Eichmann, Beate; Deribo, Tobias; Bengs, Daniel; Andersen, Nico; Hahnel, Carolin – International Educational Data Mining Society, 2020
The "2nd Annual WPI-UMASS-UPENN EDM Data Mining Challenge" required contestants to predict efficient testtaking based on log data. In this paper, we describe our theory-driven and psychometric modeling approach. For feature engineering, we employed the Log-Normal Response Time Model for estimating latent person speed, and the Generalized…
Descriptors: Data Analysis, Competition, Classification, Prediction
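The Log-Normal Response Time Model puts ln T_pi ~ N(β_i - τ_p, σ²), so with person speed centered at zero, simple moment estimators recover latent speed. The sketch below is a toy version; the challenge entry would have used proper psychometric estimation.

```python
import numpy as np

rng = np.random.default_rng(9)
P, I = 300, 20
tau = rng.normal(0, 0.3, size=P)            # latent person speed
beta = rng.normal(3.5, 0.4, size=I)         # item time intensity
log_t = beta[None, :] - tau[:, None] + rng.normal(0, 0.3, (P, I))

# Moment-based estimates (a simple stand-in for the MML estimation a
# psychometric package would perform):
beta_hat = log_t.mean(axis=0)               # identifies beta if E[tau] = 0
tau_hat = (beta_hat[None, :] - log_t).mean(axis=1)

print(f"corr(tau, tau_hat) = {np.corrcoef(tau, tau_hat)[0, 1]:.3f}")
```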
Mousavi, Amin; Schmidt, Matthew; Squires, Vicki; Wilson, Ken – International Journal of Artificial Intelligence in Education, 2021
Greer and Mark's (2016) paper suggested and reviewed different methods, such as propensity score matching, for evaluating the effectiveness of intelligent tutoring systems. The current study assessed the effectiveness of an automated personalized feedback intervention implemented via the Student Advice Recommender Agent (SARA) in a first-year…
Descriptors: Automation, Feedback (Response), Intervention, College Freshmen
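Propensity score matching in its simplest form fits a logistic model of treatment on covariates, then pairs each treated case with the nearest untreated case on the estimated score. The sketch below uses simulated data and greedy matching with replacement and no caliper, choices a real evaluation would tune.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)
n = 1000
x = rng.normal(size=(n, 3))                 # pre-treatment covariates
treat = rng.random(n) < 1 / (1 + np.exp(-x @ [0.8, -0.5, 0.3]))
y = x @ [1.0, 0.5, -0.2] + 0.4 * treat + rng.normal(0, 1, n)

# 1. Propensity scores from a logistic model of treatment on covariates.
ps = LogisticRegression().fit(x, treat).predict_proba(x)[:, 1]

# 2. Greedy 1:1 nearest-neighbor matching on the propensity score.
treated = np.flatnonzero(treat)
control = np.flatnonzero(~treat)
matches = control[np.abs(ps[control][None, :] - ps[treated][:, None]).argmin(axis=1)]

# 3. Average treatment effect on the treated, from matched pairs.
att = (y[treated] - y[matches]).mean()
print(f"ATT estimate: {att:.3f} (true effect 0.4)")
```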
Zheng, Xiaying; Yang, Ji Seung – Measurement: Interdisciplinary Research and Perspectives, 2021
The purpose of this paper is to briefly introduce the two most common applications of multiple-group item response theory (IRT) models, namely differential item functioning (DIF) analysis and nonequivalent group score linking with simultaneous calibration. We illustrate how to conduct those analyses using the "Stata" item…
Descriptors: Item Response Theory, Test Bias, Computer Software, Statistical Analysis
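As a companion to the linking application, the mean-sigma method shows how common-item difficulty estimates yield the linear transformation θ* = Aθ + B that places one calibration on another's scale. This is a generic sketch, not the Stata workflow the paper demonstrates.

```python
import numpy as np

def mean_sigma(b_new, b_ref):
    """Mean-sigma linking coefficients from common-item difficulties,
    taking the new form's scale onto the reference scale."""
    A = np.std(b_ref, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_ref) - A * np.mean(b_new)
    return A, B

# Hypothetical common-item difficulty estimates from two calibrations.
b_new = np.array([-1.2, -0.3, 0.4, 1.0, 1.6])
b_ref = np.array([-0.9, -0.1, 0.5, 1.2, 1.9])

A, B = mean_sigma(b_new, b_ref)
b_linked = A * b_new + B            # difficulties on the reference scale
a_linked = lambda a_new: a_new / A  # discriminations transform inversely
print(f"A = {A:.3f}, B = {B:.3f}")
```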
