| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 7 |
| Since 2017 (last 10 years) | 10 |
| Since 2007 (last 20 years) | 38 |
| Descriptor | Records |
| --- | --- |
| Difficulty Level | 49 |
| Test Items | 44 |
| Item Response Theory | 41 |
| Comparative Analysis | 10 |
| Computation | 10 |
| Models | 10 |
| Test Bias | 9 |
| Monte Carlo Methods | 8 |
| Response Style (Tests) | 8 |
| Foreign Countries | 7 |
| Bayesian Statistics | 6 |
| Source | Records |
| --- | --- |
| Educational and Psychological Measurement | 49 |
| Author | Records |
| --- | --- |
| Ahn, Soyeon | 2 |
| Andrich, David | 2 |
| DeMars, Christine E. | 2 |
| Kubinger, Klaus D. | 2 |
| Strobl, Carolin | 2 |
| Ace, Merle C. | 1 |
| Al-Harbi, Khaleel | 1 |
| Bandalos, Deborah L. | 1 |
| Cai, Li | 1 |
| Córdova, Nora | 1 |
| Dardick, William R. | 1 |
| Publication Type | Records |
| --- | --- |
| Journal Articles | 46 |
| Reports - Research | 38 |
| Reports - Evaluative | 7 |
| Information Analyses | 1 |
| Reports - Descriptive | 1 |
| Speeches/Meeting Papers | 1 |
| Education Level | Records |
| --- | --- |
| Higher Education | 5 |
| Postsecondary Education | 5 |
| Secondary Education | 4 |
| Elementary Education | 3 |
| Grade 3 | 2 |
| Early Childhood Education | 1 |
| Elementary Secondary Education | 1 |
| Grade 4 | 1 |
| Grade 5 | 1 |
| Grade 6 | 1 |
| Grade 7 | 1 |
| Location | Records |
| --- | --- |
| Australia | 2 |
| California | 1 |
| Chile | 1 |
| Florida | 1 |
| Germany | 1 |
| Greece | 1 |
| Japan | 1 |
| Saudi Arabia | 1 |
| Assessments and Surveys | Records |
| --- | --- |
| Raven Progressive Matrices | 2 |
| SAT (College Admission Test) | 2 |
| Advanced Placement… | 1 |
| Childrens Manifest Anxiety… | 1 |
| General Aptitude Test Battery | 1 |
| Program for International Student Assessment | 1 |
Kuan-Yu Jin; Thomas Eckes – Educational and Psychological Measurement, 2024
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable…
Descriptors: Item Response Theory, Test Items, Test Wiseness, Surveys
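The assumption named in the abstract, that successive categories reflect increasing levels of the latent variable, is exactly what insufficient-effort responders violate. As a hedged illustration (not the authors' model), the sketch below simulates 5-point responses under a graded response model and then lets a small share of simulated respondents answer at random; every parameter value is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def grm_probs(theta, a, b):
    """Category probabilities for one graded-response-model item.
    b is an increasing vector of K-1 thresholds; returns K probabilities."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(X >= k), k = 1..K-1
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]                          # P(X = k), k = 0..K-1

n_persons, n_items, n_cats = 500, 10, 5
theta = rng.normal(size=n_persons)
a = rng.uniform(1.0, 2.0, n_items)
b = np.sort(rng.normal(size=(n_items, n_cats - 1)), axis=1)

data = np.empty((n_persons, n_items), dtype=int)
for i in range(n_persons):
    for j in range(n_items):
        data[i, j] = rng.choice(n_cats, p=grm_probs(theta[i], a[j], b[j]))

# Insufficient-effort responding: 10% of simulated persons answer at random.
ier = rng.random(n_persons) < 0.10
data[ier] = rng.integers(0, n_cats, size=(int(ier.sum()), n_items))

def mean_item_rest_r(block):
    """Average correlation of each item with the sum of the remaining items."""
    return np.mean([np.corrcoef(block[:, j], block.sum(1) - block[:, j])[0, 1]
                    for j in range(block.shape[1])])

print("mean item-rest r, effortful:", round(mean_item_rest_r(data[~ier]), 2))
print("mean item-rest r, random   :", round(mean_item_rest_r(data[ier]), 2))
```

In this toy setup the item-rest correlations of the random responders collapse toward zero, which is the kind of signal IER-detection methods exploit.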
Kam, Chester Chun Seng – Educational and Psychological Measurement, 2023
When constructing measurement scales, regular and reversed items are often used (e.g., "I am satisfied with my job"/"I am not satisfied with my job"). Some methodologists recommend excluding reversed items because they are more difficult to understand and therefore engender a second, artificial factor distinct from the…
Descriptors: Test Items, Difficulty Level, Test Construction, Construct Validity
Xue, Kang; Huggins-Manley, Anne Corinne; Leite, Walter – Educational and Psychological Measurement, 2022
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of…
Descriptors: Virtual Classrooms, Artificial Intelligence, Item Response Theory, Item Analysis
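IRT-based measurement in a VLE depends on trustworthy item parameter estimates. As a minimal, hedged sketch of what item parameter estimation means here (not the authors' pipeline), the code below simulates Rasch responses and recovers item difficulties by maximum likelihood with the simulated abilities treated as known, a shortcut only a simulation permits.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n_persons, n_items = 2000, 20

theta = rng.normal(size=n_persons)                    # simulated abilities
b_true = rng.normal(size=n_items)                     # true item difficulties
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b_true)))  # Rasch response probabilities
x = (rng.random(p.shape) < p).astype(int)             # 0/1 response matrix

def neg_loglik(b, item):
    """Rasch log-likelihood for one item, with abilities treated as known."""
    eta = theta - b
    return -np.sum(x[:, item] * eta - np.log1p(np.exp(eta)))

b_hat = np.array([minimize_scalar(neg_loglik, args=(j,)).x for j in range(n_items)])
print("max absolute error in recovered difficulties:", np.abs(b_hat - b_true).max().round(2))
```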
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
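Effort-moderated scoring, as commonly described in this literature, flags rapid-guessing responses via response-time thresholds and excludes them from scoring. The sketch below shows only a proportion-correct version of that idea, with an invented 3-second threshold and simulated data; the IRT-based EM procedure examined in the study is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
n_persons, n_items = 200, 30

# Simulated scored responses and log response times (seconds).
correct = rng.integers(0, 2, size=(n_persons, n_items))
log_rt = rng.normal(3.0, 0.5, size=(n_persons, n_items))

# Make ~15% of responses rapid guesses: very short times, near-chance accuracy.
rapid = rng.random((n_persons, n_items)) < 0.15
log_rt[rapid] = rng.normal(0.5, 0.3, size=int(rapid.sum()))
correct[rapid] = (rng.random(int(rapid.sum())) < 0.25).astype(int)

threshold = np.log(3.0)                    # flag responses faster than 3 s (arbitrary)
effortful = log_rt >= threshold            # responses kept for effort-moderated scoring

conventional = correct.mean(axis=1)
n_eff = effortful.sum(axis=1)
em_score = (correct * effortful).sum(axis=1) / np.maximum(n_eff, 1)
em_score = np.where(n_eff > 0, em_score, np.nan)

print("mean conventional score     :", conventional.mean().round(3))
print("mean effort-moderated score :", np.nanmean(em_score).round(3))
```

Because the excluded responses sit near chance level, the effort-moderated mean lands above the conventional mean in this simulation.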
Spratto, Elisabeth M.; Leventhal, Brian C.; Bandalos, Deborah L. – Educational and Psychological Measurement, 2021
In this study, we examined the results and interpretations produced from two different IRTree models--one using paths consisting of only dichotomous decisions, and one using paths consisting of both dichotomous and polytomous decisions. We used data from two versions of an impulsivity measure. In the first version, all the response options had…
Descriptors: Comparative Analysis, Item Response Theory, Decision Making, Data Analysis
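An IRTree decomposes each ordered response into a sequence of node decisions scored as pseudo-items. The mapping below is one common three-node tree for a 5-point scale (midpoint, then direction, then extremity); it is a generic illustration, not necessarily either of the trees compared in the article.

```python
import numpy as np
import pandas as pd

def irtree_pseudoitems(resp):
    """Map a 5-point response (1..5) to three binary pseudo-items for a
    midpoint -> direction -> extremity tree. NaN = node not reached."""
    mid = 1 if resp == 3 else 0                 # node 1: choose the midpoint?
    if resp == 3:
        return mid, np.nan, np.nan
    agree = 1 if resp > 3 else 0                # node 2: agree side of the scale?
    extreme = 1 if resp in (1, 5) else 0        # node 3: extreme category?
    return mid, agree, extreme

responses = [1, 2, 3, 4, 5]
table = pd.DataFrame([irtree_pseudoitems(r) for r in responses],
                     index=responses, columns=["midpoint", "agree", "extreme"])
print(table)
```

Each pseudo-item column is then fit with a dichotomous IRT model; a tree with a polytomous path would replace one of these binary nodes with a multi-category decision.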
Fellinghauer, Carolina; Debelak, Rudolf; Strobl, Carolin – Educational and Psychological Measurement, 2023
This simulation study investigated to what extent departures from construct similarity as well as differences in the difficulty and targeting of scales impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation…
Descriptors: True Scores, Equated Scores, Test Items, Sample Size
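Concurrent calibration under the partial credit model is beyond a few lines of code, so the sketch below substitutes a much simpler mean-sigma linear linking of person measures in a common-person design; it is an illustrative stand-in with invented measures, not the study's method.

```python
import numpy as np

rng = np.random.default_rng(3)

# Person measures for the same people estimated separately on two scales.
theta_true = rng.normal(size=300)
theta_A = theta_true + rng.normal(0, 0.25, 300)              # scale A metric
theta_B = 0.8 * theta_true - 0.5 + rng.normal(0, 0.25, 300)  # scale B: shifted, rescaled metric

# Mean-sigma linking: put scale A measures onto scale B's metric.
A = theta_B.std(ddof=1) / theta_A.std(ddof=1)
B = theta_B.mean() - A * theta_A.mean()
theta_A_on_B = A * theta_A + B

print(f"slope = {A:.2f}, intercept = {B:.2f}")
print("RMSE after linking:", np.sqrt(np.mean((theta_A_on_B - theta_B) ** 2)).round(3))
```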
Sideridis, Georgios; Tsaousis, Ioannis; Al-Harbi, Khaleel – Educational and Psychological Measurement, 2022
The goal of the present study was to address the analytical complexity of incorporating responses and response times through applying the Jeon and De Boeck mixture item response theory model in Mplus 8.7. Using both simulated and real data, we attempt to identify subgroups of responders that are rapid guessers or engage knowledge retrieval…
Descriptors: Reaction Time, Guessing (Tests), Item Response Theory, Information Retrieval
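The Jeon and De Boeck mixture model is fit in Mplus in the study itself. As a rough stand-in for the response-time side of that classification, the sketch below separates a fast from a slow latent class with a two-component Gaussian mixture on simulated log response times.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Simulated log response times: a fast (guessing) and a slow (solution) component.
log_rt = np.concatenate([rng.normal(0.7, 0.3, 300),     # rapid guesses
                         rng.normal(3.0, 0.5, 1700)])   # effortful responses

gm = GaussianMixture(n_components=2, random_state=0).fit(log_rt.reshape(-1, 1))
fast = int(np.argmin(gm.means_.ravel()))                # component with the smaller mean
p_fast = gm.predict_proba(log_rt.reshape(-1, 1))[:, fast]

print("estimated share of rapid responses:", (p_fast > 0.5).mean().round(3))
print("component means (log sec):", np.sort(gm.means_.ravel()).round(2))
```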
Lenhard, Wolfgang; Lenhard, Alexandra – Educational and Psychological Measurement, 2021
The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. Subsequently, we modeled the norm scores based on random…
Descriptors: Test Norms, Scores, Regression (Statistics), Test Items
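A hedged contrast between the two norming logics mentioned above: conventional norming standardizes within discrete age bands, while continuous norming models score location and spread as smooth functions of age. The polynomial version below is a deliberate simplification of semiparametric continuous norming, with invented data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3000
age = rng.uniform(6, 12, n)                              # ages in years
raw = 10 + 3 * age + rng.normal(0, 4 + 0.3 * age)        # raw scores grow and spread with age

# Conventional norming: z-score within one-year age bands.
band = np.floor(age).astype(int)
z_conv = np.empty(n)
for b in np.unique(band):
    sel = band == b
    z_conv[sel] = (raw[sel] - raw[sel].mean()) / raw[sel].std(ddof=1)

# Continuous norming (simplified): mean and SD as smooth polynomial functions of age.
mean_fit = np.polynomial.Polynomial.fit(age, raw, deg=2)
resid = raw - mean_fit(age)
sd_fit = np.polynomial.Polynomial.fit(age, np.abs(resid) * np.sqrt(np.pi / 2), deg=2)
z_cont = resid / sd_fit(age)

print("corr(conventional z, continuous z):", np.corrcoef(z_conv, z_cont)[0, 1].round(3))
```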
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
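The MGCFA step is study-specific, so the sketch below illustrates only the synthesis step with a generic fixed-effect inverse-variance average of hypothetical per-study DIF effect sizes; it is a stand-in, not the authors' procedure.

```python
import numpy as np

# Hypothetical per-study DIF effect sizes and their sampling variances.
effects = np.array([0.12, 0.05, 0.20, -0.02, 0.09])
variances = np.array([0.004, 0.006, 0.010, 0.003, 0.008])

w = 1.0 / variances                           # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)      # fixed-effect pooled estimate
se = np.sqrt(1.0 / np.sum(w))

print(f"pooled DIF effect = {pooled:.3f} "
      f"(95% CI {pooled - 1.96 * se:.3f} to {pooled + 1.96 * se:.3f})")
```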
Lions, Séverin; Dartnell, Pablo; Toledo, Gabriela; Godoy, María Inés; Córdova, Nora; Jiménez, Daniela; Lemarié, Julie – Educational and Psychological Measurement, 2023
Even though the impact of the position of response options on answers to multiple-choice items has been investigated for decades, it remains debated. Research on this topic is inconclusive, perhaps because too few studies have obtained experimental data from large-sized samples in a real-world context and have manipulated the position of both…
Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Responses
Matlock, Ki Lynn; Turner, Ronna – Educational and Psychological Measurement, 2016
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Descriptors: Item Response Theory, Computation, Test Items, Difficulty Level
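The abstract's premise, that forms can match on length and overall mean difficulty while diverging within subcontent areas, is easy to make concrete; the toy item banks below are invented.

```python
import pandas as pd

# Two hypothetical 6-item forms: same length and same overall mean difficulty (b),
# but difficulty is distributed differently across subcontent areas.
items = pd.DataFrame({
    "form": ["A"] * 6 + ["B"] * 6,
    "area": ["algebra", "algebra", "algebra", "geometry", "geometry", "geometry"] * 2,
    "b":    [-1.0, -0.5, 0.0, 0.0, 0.5, 1.0,     # form A: easy algebra, hard geometry
             0.0, 0.5, 1.0, -1.0, -0.5, 0.0],    # form B: the reverse
})

print(items.groupby("form")["b"].agg(["count", "mean"]))        # matched overall
print(items.groupby(["form", "area"])["b"].mean().unstack())    # mismatched within areas
```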
Wright, Keith D.; Oshima, T. C. – Educational and Psychological Measurement, 2015
This study established an effect size measure for noncompensatory differential item functioning (NCDIF) within the differential functioning of items and tests (DFIT) framework. The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
Descriptors: Effect Size, Test Bias, Test Items, Difficulty Level
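The Mantel-Haenszel benchmark itself is standard: a common odds ratio across total-score strata, transformed to the ETS delta metric. The sketch below computes that benchmark from invented stratified counts; NCDIF is not computed here.

```python
import numpy as np

# Stratified 2x2 counts by total-score level: [A, B, C, D] per stratum, where
# A/B = reference group correct/incorrect and C/D = focal group correct/incorrect.
strata = np.array([
    [30, 10, 22, 18],
    [45, 15, 35, 25],
    [60, 10, 50, 20],
    [40,  5, 38,  7],
], dtype=float)

A, B, C, D = strata.T
N = strata.sum(axis=1)

alpha_mh = np.sum(A * D / N) / np.sum(B * C / N)   # MH common odds ratio
mh_d_dif = -2.35 * np.log(alpha_mh)                # ETS delta metric

print(f"alpha_MH = {alpha_mh:.2f}, MH D-DIF = {mh_d_dif:.2f}")
# ETS convention (roughly): |MH D-DIF| < 1 negligible, 1 to 1.5 moderate, > 1.5 large.
```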
Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2016
This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…
Descriptors: Scoring, Equated Scores, Test Items, Measurement
Okumura, Taichi – Educational and Psychological Measurement, 2014
This study examined the empirical differences between the tendency to omit items and reading ability by applying tree-based item response (IRTree) models to the Japanese data of the Programme for International Student Assessment (PISA) held in 2009. For this purpose, existing IRTree models were expanded to contain predictors and to handle…
Descriptors: Foreign Countries, Item Response Theory, Test Items, Reading Ability
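A common IRTree coding for omissions places the omit/respond decision at the first node, so the answer itself is scored only when the second node is reached. The sketch below shows that coding for dichotomous items with invented responses; the article's expanded models with predictors are not reproduced.

```python
import numpy as np
import pandas as pd

def omission_tree(resp):
    """Code one item response as two pseudo-items:
    node 1 = responded (1) vs. omitted (0); node 2 = correct, only if responded."""
    if resp is None:                      # omitted item
        return 0, np.nan
    return 1, int(resp)

observed = [1, 0, None, 1, None, 0]       # 1 = correct, 0 = incorrect, None = omitted
coded = pd.DataFrame([omission_tree(r) for r in observed],
                     columns=["responded", "correct_given_response"])
print(coded)
```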
DeMars, Christine E.; Jurich, Daniel P. – Educational and Psychological Measurement, 2015
In educational testing, differential item functioning (DIF) statistics must be accurately estimated to ensure the appropriate items are flagged for inspection or removal. This study showed how using the Rasch model to estimate DIF may introduce considerable bias in the results when there are large group differences in ability (impact) and the data…
Descriptors: Test Bias, Guessing (Tests), Ability, Differences
