Publication Date

| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 8 |
Descriptor

| Descriptor | Count |
| --- | --- |
| Difficulty Level | 16 |
| Error of Measurement | 16 |
| Item Analysis | 16 |
| Test Items | 13 |
| Comparative Analysis | 6 |
| Test Construction | 6 |
| Item Response Theory | 5 |
| Achievement Tests | 4 |
| Foreign Countries | 4 |
| Latent Trait Theory | 4 |
| Mathematical Models | 4 |
Source

| Source | Count |
| --- | --- |
| Educational and Psychological… | 2 |
| Journal of Educational… | 2 |
| Applied Measurement in… | 1 |
| ETS Research Report Series | 1 |
| Evaluation and the Health… | 1 |
| Language Testing | 1 |
| Research Papers in Education | 1 |
Author

| Author | Count |
| --- | --- |
| Abulela, Mohammed A. A. | 1 |
| Ahn, Soyeon | 1 |
| Anwyll, Steve | 1 |
| Benson, Jeri | 1 |
| Clauser, Brian E. | 1 |
| Clauser, Jerome C. | 1 |
| Colton, Dean A. | 1 |
| Curry, Allen R. | 1 |
| Córdova, Nora | 1 |
| Dartnell, Pablo | 1 |
| Diamond, James J. | 1 |
Publication Type

| Publication Type | Count |
| --- | --- |
| Reports - Research | 13 |
| Journal Articles | 9 |
| Speeches/Meeting Papers | 3 |
| Reports - Evaluative | 2 |
| Information Analyses | 1 |
| Reports - Descriptive | 1 |
| Tests/Questionnaires | 1 |
Audience

| Audience | Count |
| --- | --- |
| Researchers | 1 |
Location

| Location | Count |
| --- | --- |
| Chile | 1 |
| Japan | 1 |
| United Kingdom (England) | 1 |
Assessments and Surveys

| Assessment | Count |
| --- | --- |
| ACT Assessment | 1 |
| Medical College Admission Test | 1 |
| Program for International… | 1 |
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
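The Mantel-Haenszel procedure named in the abstract above belongs to the observed-score family: examinees are stratified by a matching score, and a common odds ratio is pooled across strata, then mapped to the ETS delta scale. A minimal sketch, assuming hypothetical examinee records (the function name and data layout are illustrative, not from any of the cited studies):

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF statistic for one studied item.

    records: iterable of (group, score, correct), where group is
    "reference" or "focal", score is the matching total score, and
    correct is 0/1 on the studied item.  Returns the MH common odds
    ratio alpha and the ETS delta-scale value, MH D-DIF = -2.35 ln(alpha).
    """
    # Stratify examinees by matching score; count right/wrong per group.
    strata = defaultdict(lambda: {"Ar": 0, "Br": 0, "Af": 0, "Bf": 0})
    for group, score, correct in records:
        cell = strata[score]
        if group == "reference":
            cell["Ar" if correct else "Br"] += 1
        else:
            cell["Af" if correct else "Bf"] += 1

    # Pool the odds ratio across score strata (MH estimator).
    num = den = 0.0
    for cell in strata.values():
        n = cell["Ar"] + cell["Br"] + cell["Af"] + cell["Bf"]
        if n == 0:
            continue
        num += cell["Ar"] * cell["Bf"] / n  # reference right, focal wrong
        den += cell["Af"] * cell["Br"] / n  # focal right, reference wrong
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

With equal right/wrong counts in both groups the odds ratio is 1 and the delta value is 0, i.e., no DIF is flagged.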
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
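In an Angoff study like the one described above, the cut score is typically the mean of judge-by-item probability ratings, and judge variability contributes error to it. A minimal sketch of that decomposition, assuming made-up ratings and ignoring the panel and item facets the paper also treats:

```python
import math
import statistics

def angoff_cut_score(ratings):
    """Cut score and judge-related standard error from Angoff ratings.

    ratings: dict mapping judge id -> list of per-item probability
    judgments.  The cut score is the mean over judges of each judge's
    mean rating; the judge-related SE is the SD of judge means divided
    by sqrt(number of judges).
    """
    judge_means = [statistics.mean(v) for v in ratings.values()]
    cut = statistics.mean(judge_means)
    se = statistics.stdev(judge_means) / math.sqrt(len(judge_means))
    return cut, se
```

This captures only the judge facet; a full generalizability analysis would also estimate variance components for items, panels, and their interactions.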
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
Lions, Séverin; Dartnell, Pablo; Toledo, Gabriela; Godoy, María Inés; Córdova, Nora; Jiménez, Daniela; Lemarié, Julie – Educational and Psychological Measurement, 2023
Even though the impact of the position of response options on answers to multiple-choice items has been investigated for decades, it remains debated. Research on this topic is inconclusive, perhaps because too few studies have obtained experimental data from large-sized samples in a real-world context and have manipulated the position of both…
Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Responses
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Zhang, Jinming; Li, Jie – Journal of Educational Measurement, 2016
An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…
Descriptors: Computer Assisted Testing, Test Items, Difficulty Level, Item Response Theory
Suzuki, Yuichi – Language Testing, 2015
Self-assessment has been used to assess second language proficiency; however, as sources of measurement errors vary, they may threaten the validity and reliability of the tools. The present paper investigated the role of experiences in using Japanese as a second language in the naturalistic acquisition context on the accuracy of the…
Descriptors: Self Evaluation (Individuals), Error of Measurement, Japanese, Second Language Learning
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Diamond, James J.; McCormick, Janet – Evaluation and the Health Professions, 1986 (peer reviewed)
Using item responses from an in-training examination in diagnostic radiology, the application of a strength of association statistic to the general problem of item analysis is illustrated. Criteria for item selection, general issues of reliability, and error of measurement are discussed. (Author/LMO)
Descriptors: Achievement Tests, Difficulty Level, Error of Measurement, Graduate Medical Education
Colton, Dean A. – 1993
Tables of specifications are used to guide test developers in sampling items and maintaining consistency from form to form. This paper is a generalizability study of the American College Testing Program (ACT) Achievement Program Mathematics Test (AAP), with the content areas of the table of specifications representing multiple dependent variables.…
Descriptors: Achievement Tests, Difficulty Level, Error of Measurement, Generalizability Theory
Wise, Lauress L. – 1986
A primary goal of this study was to determine the extent to which item difficulty was related to item position and, if a significant relationship was found, to suggest adjustments to predicted item difficulty that reflect differences in item position. Item response data from the Medical College Admission Test (MCAT) were analyzed. A data set was…
Descriptors: College Entrance Examinations, Difficulty Level, Educational Research, Error of Measurement
Curry, Allen R.; And Others – 1978
The efficacy of employing subsets of items from a calibrated item pool to estimate the Rasch model person parameters was investigated. Specifically, the degree of invariance of Rasch model ability-parameter estimates was examined across differing collections of simulated items. The ability-parameter estimates were obtained from a simulation of…
Descriptors: Career Development, Difficulty Level, Equated Scores, Error of Measurement
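The Rasch ability estimation examined in the study above conditions on calibrated item difficulties. A minimal sketch of the standard maximum-likelihood estimate via Newton-Raphson, assuming hypothetical responses and difficulties (not data from the study):

```python
import math

def rasch_ability(responses, difficulties, tol=1e-6, max_iter=50):
    """Maximum-likelihood Rasch ability estimate for one examinee.

    responses: list of 0/1 item scores; difficulties: calibrated item
    difficulties in logits.  Requires a mixed response vector, since
    the MLE is infinite for perfect or zero scores.
    """
    r = sum(responses)
    if r == 0 or r == len(responses):
        raise ValueError("MLE undefined for perfect or zero scores")
    theta = 0.0
    for _ in range(max_iter):
        # P(correct) under the Rasch model for each item.
        probs = [1 / (1 + math.exp(-(theta - b))) for b in difficulties]
        grad = r - sum(probs)                   # score residual
        info = sum(p * (1 - p) for p in probs)  # test information
        step = grad / info
        theta += step                           # Newton-Raphson update
        if abs(step) < tol:
            break
    return theta
```

Because the raw score is a sufficient statistic in the Rasch model, the estimate depends on the item subset only through its difficulties, which is what makes invariance across item collections a testable question.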
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
Benson, Jeri; Wilson, Michael – 1979
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Descriptors: Comparative Analysis, Difficulty Level, Efficiency, Error of Measurement
Ridgeway, Gretchen Freiheit – 1982
A one-parameter latent trait model was the basis of the test development procedures in the Basic Skills Assessment Program (BSAP) of the Department of Defense Dependents Schools (DoDDS). Several issues are involved in applying the Rasch model to an assessment program in a large school district. Separate sets of skills continua are arranged by…
Descriptors: Achievement Tests, Basic Skills, Dependents Schools, Difficulty Level
Roid, Gale; Haladyna, Tom – 1978
The technology of transforming sentences from prose instruction into test questions was examined by comparing two methods of selecting sentences (keyword vs. rare singleton), two types of question words (nouns vs. adjectives), and two foil construction methods (writer's choice vs. algorithmic). Four item writers created items using each…
Descriptors: Algorithms, Cloze Procedure, Comparative Analysis, Criterion Referenced Tests
