Publication Date
| Date range | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 190 |
| Since 2022 (last 5 years) | 1057 |
| Since 2017 (last 10 years) | 2567 |
| Since 2007 (last 20 years) | 4928 |
Audience
| Audience | Results |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Location | Results |
| --- | --- |
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
What Works Clearinghouse Rating
| Rating | Results |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Oluwaseyi Aina Gbolade Opesemowo – Research in Social Sciences and Technology, 2023
Local Item Dependence (LID) is a violation of Local Item Independence (LII), which can lead to over- or underestimating a candidate's ability on mathematics items and create validity problems. The study investigated the intra- and inter-LID of mathematics items. The study made use of ex-post facto research. The population encompassed all…
Descriptors: Foreign Countries, Secondary School Students, Item Response Theory, Test Items
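Local dependence of the kind this study investigates is commonly screened with Yen's Q3 statistic: correlate item residuals after removing the model-implied expectation, and flag item pairs with large residual correlations. A minimal sketch under a Rasch model; function names are illustrative and not taken from the study:

```python
import numpy as np

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def yen_q3(responses, theta, b):
    """Yen's Q3: pairwise correlations between item residuals
    (observed response minus model-implied probability).
    Large |Q3| for an item pair suggests local item dependence."""
    expected = rasch_p(theta[:, None], b[None, :])   # persons x items
    residuals = responses - expected
    return np.corrcoef(residuals, rowvar=False)      # items x items
```

In practice `theta` and `b` would come from a fitted IRT model; here they are supplied directly to keep the sketch self-contained.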
Gorney, Kylie – ProQuest LLC, 2023
Aberrant behavior refers to any type of unusual behavior that would not be expected under normal circumstances. In educational and psychological testing, such behaviors have the potential to severely bias the aberrant examinee's test score while also jeopardizing the test scores of countless others. It is therefore crucial that aberrant examinees…
Descriptors: Behavior Problems, Educational Testing, Psychological Testing, Test Bias
Baryktabasov, Kasym; Jumabaeva, Chinara; Brimkulov, Ulan – Research in Learning Technology, 2023
Many examinations with thousands of participating students are organized worldwide every year. Usually, this large number of students sit the exams simultaneously and answer almost the same set of questions. This method of learning assessment requires tremendous effort and resources to prepare the venues, print question books and organize the…
Descriptors: Information Technology, Computer Assisted Testing, Test Items, Adaptive Testing
Laura Laclede – ProQuest LLC, 2023
Because non-cognitive constructs can influence student success in education beyond academic achievement, it is essential that they are reliably conceptualized and measured. Within this context, there are several gaps in the literature related to correctly interpreting the meaning of scale scores when a non-standard response option like I do not…
Descriptors: High School Students, Test Wiseness, Models, Test Items
Thompson, Kathryn N. – ProQuest LLC, 2023
It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the "degree to which test scores are…
Descriptors: Test Wiseness, Test Items, Item Response Theory, Scores
Balbuena, Sherwin – International Journal of Assessment Tools in Education, 2023
Depression is a latent characteristic that is measured through self-reported or clinician-mediated instruments such as scales and inventories. The precision of depression estimates largely depends on the validity of the items used and on the truthfulness of people responding to these items. The existing methodology in instrumentation based on a…
Descriptors: Depression (Psychology), Test Items, Test Validity, Test Reliability
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch models and 2PL and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
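For a dichotomous item, the many-facets Rasch model the abstract refers to extends the usual ability-minus-difficulty logit with further facet terms such as rater severity. A minimal sketch of the response probability (illustrative, not DeMars's code):

```python
import math

def facets_rasch_p(ability, difficulty, severity):
    """P(correct) under a dichotomous many-facets Rasch model:
    the logit is person ability minus item difficulty minus
    rater severity."""
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))
```

Conditional on these facet values, the model assumes the observed responses are independent; that conditional-independence requirement is what the article examines.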
Rossi, Olena; Brunfaut, Tineke – Language Assessment Quarterly, 2021
A long-standing debate in the testing of listening concerns the authenticity of the listening input. On the one hand, listening texts produced by item writers often lack spoken language characteristics. On the other hand, real-life recordings are often too context-specific to stand alone, or not suitable for item generation. In this study, we…
Descriptors: Listening Comprehension Tests, Test Items, Test Construction, Training
Thirakunkovit, Suthathip; Rhee, Seongha – THAITESOL Journal, 2021
This study explores the extent to which the difficulty levels of grammar items in an English test can be predicted by the complexity of grammatical structures. The researchers carried out two sets of analyses. In the first analysis, the item facility and item discrimination indices of 175 multiple-choice items were examined. In the second…
Descriptors: Grammar, Test Items, Difficulty Level, English (Second Language)
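The item facility and discrimination indices examined here are classical test theory quantities: facility is the proportion of examinees answering an item correctly, and an upper-lower discrimination index compares that proportion between the top- and bottom-scoring groups. A hedged sketch (function names are mine, not the authors'):

```python
import numpy as np

def item_facility(correct):
    """Item facility: proportion of examinees answering correctly."""
    return float(np.mean(correct))

def discrimination_index(correct, total_scores, frac=0.27):
    """Upper-lower discrimination index: item facility in the
    top-scoring group minus facility in the bottom-scoring group,
    using the conventional 27% group size by default."""
    correct = np.asarray(correct, dtype=float)
    n = len(total_scores)
    k = max(1, int(round(frac * n)))
    order = np.argsort(total_scores)
    low, high = order[:k], order[-k:]
    return float(correct[high].mean() - correct[low].mean())
```

Values near 1.0 indicate an item that separates strong from weak examinees sharply; values near 0 (or negative) flag items worth reviewing.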
Haladyna, Thomas M.; Rodriguez, Michael C. – Educational Assessment, 2021
Full-information item analysis provides item developers and reviewers comprehensive empirical evidence of item quality, including option response frequency, point-biserial index (PBI) for distractors, mean-scores of respondents selecting each option, and option trace lines. The multi-serial index (MSI) is introduced as a more informative…
Descriptors: Test Items, Item Analysis, Reading Tests, Mathematics Tests
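The point-biserial index (PBI) for a response option is simply the Pearson correlation between the 0/1 indicator "chose this option" and total score: the keyed answer should correlate positively, and effective distractors negatively. A minimal sketch of the per-option PBI (not the authors' multi-serial index, which the article introduces separately):

```python
import numpy as np

def option_point_biserials(choices, total_scores):
    """Point-biserial index per response option: the correlation
    between choosing that option (coded 0/1) and total score.
    Keyed option: expect a positive PBI; good distractors: negative."""
    scores = np.asarray(total_scores, dtype=float)
    pbi = {}
    for opt in sorted(set(choices)):
        chose = np.array([c == opt for c in choices], dtype=float)
        pbi[opt] = float(np.corrcoef(chose, scores)[0, 1])
    return pbi
```

Pairing these values with option response frequencies and mean scores per option gives the "full-information" item-analysis picture the abstract describes.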
Raykov, Tenko; Marcoulides, George A.; Pusic, Martin – Measurement: Interdisciplinary Research and Perspectives, 2021
An interval estimation procedure is discussed that can be used to evaluate the probability of a particular response for a binary or binary scored item at a pre-specified point along an underlying latent continuum. The item is assumed to: (a) be part of a unidimensional multi-component measuring instrument that may contain also polytomous items,…
Descriptors: Item Response Theory, Computation, Probability, Test Items
Aybek, Eren Can – Journal of Applied Testing Technology, 2021
The study aims to introduce catIRT tools, which facilitates researchers' Item Response Theory (IRT) and Computerized Adaptive Testing (CAT) simulations. catIRT tools provides an interface to the mirt and catR packages through the shiny package in R. Through this interface, researchers can run IRT calibrations and CAT simulations even though they do not…
Descriptors: Item Response Theory, Computer Assisted Testing, Simulation, Models
Xiao, Yue; He, Qiwei; Veldkamp, Bernard; Liu, Hongyun – Journal of Computer Assisted Learning, 2021
The response process of problem-solving items contains rich information about respondents' behaviours and cognitive process in the digital tasks, while the information extraction is a big challenge. The aim of the study is to use a data-driven approach to explore the latent states and state transitions underlying problem-solving process to reflect…
Descriptors: Problem Solving, Competence, Markov Processes, Test Wiseness
Tabuena, Almighty C.; Morales, Glinore S. – Online Submission, 2021
This study identified and annotated appropriate test items using the multiple-choice test item format in the cognitive domain of the taxonomy of educational objectives in assessing and evaluating musical learning through the descriptive-developmental research design. This assessment approach is one of the key skills needed of Music teachers to…
Descriptors: Multiple Choice Tests, Test Items, Cognitive Objectives, Taxonomy
Condor, Aubrey; Litster, Max; Pardos, Zachary – International Educational Data Mining Society, 2021
We explore how different components of an Automatic Short Answer Grading (ASAG) model affect the model's ability to generalize to questions outside of those used for training. For supervised automatic grading models, human ratings are primarily used as ground truth labels. Producing such ratings can be resource heavy, as subject matter experts…
Descriptors: Automation, Grading, Test Items, Generalization
