Publication Date
  In 2025: 0
  Since 2024: 3
  Since 2021 (last 5 years): 8
  Since 2016 (last 10 years): 31
  Since 2006 (last 20 years): 46
Source
  Applied Measurement in Education: 57
Publication Type
  Journal Articles: 57
  Reports - Research: 57
  Tests/Questionnaires: 4
  Information Analyses: 1
  Speeches/Meeting Papers: 1
Education Level
  Secondary Education: 16
  Elementary Education: 11
  Elementary Secondary Education: 11
  Higher Education: 7
  Postsecondary Education: 6
  Grade 3: 4
  Grade 8: 4
  Junior High Schools: 4
  Middle Schools: 4
  Grade 4: 3
  High Schools: 3
Location
  Canada: 14
  Netherlands: 6
  Australia: 5
  Germany: 4
  Israel: 4
  United States: 3
  Belgium: 2
  Finland: 2
  Iran: 2
  Iran (Tehran): 2
  Singapore: 2
Assessments and Surveys
  Program for International Student Assessment: 15
  Trends in International Mathematics and Science Study: 7
  National Assessment of…: 1
  Perceived Competence Scale…: 1
  Progress in International…: 1
  Test Anxiety Inventory: 1
Yi-Hsin Chen – Applied Measurement in Education, 2024
This study aims to apply the differential item functioning (DIF) technique with the deterministic inputs, noisy "and" gate (DINA) model to validate the mathematics construct and diagnostic attribute profiles across American and Singaporean students. Even with the same ability level, every single item is expected to show uniform DIF…
Descriptors: Foreign Countries, Achievement Tests, Elementary Secondary Education, International Assessment
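For reference, the DINA model named in this abstract has a simple closed form: an examinee answers item j correctly with probability 1 - s_j if they master every attribute the Q-matrix requires for that item, and with probability g_j otherwise. The sketch below only illustrates that item response function; the attribute pattern, Q-matrix row, and slip/guess values are made up, not taken from the study.

```python
import numpy as np

def dina_prob(alpha, q_row, slip, guess):
    """P(correct) under the DINA model for one examinee-item pair.

    alpha : 0/1 vector of attribute mastery for the examinee
    q_row : 0/1 vector of attributes required by the item (Q-matrix row)
    slip  : probability of answering wrong despite mastering all required attributes
    guess : probability of answering right without mastering them all
    """
    eta = int(np.all(alpha >= q_row))        # 1 iff every required attribute is mastered
    return (1.0 - slip) ** eta * guess ** (1 - eta)

# Illustrative values only: 3 attributes, one item requiring the first two.
alpha = np.array([1, 1, 0])
q_row = np.array([1, 1, 0])
print(dina_prob(alpha, q_row, slip=0.1, guess=0.2))   # approximately 0.9: a master answers correctly unless slipping
```

Uniform DIF in this framework would correspond to group-specific slip or guess parameters for the same item.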
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension (coefficient α) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores on dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
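For context, KR-21 needs only the item count and the mean and variance of the summed scores, because it assumes equally difficult items in addition to the sampling assumptions described above. A minimal sketch of the classical formula, with toy scores that are not from the article:

```python
import numpy as np

def kr21(scores, n_items):
    """KR-21 reliability from summed scores on dichotomous items.

    scores  : array of total (summed) scores, one per examinee
    n_items : number of items k on the test
    """
    scores = np.asarray(scores, dtype=float)
    k = n_items
    mean = scores.mean()
    var = scores.var(ddof=1)                 # sample variance of the total scores
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))

# Toy data: total scores of 6 examinees on a 10-item test.
print(round(kr21([4, 6, 7, 5, 9, 3], n_items=10), 3))
```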
Hamdollah Ravand; Farshad Effatpanah; Wenchao Ma; Jimmy de la Torre; Purya Baghaei; Olga Kunina-Habenicht – Applied Measurement in Education, 2024
The purpose of this study was to explore the nature of interactions among second/foreign language (L2) writing subskills. Two types of relationships were investigated: subskill-item and subskill-subskill relationships. To achieve the first purpose, using writing data obtained from the writing essays of 500 English as a foreign language (EFL)…
Descriptors: Second Language Learning, Writing Instruction, Writing Skills, Writing Tests
Pools, Elodie – Applied Measurement in Education, 2022
Many low-stakes assessments, such as international large-scale surveys, are administered during time-limited testing sessions and some test-takers are not able to endorse the last items of the test, resulting in not-reached (NR) items. However, because the test has no consequence for the respondents, these NR items can also stem from quitting the…
Descriptors: Achievement Tests, Foreign Countries, International Assessment, Secondary School Students
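Not-reached items are the unbroken run of omitted responses at the end of a test, as opposed to items skipped and then followed by a later answer. The sketch below is my own illustration of how such a count can be read off a response matrix, not the classification procedure used in the study.

```python
import numpy as np

def count_not_reached(responses):
    """Count trailing missing responses (not-reached items) for each examinee.

    responses : 2-D array, examinees x items, with np.nan marking an omitted item.
    Only the unbroken run of missing values at the end of the test is counted;
    omissions followed by a later answer are skipped items, not not-reached items.
    """
    nr_counts = []
    for row in np.asarray(responses, dtype=float):
        n = 0
        for value in row[::-1]:              # walk backwards from the last item
            if np.isnan(value):
                n += 1
            else:
                break
        nr_counts.append(n)
    return np.array(nr_counts)

# Toy response matrix (1 = correct, 0 = incorrect, nan = no response); values are illustrative.
data = [[1, 0, 1, np.nan, np.nan],   # last two items not reached
        [1, np.nan, 0, 1, 0],        # one skipped item, nothing not reached
        [0, 1, 1, 1, 1]]
print(count_not_reached(data))        # [2 0 0]
```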
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
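The article fits explicit probabilistic models of knowledge, guessing, and blunder. As a simpler point of reference only (not the authors' model), the classical knowledge-or-random-guessing model with no blunders has a closed-form solution for the knowledge probability:

```python
def knowledge_from_proportion_correct(p_correct, n_options):
    """Estimate the 'knowledge' probability under the classical knowledge-or-guess model.

    Assumes an examinee either knows the answer (probability k) or guesses
    uniformly at random among the n_options alternatives, with no blunders:
        p_correct = k + (1 - k) / n_options
    Solving for k gives the familiar correction-for-guessing estimate.
    """
    chance = 1.0 / n_options
    return (p_correct - chance) / (1.0 - chance)

# Example: 70% correct on 4-option items implies roughly 60% genuine knowledge.
print(knowledge_from_proportion_correct(0.70, 4))   # approximately 0.6
```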
Mehrazmay, Roghayeh; Ghonsooly, Behzad; de la Torre, Jimmy – Applied Measurement in Education, 2021
The present study aims to examine gender differential item functioning (DIF) in the reading comprehension section of a high stakes test using cognitive diagnosis models. Based on the multiple-group generalized deterministic, noisy "and" gate (MG G-DINA) model, the Wald test and likelihood ratio test are used to detect DIF. The flagged…
Descriptors: Test Bias, College Entrance Examinations, Gender Differences, Reading Tests
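The study detects DIF with Wald and likelihood ratio tests under the MG G-DINA model. As a generic, model-free point of comparison rather than the authors' procedure, uniform DIF is often screened with the Mantel-Haenszel common odds ratio computed across total-score strata, sketched below with toy data.

```python
import numpy as np

def mantel_haenszel_odds_ratio(correct, group, total_score):
    """Mantel-Haenszel common odds ratio for one item, stratified by total score.

    correct     : 0/1 responses to the studied item
    group       : 0 for the reference group, 1 for the focal group
    total_score : matching variable (e.g., total test score)
    Returns alpha_MH; values far from 1 suggest uniform DIF.
    """
    correct = np.asarray(correct)
    group = np.asarray(group)
    total_score = np.asarray(total_score)
    num, den = 0.0, 0.0
    for s in np.unique(total_score):
        mask = total_score == s
        a = np.sum((group[mask] == 0) & (correct[mask] == 1))  # reference, right
        b = np.sum((group[mask] == 0) & (correct[mask] == 0))  # reference, wrong
        c = np.sum((group[mask] == 1) & (correct[mask] == 1))  # focal, right
        d = np.sum((group[mask] == 1) & (correct[mask] == 0))  # focal, wrong
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den if den > 0 else np.nan

# Toy data only: 8 examinees matched on a coarse total score.
item = [1, 1, 0, 1, 0, 1, 1, 0]
grp  = [0, 0, 0, 0, 1, 1, 1, 1]
tot  = [5, 5, 3, 3, 5, 5, 3, 3]
print(mantel_haenszel_odds_ratio(item, grp, tot))   # approximately 3: higher odds of success for the reference group
```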
Rios, Joseph A.; Guo, Hongwen – Applied Measurement in Education, 2020
The objective of this study was to evaluate whether differential noneffortful responding (identified via response latencies) was present in four countries administered a low-stakes college-level critical thinking assessment. Results indicated significant differences (as large as 0.90 "SD") between nearly all country pairings in the…
Descriptors: Response Style (Tests), Cultural Differences, Critical Thinking, Cognitive Tests
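Noneffortful (rapid-guessing) responses are typically flagged when a response latency falls below a per-item time threshold. The sketch below uses the common heuristic of 10% of an item's mean response time with a fixed floor; these specific cutoffs are assumptions for illustration, not necessarily the rule used in this study.

```python
import numpy as np

def flag_noneffortful(response_times, threshold_fraction=0.10, floor_seconds=2.0):
    """Flag likely noneffortful responses from a matrix of response latencies.

    response_times     : 2-D array, examinees x items, in seconds
    threshold_fraction : per-item threshold as a fraction of that item's mean time
                         (a common heuristic; an assumption here, not the study's rule)
    floor_seconds      : never let the threshold fall below this many seconds
    Returns a boolean matrix where True marks a response flagged as noneffortful.
    """
    rt = np.asarray(response_times, dtype=float)
    thresholds = np.maximum(threshold_fraction * rt.mean(axis=0), floor_seconds)
    return rt < thresholds             # broadcast per-item thresholds across examinees

# Toy latencies in seconds (illustrative only).
rt = np.array([[35.0, 48.0,  1.2],
               [ 2.1, 50.0, 40.0],
               [30.0, 46.0, 44.0]])
flags = flag_noneffortful(rt)
print(flags)
print("noneffortful rate per examinee:", flags.mean(axis=1))
```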
Takahiro Terao – Applied Measurement in Education, 2024
This study aimed to compare item characteristics and response time between stimulus conditions in computer-delivered listening tests. Listening materials had three variants: regular videos, frame-by-frame videos, and only audios without visuals. Participants were 228 Japanese high school students who were requested to complete one of nine…
Descriptors: Computer Assisted Testing, Audiovisual Aids, Reaction Time, High School Students
El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020
In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…
Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Sachse, Karoline A.; Haag, Nicole – Applied Measurement in Education, 2017
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Descriptors: Error of Measurement, Test Bias, International Assessment, Computation
von Aufschnaiter, Claudia; Alonzo, Alicia C. – Applied Measurement in Education, 2018
Establishing nuanced interpretations of student thinking is central to formative assessment but difficult, especially for preservice teachers. Learning progressions (LPs) have been proposed as a framework for promoting interpretations of students' thinking; however, research is needed to investigate whether and how an LP can be used to support…
Descriptors: Formative Evaluation, Preservice Teachers, Physics, Science Instruction
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
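As a generic illustration of the workflow described above (not the AES system evaluated in the article), the sketch below scores essays with bag-of-words features and ridge regression, producing out-of-fold predictions so that no essay is scored by a model trained on it.

```python
# Minimal, generic AES sketch: TF-IDF features + ridge regression, scored out-of-fold.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Toy essays and toy human ratings (e.g., the mean rating across a rater group); not study data.
essays = ["First toy essay text ...", "Second toy essay text ...",
          "Third toy essay text ...", "Fourth toy essay text ...",
          "Fifth toy essay text ...", "Sixth toy essay text ..."]
human_scores = np.array([2.0, 3.5, 4.0, 1.5, 3.0, 5.0])

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))

# Each essay is scored by a model that never saw it during training (3-fold cross-validation),
# mirroring the cross-validation scheme mentioned in the abstract.
machine_scores = cross_val_predict(model, essays, human_scores, cv=3)
print(machine_scores.round(2))   # in practice, compare these to the human criterion (correlation, kappa, etc.)
```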
Oliveri, Maria Elena; Ercikan, Kadriye; Lyons-Thomas, Juliette; Holtzman, Steven – Applied Measurement in Education, 2016
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted utilizing manifest methods using observed characteristics (gender and race/ethnicity) for grouping examinees. Homogeneity of item responses is assumed denoting that…
Descriptors: Test Bias, Language Minorities, Effect Size, Foreign Countries