ERIC - Search Results

Showing 1 to 15 of 64 results

Cross-Cultural Validation of the Mathematics Construct and Attribute Profiles: A Differential Item Functioning Approach

Peer reviewed

Yi-Hsin Chen – Applied Measurement in Education, 2024

This study aims to apply the differential item functioning (DIF) technique with the deterministic inputs, noisy "and" gate (DINA) model to validate the mathematics construct and diagnostic attribute profiles across American and Singaporean students. Even with the same ability level, every single item is expected to show uniform DIF…

Descriptors: Foreign Countries, Achievement Tests, Elementary Secondary Education, International Assessment

Evaluating Human Scoring Using Generalizability Theory

Peer reviewed

Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020

Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…

Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries

Coefficient [beta] as Extension of KR-21 Reliability for Summed and Scaled Scores for Polytomously-Scored Tests

Peer reviewed

Almehrizi, Rashid S. – Applied Measurement in Education, 2021

KR-21 reliability and its extension (coefficient [alpha]) gives the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…

Descriptors: Test Reliability, Scores, Scoring, Computation

Exploring Interrelationships among L2 Writing Subskills: Insights from Cognitive Diagnostic Models

Peer reviewed

Hamdollah Ravand; Farshad Effatpanah; Wenchao Ma; Jimmy de la Torre; Purya Baghaei; Olga Kunina-Habenicht – Applied Measurement in Education, 2024

The purpose of this study was to explore the nature of interactions among second/foreign language (L2) writing subskills. Two types of relationships were investigated: subskill-item and subskill-subskill relationships. To achieve the first purpose, using writing data obtained from the writing essays of 500 English as a foreign language (EFL)…

Descriptors: Second Language Learning, Writing Instruction, Writing Skills, Writing Tests

Not-Reached Items: An Issue of Time and of Test-Taking Disengagement? The Case of PISA 2015 Reading Data

Peer reviewed

Pools, Elodie – Applied Measurement in Education, 2022

Many low-stakes assessments, such as international large-scale surveys, are administered during time-limited testing sessions and some test-takers are not able to endorse the last items of the test, resulting in not-reached (NR) items. However, because the test has no consequence for the respondents, these NR items can also stem from quitting the…

Descriptors: Achievement Tests, Foreign Countries, International Assessment, Secondary School Students

Dissecting Knowledge, Guessing, and Blunder in Multiple Choice Assessments

Peer reviewed

Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023

Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…

Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models

Detecting Differential Item Functioning Using Cognitive Diagnosis Models: Applications of the Wald Test and Likelihood Ratio Test in a University Entrance Examination

Peer reviewed

Mehrazmay, Roghayeh; Ghonsooly, Behzad; de la Torre, Jimmy – Applied Measurement in Education, 2021

The present study aims to examine gender differential item functioning (DIF) in the reading comprehension section of a high stakes test using cognitive diagnosis models. Based on the multiple-group generalized deterministic, noisy "and" gate (MG G-DINA) model, the Wald test and likelihood ratio test are used to detect DIF. The flagged…

Descriptors: Test Bias, College Entrance Examinations, Gender Differences, Reading Tests

Measurement Invariance in Relation to First Language: An Evaluation of German Reading and Spelling Tests

Peer reviewed

Visser, Linda; Cartschau, Friederike; von Goldammer, Ariane; Brandenburg, Janin; Timmerman, Marieke; Hasselhorn, Marcus; Mähler, Claudia – Applied Measurement in Education, 2023

The growing number of children in primary schools in Germany who have German as their second language (L2) has raised questions about the fairness of performance assessment. Fair tests are a prerequisite for distinguishing between L2 learning delay and a specific learning disability. We evaluated five commonly used reading and spelling tests for…

Descriptors: Foreign Countries, Error of Measurement, Second Language Learning, German

Can Culture Be a Salient Predictor of Test-Taking Engagement? An Analysis of Differential Noneffortful Responding on an International College-Level Assessment of Critical Thinking

Peer reviewed

Rios, Joseph A.; Guo, Hongwen – Applied Measurement in Education, 2020

The objective of this study was to evaluate whether differential noneffortful responding (identified via response latencies) was present in four countries administered a low-stakes college-level critical thinking assessment. Results indicated significant differences (as large as 0.90 "SD") between nearly all country pairings in the…

Descriptors: Response Style (Tests), Cultural Differences, Critical Thinking, Cognitive Tests

Computer-Based Listening Test with Full Video, Visual-Limited Video, and Audio: A Comparative Analysis Based on Difficulty, Discrimination Power, and Response Time

Peer reviewed

Takahiro Terao – Applied Measurement in Education, 2024

This study aimed to compare item characteristics and response time between stimulus conditions in computer-delivered listening tests. Listening materials had three variants: regular videos, frame-by-frame videos, and only audios without visuals. Participants were 228 Japanese high school students who were requested to complete one of nine…

Descriptors: Computer Assisted Testing, Audiovisual Aids, Reaction Time, High School Students

The Trade-Off between Model Fit, Invariance, and Validity: The Case of PISA Science Assessments

Peer reviewed

El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020

In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…

Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

Peer reviewed

Sachse, Karoline A.; Haag, Nicole – Applied Measurement in Education, 2017

Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…

Descriptors: Error of Measurement, Test Bias, International Assessment, Computation

Establishing a Crosswalk between the Common European Framework for Languages (CEFR) and Writing Domains Scored by Automated Essay Scoring

Peer reviewed

Shermis, Mark D. – Applied Measurement in Education, 2018

This article employs the Common European Framework Reference for Language Acquisition (CEFR) as a basis for evaluating writing in the context of machine scoring. The CEFR was designed as a framework for evaluating proficiency levels of speaking for the 49 languages comprising the European Union. The intent was to impact language instruction so…

Descriptors: Scoring, Automation, Essays, Language Proficiency

Foundations of Formative Assessment: Introducing a Learning Progression to Guide Preservice Physics Teachers' Video-Based Interpretation of Student Thinking

Peer reviewed

von Aufschnaiter, Claudia; Alonzo, Alicia C. – Applied Measurement in Education, 2018

Establishing nuanced interpretations of student thinking is central to formative assessment but difficult, especially for preservice teachers. Learning progressions (LPs) have been proposed as a framework for promoting interpretations of students' thinking; however, research is needed to investigate whether and how an LP can be used to support…

Descriptors: Formative Evaluation, Preservice Teachers, Physics, Science Instruction
