Publication Date

| Date range | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 46 |
| Since 2017 (last 10 years) | 98 |
| Since 2007 (last 20 years) | 166 |
Descriptor

| Descriptor | Results |
| --- | --- |
| Comparative Analysis | 182 |
| Foreign Countries | 182 |
| Item Analysis | 182 |
| Test Items | 84 |
| Scores | 40 |
| Student Attitudes | 40 |
| Correlation | 34 |
| English (Second Language) | 33 |
| Factor Analysis | 33 |
| Questionnaires | 32 |
| Second Language Learning | 32 |
Author

| Author | Results |
| --- | --- |
| Ayan, Cansu | 2 |
| Baghaei, Purya | 2 |
| Donovan, Jenny | 2 |
| Ercikan, Kadriye | 2 |
| Gentry, Marcia | 2 |
| Ghonsooly, Behzad | 2 |
| Goldhammer, Frank | 2 |
| Hamlin, Robert G. | 2 |
| Hung Tan Ha | 2 |
| Hutton, Penny | 2 |
| Kalender, Ilker | 2 |
Audience

| Audience | Results |
| --- | --- |
| Researchers | 1 |
| Teachers | 1 |
Location

| Location | Results |
| --- | --- |
| Australia | 13 |
| China | 13 |
| Germany | 13 |
| Turkey | 12 |
| Canada | 8 |
| United Kingdom | 8 |
| United States | 8 |
| Indonesia | 7 |
| Iran | 7 |
| Japan | 7 |
| United Kingdom (England) | 7 |
Ute Knoch; Jason Fan – Language Testing, 2024
While several test concordance tables have been published, the research underpinning such tables has rarely been examined in detail. This study aimed to survey the publicly available studies or documentation underpinning the test concordance tables of the providers of four major international language tests, all accepted by the Australian…
Descriptors: Language Tests, English, Test Validity, Item Analysis
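
Concordance tables like those surveyed are typically derived by observed-score (equipercentile) linking: a score on test X is mapped to the test-Y score with the same percentile rank. A minimal sketch in Python with invented score distributions; the function name and data are illustrative, and the providers' actual procedures may differ:

```python
import numpy as np

def equipercentile_concordance(x_scores, y_scores, x_points):
    """Map each test-X score to the test-Y score at the same
    percentile rank in the two observed score distributions."""
    ranks = np.array([np.mean(x_scores <= x) for x in x_points])
    return np.quantile(y_scores, ranks)

# Made-up score samples for two tests (not any provider's data)
rng = np.random.default_rng(0)
x = rng.normal(50, 10, 5000).round()          # test X raw scores
y = rng.normal(6.5, 1.0, 5000).clip(1, 9)     # test Y band scores
print(equipercentile_concordance(x, y, np.array([40, 50, 60])))
```
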
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
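
One of the two methods compared, residual correlation examination, is commonly operationalized as Yen's Q3: fit a Rasch model, compute standardized residuals, and inspect inter-item residual correlations. A minimal sketch, assuming person abilities and item difficulties were estimated elsewhere (all names and data are illustrative):

```python
import numpy as np

def q3_matrix(responses, theta, beta):
    """Yen's Q3: correlations of standardized Rasch residuals.
    Elevated off-diagonal values flag locally dependent item pairs."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    resid = (responses - p) / np.sqrt(p * (1.0 - p))
    return np.corrcoef(resid, rowvar=False)

# Simulated 0/1 responses for 200 persons x 12 items
rng = np.random.default_rng(1)
theta, beta = rng.normal(size=200), rng.normal(size=12)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
responses = (rng.random((200, 12)) < p).astype(float)
print(q3_matrix(responses, theta, beta).round(2))
```
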
Alexandru Cernat; Joseph Sakshaug; Pablo Christmann; Tobias Gummer – Sociological Methods & Research, 2024
Mixed-mode surveys are popular as they can save costs and maintain (or improve) response rates relative to single-mode surveys. Nevertheless, it is not yet clear how design decisions like survey mode or questionnaire length impact measurement quality. In this study, we compare measurement quality in an experiment of three distinct survey designs…
Descriptors: Surveys, Questionnaires, Item Analysis, Attitude Measures
Carmen Batanero; Luis A. Hernandez-Solis; Maria M. Gea – Statistics Education Research Journal, 2023
We present an exploratory study of Costa Rican and Spanish students' (11-16 years old) competence to compare probabilities in urns and to compare ratios in mixture problems. A sample of 704 students in Grades 6 through 10, 292 from Costa Rica and 412 from Spain, was given one of two forms of a questionnaire with three probability comparison…
Descriptors: Statistics Education, Comparative Analysis, Foreign Countries, Probability
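
The urn tasks reduce to exact ratio comparison, which exact fractions make unambiguous; a tiny illustration with made-up numbers:

```python
from fractions import Fraction

# Hypothetical urns: A has 3 favorable of 7, B has 4 favorable of 9
p_a = Fraction(3, 7)  # = 27/63
p_b = Fraction(4, 9)  # = 28/63
print("Urn B is more likely" if p_b > p_a else "Urn A is more likely")
```
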
Harrison, Scott; Kroehne, Ulf; Goldhammer, Frank; Lüdtke, Oliver; Robitzsch, Alexander – Large-scale Assessments in Education, 2023
Background: Mode effects, the variations in item and scale properties attributable to the mode of test administration (paper vs. computer), have stimulated research on test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone for interpreting PISA test scores. However, an…
Descriptors: Scoring, Test Items, Difficulty Level, Foreign Countries
Schaper, Marie Luisa; Kuhlmann, Beatrice G.; Bayen, Ute J. – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2023
Item memory and source memory are different aspects of episodic remembering. To investigate metamemory differences between them, the authors assessed systematic differences between predictions of item memory via Judgments of Learning (JOLs) and source memory via Judgments of Source (JOSs). Schema-based expectations affect JOLs and JOSs…
Descriptors: Memory, Metacognition, Schemata (Cognition), Prediction
Schröder, Jette; Schmiedeberg, Claudia – Sociological Methods & Research, 2023
Although third parties are present during a substantial share of face-to-face interviews, bystander influence on respondents' response behavior is not yet fully understood. We use nine waves of the German Family Panel "pairfam" and apply fixed-effects panel regression models to analyze the effects of third-party presence on…
Descriptors: Housework, Item Analysis, Interpersonal Relationship, Responses
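
A fixed-effects panel regression of the kind applied here identifies the bystander effect from within-person variation across waves; demeaning within respondents absorbs all stable person characteristics. A minimal sketch on simulated data (variable names are invented, not pairfam's):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_persons, n_waves = 300, 9
df = pd.DataFrame({
    "person": np.repeat(np.arange(n_persons), n_waves),
    "bystander": rng.integers(0, 2, n_persons * n_waves),
})
# Outcome mixes a true bystander effect with stable person effects
person_effect = rng.normal(size=n_persons)[df["person"]]
df["report"] = (0.4 * df["bystander"] + person_effect
                + rng.normal(scale=0.5, size=len(df)))

# Within transformation: demean by person to absorb fixed effects
within = df.groupby("person")[["report", "bystander"]] \
           .transform(lambda s: s - s.mean())
fe = sm.OLS(within["report"], within["bystander"]).fit(
    cov_type="cluster", cov_kwds={"groups": df["person"]})
print(fe.params["bystander"])  # recovers roughly 0.4
```

Clustering the standard errors by respondent accounts for the repeated observations per person.
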
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
This paper uses the Many-Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
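
For reference, the Many-Facet Rasch Model extends the Rasch model with additional facets such as rater severity; in its common rating-scale form (standard notation, assumed rather than taken from the paper):

$$\log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k,$$

where $\theta_n$ is the ability of test taker $n$, $\delta_i$ the difficulty of situation (item) $i$, $\alpha_j$ the severity of rater $j$, and $\tau_k$ the threshold for rating category $k$.
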
Jiayi Deng – Large-scale Assessments in Education, 2025
Background: Test score comparability in international large-scale assessments (LSAs) is of great importance for ensuring test fairness. To compare test scores effectively on an international scale, score linking is widely used to convert raw scores from different linguistic versions of test forms onto a common score scale. An example is the multigroup…
Descriptors: Guessing (Tests), Item Response Theory, Error Patterns, Arabic
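
Score linking of this kind places parameters or scores from different test forms on a common scale. A minimal mean/sigma sketch using common items, shown only as a simple stand-in for the multigroup IRT procedure the paper examines (all numbers are invented):

```python
import numpy as np

def mean_sigma_link(b_source, b_target):
    """Linear transformation b* = A*b + B that puts source-form item
    difficulties on the target form's scale via common items."""
    A = np.std(b_target, ddof=1) / np.std(b_source, ddof=1)
    B = np.mean(b_target) - A * np.mean(b_source)
    return A, B

# Difficulties of the same anchor items estimated on two forms
b_src = np.array([-1.2, -0.3, 0.4, 1.1])
b_tgt = np.array([-1.0, -0.1, 0.7, 1.5])
A, B = mean_sigma_link(b_src, b_tgt)
print(A, B, A * b_src + B)  # linked difficulties
```
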
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study proposed a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is the sentence rather than the gap or the passage: the gaps correctly reformulated in each sentence were aggregated into a sentence score, and each sentence was then entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
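
The proposed scoring amounts to summing correct gaps within each sentence and treating that sum as a polytomous item score. A small sketch of the aggregation step (column names are invented):

```python
import pandas as pd

# One row per C-Test gap: the sentence it sits in, and 0/1 correctness
gaps = pd.DataFrame({
    "person":   [1, 1, 1, 1, 2, 2, 2, 2],
    "sentence": ["s1", "s1", "s2", "s2", "s1", "s1", "s2", "s2"],
    "correct":  [1, 0, 1, 1, 1, 1, 0, 0],
})
# Polytomous sentence score = number of gaps restored per sentence
scores = gaps.groupby(["person", "sentence"])["correct"].sum().unstack()
print(scores)  # one row per person, one polytomous item per sentence
```
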
Tim Stoeckel; Liang Ye Tan; Hung Tan Ha; Nam Thi Phuong Ho; Tomoko Ishii; Young Ae Kim; Chunmei Huang; Stuart McLean – Vocabulary Learning and Instruction, 2024
Local item dependency (LID) occurs when test-takers' responses to one test item are affected by their responses to another. It can be problematic if it causes inflated reliability estimates or distorted person and item measures. The cued-recall reading comprehension test in Hu and Nation's (2000) well-known and influential coverage-comprehension…
Descriptors: Reading Comprehension, English (Second Language), Second Language Instruction, Second Language Learning
Hayat, Bahrul – Cogent Education, 2022
The purposes of this study were to (1) calibrate the Basic Statistics Test for Indonesian undergraduate psychology students using the Rasch model, (2) test the impact of adjustment for guessing on item parameters, person parameters, test reliability, and the distributions of item difficulty and person ability, and (3) compare person scores…
Descriptors: Guessing (Tests), Statistics Education, Undergraduate Students, Psychology
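
The abstract does not specify which adjustment was used; one classical option is the formula-score correction, which subtracts the correct answers that random guessing among k options would be expected to produce. A one-function sketch, offered only as an assumption about what "adjustment for guessing" can look like:

```python
def formula_score(right, wrong, options):
    """Classical correction for guessing: right answers minus the
    successes random guessing would yield on the items answered wrong."""
    return right - wrong / (options - 1)

print(formula_score(right=32, wrong=9, options=4))  # 32 - 9/3 = 29.0
```
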
Katrin Klingbeil; Fabian Rösken; Bärbel Barzel; Florian Schacht; Kaye Stacey; Vicki Steinle; Daniel Thurm – ZDM: Mathematics Education, 2024
Assessing students' (mis)conceptions is a challenging task for teachers as well as for researchers. While individual assessment, for example through interviews, can provide deep insights into students' thinking, this is very time-consuming and therefore not feasible for whole classes or even larger settings. For those settings, automatically…
Descriptors: Multiple Choice Tests, Formative Evaluation, Mathematics Tests, Misconceptions
Shukla, Vishakha; Long, Madeleine; Bhatia, Vrinda; Rubio-Fernandez, Paula – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2022
While most research on scalar implicature has focused on the lexical scale "some" vs "all," here we investigated an understudied scale formed by two syntactic constructions: categorizations (e.g., "Wilma is a nurse") and comparisons ("Wilma is like a nurse"). An experimental study by Rubio-Fernandez et al.…
Descriptors: Cues, Pragmatics, Comparative Analysis, Syntax
Robie, Chet; Meade, Adam W.; Risavy, Stephen D.; Rasheed, Sabah – Educational and Psychological Measurement, 2022
The effects of different response option orders on survey responses have been studied extensively. The typical research design involves examining the differences in response characteristics between conditions with the same item stems and response option orders that differ in valence--either incrementally arranged (e.g., strongly disagree to…
Descriptors: Likert Scales, Psychometrics, Surveys, Responses
