Publication Date
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 10 |
| Since 2017 (last 10 years) | 21 |
| Since 2007 (last 20 years) | 33 |
Descriptor
| Language Tests | 64 |
| Scoring | 64 |
| Test Items | 64 |
| English (Second Language) | 34 |
| Second Language Learning | 28 |
| Test Construction | 21 |
| Foreign Countries | 20 |
| Language Proficiency | 20 |
| Test Validity | 16 |
| Second Language Instruction | 15 |
| Item Analysis | 14 |
Education Level
| Higher Education | 10 |
| Postsecondary Education | 7 |
| Elementary Education | 4 |
| Secondary Education | 3 |
| Junior High Schools | 2 |
| Middle Schools | 2 |
| Elementary Secondary Education | 1 |
| Grade 5 | 1 |
| Grade 6 | 1 |
| High Schools | 1 |
Laws, Policies, & Programs
| Elementary and Secondary… | 1 |
Assessments and Surveys
| Test of English as a Foreign… | 8 |
| ACT Assessment | 1 |
| Alabama High School… | 1 |
| Computer Attitude Scale | 1 |
| English Proficiency Test | 1 |
| General Educational… | 1 |
| International English… | 1 |
| Test of Written English | 1 |
Venessa F. Manna; Shuhong Li; Spiros Papageorgiou; Lixiong Gu – ETS Research Report Series, 2025
This technical manual describes the purpose and intended uses of the TOEFL iBT test, its target test-taker population, and relevant language use domains. The test design and scoring procedures are presented first, followed by a research agenda intended to support the interpretation and use of test scores. Given the updates to the test starting…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Test Construction
Sara T. Cushing – ETS Research Report Series, 2025
This report provides an in-depth comparison of TOEFL iBT® and the Duolingo English Test (DET) in terms of the degree to which both tests assess academic language proficiency in listening, reading, writing, and speaking. The analysis is based on publicly available documentation on both tests, including sample test questions available on the test…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Academic Language
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
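The sentence-level aggregation described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `sentence_scores` and the `(sentence_id, is_correct)` input layout are assumptions made for illustration, and the subsequent polytomous analysis is not shown.

```python
from collections import defaultdict


def sentence_scores(gap_results):
    """Sum correct gaps per sentence.

    gap_results: iterable of (sentence_id, is_correct) pairs, one per gap.
    This input layout is hypothetical, chosen only for illustration.
    """
    totals = defaultdict(int)
    for sentence_id, is_correct in gap_results:
        # A sentence whose gaps are all wrong still gets an entry of 0.
        totals[sentence_id] += int(is_correct)
    return dict(totals)
```

Each resulting sentence score (0 up to the number of gaps in that sentence) would then be treated as one polytomous item, rather than treating each gap or each passage as the unit of analysis.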
Zhiqiang Yang; Chengyuan Yu – Asia Pacific Education Review, 2025
This study investigated the test fairness of the translation section of a large-scale English test in China by examining its Differential Test Functioning (DTF) and Differential Item Functioning (DIF) across gender and major. Regarding DTF, the entire translation section exhibits partial strong measurement invariance across female and male…
Descriptors: Multiple Choice Tests, Test Items, Scoring, Translation
Alpizar, David; Li, Tongyun; Norris, John M.; Gu, Lixiong – Language Testing, 2023
The C-test is a type of gap-filling test designed to efficiently measure second language proficiency. The typical C-test consists of several short paragraphs with the second half of every second word deleted. The words with deleted parts are considered as items nested within the corresponding paragraph. Given this testlet structure, it is commonly…
Descriptors: Psychometrics, Language Tests, Second Language Learning, Test Items
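The deletion rule described in this abstract (the second half of every second word is removed) can be sketched as follows. This is a simplified illustration, not the procedure used in the study: the function name `make_c_test`, the underscore placeholder, the choice to delete the larger half of odd-length words, and the choice to leave one-letter words intact are all assumptions.

```python
import re


def make_c_test(text: str, blank: str = "_") -> str:
    """Delete the second half of every second word in `text`."""
    word_index = 0

    def mutilate(match: re.Match) -> str:
        nonlocal word_index
        word = match.group(0)
        word_index += 1
        # Leave the 1st, 3rd, 5th, ... words and one-letter words intact.
        if word_index % 2 == 1 or len(word) < 2:
            return word
        # Assumption: for odd-length words, the larger half is deleted.
        keep = len(word) // 2
        return word[:keep] + blank * (len(word) - keep)

    # Only runs of ASCII letters count as words in this sketch.
    return re.sub(r"[A-Za-z]+", mutilate, text)
```

For example, `make_c_test("The quick brown fox jumps")` returns `"The qu___ brown f__ jumps"`. Each mutilated word is one item, and the items from one paragraph form the testlet the abstract refers to.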
Chung, Seungwon; Cai, Li – Journal of Educational and Behavioral Statistics, 2021
In the research reported here, we propose a new method for scale alignment and test scoring in the context of supporting students with disabilities. In educational assessment, students from these special populations take modified tests because of a demonstrated disability that requires more assistance than standard testing accommodation. Updated…
Descriptors: Students with Disabilities, Scoring, Achievement Tests, Test Items
Tomkowicz, Joanna; Kim, Dong-In; Wan, Ping – Online Submission, 2022
In this study we evaluated the stability of item parameters and student scores, using the pre-equated (pre-pandemic) parameters from Spring 2019 and post-equated (post-pandemic) parameters from Spring 2021 in two calibration and equating designs related to item parameter treatment: re-estimating all anchor parameters (Design 1) and holding the…
Descriptors: Equated Scores, Test Items, Evaluation Methods, Pandemics
Li, Shuai; Wen, Ting; Li, Xian; Feng, Yali; Lin, Chuan – Language Testing, 2023
This study compared holistic and analytic marking methods for their effects on parameter estimation (of examinees, raters, and items) and rater cognition in assessing speech act production in L2 Chinese. Seventy American learners of Chinese completed an oral Discourse Completion Test assessing requests and refusals. Four first-language (L1)…
Descriptors: Speech Acts, Second Language Learning, Second Language Instruction, Chinese
Gawliczek, Piotr; Krykun, Viktoriia; Tarasenko, Nataliya; Tyshchenko, Maksym; Shapran, Oleksandr – Advanced Education, 2021
The article deals with an innovative, cutting-edge solution in the language testing realm, namely computer adaptive language testing (CALT) in accordance with the NATO Standardization Agreement 6001 (NATO STANAG 6001) requirements, for further implementation in foreign language training of personnel of the Armed Forces of Ukraine (AF of…
Descriptors: Computer Assisted Testing, Adaptive Testing, Language Tests, Second Language Instruction
Nebraska Department of Education, 2024
The Nebraska Student-Centered Assessment System (NSCAS) is a statewide assessment system that embodies Nebraska's holistic view of students and helps them prepare for success in postsecondary education, career, and civic life. It uses multiple measures throughout the year to provide educators and decision-makers at all levels with the insights…
Descriptors: Student Evaluation, Evaluation Methods, Elementary School Students, Middle School Students
Coniam, David; Lee, Tony; Milanovic, Michael; Pike, Nigel; Zhao, Wen – Language Education & Assessment, 2022
The calibration of test materials generally involves the interaction between empirical analysis and expert judgement. This paper explores the extent to which scale familiarity might affect expert judgement as a component of test validation in the calibration process. It forms part of a larger study that investigates the alignment of the…
Descriptors: Specialists, Language Tests, Test Validity, College Faculty
Toroujeni, Seyyed Morteza Hashemi – Education and Information Technologies, 2022
Score interchangeability of Computerized Fixed-Length Linear Testing (henceforth CFLT) and Paper-and-Pencil-Based Testing (henceforth PPBT) has become a controversial issue over the last decade, as technology has meaningfully restructured methods of educational assessment. Given this controversy, various testing guidelines published on…
Descriptors: Computer Assisted Testing, Reading Tests, Reading Comprehension, Scoring
Lin, Chih-Kai – Language Assessment Quarterly, 2018
With multiple options to choose from, there is always a chance of lucky guessing by examinees on multiple-choice (MC) items, thereby potentially introducing bias in item difficulty estimates. Correct responses by random guessing thus pose threats to the validity of claims made from test performance on an MC test. Under the Rasch framework, the…
Descriptors: Guessing (Tests), Item Response Theory, Multiple Choice Tests, Language Tests
van Rijn, Peter W.; Ali, Usama S. – ETS Research Report Series, 2018
A computer program was developed to estimate speed-accuracy response models for dichotomous items. This report describes how the models are estimated and how to specify data and input files. An example using data from a listening section of an international language test is described to illustrate the modeling approach and features of the computer…
Descriptors: Computer Software, Computation, Reaction Time, Timed Tests
New Meridian Corporation, 2020
New Meridian Corporation has developed the "Quality Testing Standards and Criteria for Comparability Claims" (QTS) to provide guidance to states that are interested in including New Meridian content and would like to either keep reporting scores on the New Meridian Scale or use the New Meridian performance levels; that is, the state…
Descriptors: Testing, Standards, Comparative Analysis, Test Content

