Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 3 |
| Since 2017 (last 10 years) | 7 |
| Since 2007 (last 20 years) | 14 |
Descriptor
| Computer Software | 22 |
| Test Items | 22 |
| Test Reliability | 12 |
| Difficulty Level | 9 |
| Item Analysis | 7 |
| Reliability | 7 |
| Comparative Analysis | 5 |
| Computer Assisted Testing | 5 |
| Foreign Countries | 5 |
| Goodness of Fit | 5 |
| Language Tests | 5 |
Author
| Ahmed, Tamim | 1 |
| Aiken, Lewis R. | 1 |
| Akbari, Alireza | 1 |
| Aryadoust, Vahid | 1 |
| Baghaei, Purya | 1 |
| Boekkooi-Timminga, Ellen | 1 |
| Breyer, F. Jay | 1 |
| Cobern, William W. | 1 |
| Davidson, Fred | 1 |
| Demir, Mevhibe Kobak | 1 |
| Deng, Nina | 1 |
Education Level
| Higher Education | 2 |
| Elementary Education | 1 |
| High Schools | 1 |
| Postsecondary Education | 1 |
| Secondary Education | 1 |
Audience
| Practitioners | 2 |
| Teachers | 1 |
Location
| Australia | 1 |
| Austria | 1 |
| Belgium | 1 |
| Canada | 1 |
| Chile | 1 |
| Cyprus | 1 |
| Czech Republic | 1 |
| Denmark | 1 |
| Estonia | 1 |
| France | 1 |
| Germany | 1 |
Assessments and Surveys
| Peabody Picture Vocabulary… | 1 |
Jin, Kuan-Yu; Siu, Wai-Lok – Journal of Educational Measurement, 2025
Educational tests often have a cluster of items linked by a common stimulus ("testlet"). In such a design, the dependencies caused between items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
Rao, Dhawaleswar; Saha, Sujan Kumar – IEEE Transactions on Learning Technologies, 2020
Automatic multiple-choice question (MCQ) generation from text is a popular research area. MCQs are widely accepted for large-scale assessment in various domains and applications. However, manual generation of MCQs is expensive and time-consuming. Therefore, researchers have been drawn to automatic MCQ generation since the late 1990s.…
Descriptors: Multiple Choice Tests, Test Construction, Automation, Computer Software
Aryadoust, Vahid; Ng, Li Ying; Sayama, Hiroki – Language Testing, 2021
Over the past decades, the application of Rasch measurement in language assessment has gradually increased. In the present study, we coded 215 papers using Rasch measurement published in 21 applied linguistics journals for multiple features. We found that seven Rasch models and 23 software packages were adopted in these papers, with many-facet…
Descriptors: Language Tests, Testing, Test Items, Network Analysis
Ravand, Hamdollah; Baghaei, Purya – International Journal of Testing, 2020
More than three decades after their introduction, diagnostic classification models (DCM) do not seem to have been implemented in educational systems for the purposes for which they were devised. Most DCM research is either methodological, aimed at model development and refinement, or retrofits DCMs to existing nondiagnostic tests and, in the latter case, basically…
Descriptors: Classification, Models, Diagnostic Tests, Test Construction
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
Maghfiroh, Anissa; Kuswanto, Heru – International Journal of Instruction, 2022
This research aims to reveal the effectiveness of Kofie GeBoL media in improving (1) vector representation ability and (2) critical thinking ability in physics instruction. It is a descriptive quantitative study with a quasi-experimental design. It was conducted in two stages: an empirical tryout and implementation of Kofie GeboL to see…
Descriptors: Physics, Instructional Effectiveness, Critical Thinking, Thinking Skills
Pelánek, Radek; Effenberger, Tomáš; Kukucka, Adam – Journal of Educational Data Mining, 2022
We study the automatic identification of educational items worthy of content authors' attention. Based on the results of such analysis, content authors can revise and improve the content of learning environments. We provide an overview of item properties relevant to this task, including difficulty and complexity measures, item discrimination, and…
Descriptors: Item Analysis, Identification, Difficulty Level, Case Studies
Demir, Mevhibe Kobak; Gür, Hülya – Educational Research and Reviews, 2016
This study aimed to develop a valid and reliable perception scale to determine pre-service teachers' perceptions of the use of WebQuest in mathematics teaching. The study was conducted with 115 junior and senior pre-service teachers at Balikesir University's Faculty of Education, Computer Education and Instructional…
Descriptors: Foreign Countries, Attitude Measures, Likert Scales, Test Construction
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
Ahmed, Tamim; Hanif, Maria – Journal of Education and Practice, 2016
This study investigates students' achievement capability in two groups of families, i.e., low- and high-income families, and is designed for primary-level learners. A Reading, Arithmetic and Writing (RAW) Achievement test that was developed as part of another research study (Tamim Ahmed Khan, 2015) was adopted for this study. Both English medium…
Descriptors: Low Income, Performance Based Assessment, Elementary School Students, Achievement Tests
OECD Publishing, 2013
The Programme for the International Assessment of Adult Competencies (PIAAC) has been planned as an ongoing program of assessment. The first cycle of the assessment has involved two "rounds." The first round, which is covered by this report, took place over the period of January 2008-October 2013. The main features of the first cycle of…
Descriptors: International Assessment, Adults, Skills, Test Construction
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Deng, Nina – ProQuest LLC, 2011
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…
Descriptors: Item Response Theory, Test Theory, Computation, Classification
Pae, Hye K.; Greenberg, Daphne; Morris, Robin D. – Language Assessment Quarterly, 2012
The aim of this study was to apply the Rasch model to an analysis of the psychometric properties of the Peabody Picture Vocabulary Test-III Form A (PPVT-IIIA) items with struggling adult readers. The PPVT-IIIA was administered to 229 African American adults whose isolated word reading skills were between third and fifth grades. Conformity of…
Descriptors: African Americans, Test Items, Construct Validity, Test Validity
A Zero-One Programming Approach to Gulliksen's Matched Random Subtests Method. Research Report 86-4.
van der Linden, Wim J.; Boekkooi-Timminga, Ellen – 1986
In order to estimate the classical coefficient of test reliability, parallel measurements are needed. H. Gulliksen's matched random subtests method, which is a graphical method for splitting a test into parallel test halves, has practical relevance because it maximizes the alpha coefficient as a lower bound of the classical test reliability…
Descriptors: Algorithms, Computer Assisted Testing, Computer Software, Difficulty Level
