ERIC - Search Results

Showing 1 to 15 of 22 results

Modeling Directional Testlet Effects on Multiple Open-Ended Questions

Peer reviewed

Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025

Educational tests often have a cluster of items linked by a common stimulus ("testlet"). In such a design, the dependencies caused between items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…

Descriptors: Models, Test Items, Educational Assessment, Scores

Automatic Multiple Choice Question Generation From Text: A Survey

Peer reviewed

Rao, Dhawaleswar; Saha, Sujan Kumar – IEEE Transactions on Learning Technologies, 2020

Automatic multiple choice question (MCQ) generation from a text is a popular research area. MCQs are widely accepted for large-scale assessment in various domains and applications. However, manual generation of MCQs is expensive and time-consuming. Therefore, researchers have been attracted toward automatic MCQ generation since the late 90's.…

Descriptors: Multiple Choice Tests, Test Construction, Automation, Computer Software

A Comprehensive Review of Rasch Measurement in Language Assessment: Recommendations and Guidelines for Research

Peer reviewed

Aryadoust, Vahid; Ng, Li Ying; Sayama, Hiroki – Language Testing, 2021

Over the past decades, the application of Rasch measurement in language assessment has gradually increased. In the present study, we coded 215 papers using Rasch measurement published in 21 applied linguistics journals for multiple features. We found that seven Rasch models and 23 software packages were adopted in these papers, with many-facet…

Descriptors: Language Tests, Testing, Test Items, Network Analysis

Diagnostic Classification Models: Recent Developments, Practical Issues, and Prospects

Peer reviewed

Ravand, Hamdollah; Baghaei, Purya – International Journal of Testing, 2020

More than three decades after their introduction, diagnostic classification models (DCM) do not seem to have been implemented in educational systems for the purposes they were devised. Most DCM research is either methodological for model development and refinement or retrofitting to existing nondiagnostic tests and, in the latter case, basically…

Descriptors: Classification, Models, Diagnostic Tests, Test Construction

Calibrated Parsing Items Evaluation: A Step towards Objectifying the Translation Assessment

Peer reviewed

Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019

The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…

Descriptors: Test Items, Translation, Computer Software, Evaluators

Benthik Android Physics Comic Effectiveness for Vector Representation and Critical Thinking Students' Improvement

Peer reviewed

Maghfiroh, Anissa; Kuswanto, Heru – International Journal of Instruction, 2022

This research aims to reveal the effectiveness of the use of Kofie GeBoL media in improving (1) vector representation ability and (2) critical thinking ability in physics instruction. It is a descriptive quantitative study with the quasi-experiment design. It was conducted in two stages: empirical try out and implementation of Kofie GeboL to see…

Descriptors: Physics, Instructional Effectiveness, Critical Thinking, Thinking Skills

Towards Design-Loop Adaptivity: Identifying Items for Revision

Peer reviewed

Pelánek, Radek; Effenberger, Tomáš; Kukucka, Adam – Journal of Educational Data Mining, 2022

We study the automatic identification of educational items worthy of content authors' attention. Based on the results of such analysis, content authors can revise and improve the content of learning environments. We provide an overview of item properties relevant to this task, including difficulty and complexity measures, item discrimination, and…

Descriptors: Item Analysis, Identification, Difficulty Level, Case Studies

A Perception Scale on the Use of Webquests in Mathematics Teaching: A Study of Scale Development

Peer reviewed

Demir, Mevhibe Kobak; Gür, Hülya – Educational Research and Reviews, 2016

This study was aimed to develop a valid and reliable perception scale in order to determine the perceptions of pre-service teachers towards the use of WebQuest in mathematics teaching. The study was conducted with 115 junior and senior pre-service teachers at Balikesir University's Faculty of Education, Computer Education and Instructional…

Descriptors: Foreign Countries, Attitude Measures, Likert Scales, Test Construction

Item Response Theory Models for Performance Decline during Testing

Peer reviewed

Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014

Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…

Descriptors: Student Evaluation, Item Response Theory, Models, Simulation

Performance Assessment of High and Low Income Families through "Online RAW Achievement Battery Test" of Primary Grade Students

Peer reviewed

Ahmed, Tamim; Hanif, Maria – Journal of Education and Practice, 2016

This study is intended to investigate student's achievement capability among two families i.e. Low and High income families and designed for primary level learners. A Reading, Arithmetic and Writing (RAW) Achievement test that was developed as a part of another research study (Tamim Ahmed Khan, 2015) was adopted for this study. Both English medium…

Descriptors: Low Income, Performance Based Assessment, Elementary School Students, Achievement Tests

Technical Report of the Survey of Adult Skills (PIAAC)

OECD Publishing, 2013

The Programme for the International Assessment of Adult Competencies (PIAAC) has been planned as an ongoing program of assessment. The first cycle of the assessment has involved two "rounds." The first round, which is covered by this report, took place over the period of January 2008-October 2013. The main features of the first cycle of…

Descriptors: International Assessment, Adults, Skills, Test Construction

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Evaluating IRT- and CTT-Based Methods of Estimating Classification Consistency and Accuracy Indices from Single Administrations

Deng, Nina – ProQuest LLC, 2011

Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…

Descriptors: Item Response Theory, Test Theory, Computation, Classification

Construct Validity and Measurement Invariance of the Peabody Picture Vocabulary Test-III Form A

Peer reviewed

Pae, Hye K.; Greenberg, Daphne; Morris, Robin D. – Language Assessment Quarterly, 2012

The aim of this study was to apply the Rasch model to an analysis of the psychometric properties of the Peabody Picture Vocabulary Test--III Form A (PPVT--IIIA) items with struggling adult readers. The PPVT--IIIA was administered to 229 African American adults whose isolated word reading skills were between third and fifth grades. Conformity of…

Descriptors: African Americans, Test Items, Construct Validity, Test Validity

A Zero-One Programming Approach to Gulliksen's Matched Random Subtests Method. Research Report 86-4.

van der Linden, Wim J.; Boekkooi-Timminga, Ellen – 1986

In order to estimate the classical coefficient of test reliability, parallel measurements are needed. H. Gulliksen's matched random subtests method, which is a graphical method for splitting a test into parallel test halves, has practical relevance because it maximizes the alpha coefficient as a lower bound of the classical test reliability…

Descriptors: Algorithms, Computer Assisted Testing, Computer Software, Difficulty Level
