Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 5 |
| Since 2017 (last 10 years) | 45 |
| Since 2007 (last 20 years) | 128 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Statistical Analysis | 128 |
| Test Format | 128 |
| Foreign Countries | 58 |
| Test Items | 53 |
| Comparative Analysis | 50 |
| Scores | 33 |
| Multiple Choice Tests | 31 |
| Correlation | 28 |
| Item Response Theory | 25 |
| College Students | 21 |
| Test Reliability | 21 |
Author
| Author | Count |
| --- | --- |
| Aizawa, Kazumi | 2 |
| Ali, Usama S. | 2 |
| Bande, Rhodora A. | 2 |
| Bendulo, Hermabeth O. | 2 |
| Iso, Tatsuo | 2 |
| Lee, Yi-Hsuan | 2 |
| Macalinao, Myrna L. | 2 |
| Menold, Natalja | 2 |
| Oyzon, Voltaire Q. | 2 |
| Tibus, Erlinda D. | 2 |
| Abramzon, Andrea | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Reports - Research | 117 |
| Journal Articles | 107 |
| Tests/Questionnaires | 10 |
| Speeches/Meeting Papers | 8 |
| Dissertations/Theses -… | 7 |
| Reports - Evaluative | 4 |
| Numerical/Quantitative Data | 2 |
| Information Analyses | 1 |
Location
| Location | Count |
| --- | --- |
| Turkey | 7 |
| Germany | 6 |
| Japan | 5 |
| Australia | 4 |
| Iran | 4 |
| Philippines | 3 |
| Sweden | 3 |
| China | 2 |
| Czech Republic | 2 |
| Florida | 2 |
| Netherlands | 2 |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
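Background for the Benton (2025) entry above: the extension it proposes builds on standard linear equating, which maps form-X scores onto the form-Y scale by matching means and standard deviations. A minimal sketch of that baseline method (not the paper's extension), with hypothetical data for illustration:

```python
import numpy as np

def linear_equate(x_scores, y_scores):
    """Standard linear equating: return a function mapping a form-X score onto
    the form-Y scale so equated scores match Y's mean and standard deviation."""
    mu_x, sd_x = np.mean(x_scores), np.std(x_scores, ddof=1)
    mu_y, sd_y = np.mean(y_scores), np.std(y_scores, ddof=1)
    return lambda x: mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical example: equate a raw score of 30 on form X to the form-Y scale.
rng = np.random.default_rng(0)
form_x = rng.normal(28, 6, size=500).round()
form_y = rng.normal(31, 7, size=500).round()
equate = linear_equate(form_x, form_y)
print(equate(30))
```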
Huiming Ding; Matt Homer – Advances in Health Sciences Education, 2025
Summative assessments are often underused for feedback, despite being rich in data on students' applied knowledge and clinical and professional skills. To better inform teaching and student support, this study aims to gain insights from summative assessments by profiling students' performance patterns and identifying those students…
Descriptors: Summative Evaluation, Profiles, Statistical Analysis, Outcomes of Education
Christian Berggren; Bengt Gerdin; Solmaz Filiz Karabag – Journal of Academic Ethics, 2025
The exposure of scientific scandals and the increase of dubious research practices have generated a stream of studies on Questionable Research Practices (QRPs), such as failure to acknowledge co-authors, selective presentation of findings, or removal of data not supporting desired outcomes. In contrast to high-profile fraud cases, QRPs can be…
Descriptors: Test Construction, Test Bias, Test Format, Response Style (Tests)
Jiajing Huang – ProQuest LLC, 2022
The nonequivalent-groups anchor-test (NEAT) data-collection design is commonly used in large-scale assessments. Under this design, different test groups take different test forms. Each test form has its own unique items and all test forms share a set of common items. If item response theory (IRT) models are applied to analyze the test data, the…
Descriptors: Item Response Theory, Test Format, Test Items, Test Construction
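For context on the Huang (2022) entry: under the NEAT design, IRT parameter estimates from different forms are usually placed on a common scale through the shared anchor items. One standard approach is mean-sigma linking, sketched below; this is general background and may not be the method used in the dissertation.

```latex
% Mean-sigma linking constants from the anchor items' difficulty estimates:
A = \frac{\sigma\!\left(\hat{b}^{\,Y}_{\mathrm{anchor}}\right)}{\sigma\!\left(\hat{b}^{\,X}_{\mathrm{anchor}}\right)},
\qquad
B = \mu\!\left(\hat{b}^{\,Y}_{\mathrm{anchor}}\right) - A\,\mu\!\left(\hat{b}^{\,X}_{\mathrm{anchor}}\right)
% Rescale the form-X estimates onto the form-Y scale:
b^{*} = A\,b + B, \qquad a^{*} = a / A, \qquad \theta^{*} = A\,\theta + B
```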
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
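The Laukaityte and Wiberg (2024) entry evaluates equating bias and the SEE. As a rough illustration only (not their chained kernel, poststratification kernel, or circle-arc procedures), the SEE at a single score point can be approximated by bootstrapping examinee groups around any equating method wrapped as `equate_fn(x_scores, y_scores, score_point)`:

```python
import numpy as np

def bootstrap_see(x_scores, y_scores, equate_fn, score_point, n_boot=1000, seed=0):
    """Approximate the standard error of equating (SEE) at one score point by
    resampling both examinee groups and re-estimating the equating function."""
    rng = np.random.default_rng(seed)
    reps = []
    for _ in range(n_boot):
        xb = rng.choice(x_scores, size=len(x_scores), replace=True)
        yb = rng.choice(y_scores, size=len(y_scores), replace=True)
        reps.append(equate_fn(xb, yb, score_point))
    return np.std(reps, ddof=1)
```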
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
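For orientation on the Lim and Lee (2020) entry: under a random groups design with number-correct scoring, a common alternative to linear equating is equipercentile equating, which matches percentile ranks across forms. A deliberately simplified sketch (it omits the continuity correction and smoothing used operationally):

```python
import numpy as np

def equipercentile_equate(x_scores, y_scores, x):
    """Simplified equipercentile equating under a random groups design:
    map a form-X score x to the form-Y score with the same percentile rank."""
    p = np.mean(np.asarray(x_scores) <= x)   # percentile rank of x on form X
    return np.quantile(y_scores, p)          # form-Y score at that percentile
```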
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2021
This study investigated whether item position effects lead to DIF when different test booklets are used. Lord's chi-square and Raju's unsigned area methods were applied with the 3PL model, both with and without item purification. When the performance of the methods was compared, it was revealed that…
Descriptors: Item Response Theory, Test Bias, Test Items, Comparative Analysis
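For the Soysal and Yilmaz Kogar (2021) entry: Raju's area methods quantify DIF as the area between the reference- and focal-group item characteristic curves. The sketch below uses a simple numerical approximation over a bounded ability range rather than Raju's closed-form expressions, with hypothetical item parameters:

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def unsigned_area(params_ref, params_focal, lo=-4.0, hi=4.0, n=2001):
    """Approximate the unsigned area between reference- and focal-group ICCs
    (a DIF effect size in the spirit of Raju's method) by a Riemann sum."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_3pl(theta, *params_ref) - icc_3pl(theta, *params_focal))
    return np.sum(gap) * (hi - lo) / (n - 1)

# Hypothetical item: equal discrimination and guessing, difficulty shifted for the focal group.
print(unsigned_area((1.2, 0.0, 0.2), (1.2, 0.5, 0.2)))
```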
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
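For readers unfamiliar with the scoring model referenced in the Tingir (2019) entry: a Bayesian network scores examinees by combining a prior over a latent proficiency with conditional probability tables (CPTs) for the observed responses. A toy, hand-specified example (not the dissertation's model or its CPT-adjustment method):

```python
# Toy Bayesian-network scoring: one binary latent proficiency, two observed items.
prior = {"master": 0.5, "nonmaster": 0.5}

# CPTs: probability of a correct response on each item, given the latent state.
cpt = {
    "item1": {"master": 0.85, "nonmaster": 0.30},
    "item2": {"master": 0.75, "nonmaster": 0.20},
}

def posterior(responses):
    """P(latent state | observed responses) via Bayes' rule, assuming responses
    are conditionally independent given the latent state."""
    joint = {}
    for state, p in prior.items():
        like = 1.0
        for item, correct in responses.items():
            p_correct = cpt[item][state]
            like *= p_correct if correct else (1 - p_correct)
        joint[state] = p * like
    total = sum(joint.values())
    return {state: v / total for state, v in joint.items()}

print(posterior({"item1": True, "item2": False}))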
Harbaugh, Allen G.; Liu, Min – AERA Online Paper Repository, 2017
This research examines the effects of single-value response style contamination on measures of model fit and model convergence issues. A simulation study examines the effects of the percentage of contamination, the number of manifest variables, the number of reverse-coded items, the magnitude of standardized factor loadings, response scale granularity, and…
Descriptors: Goodness of Fit, Sample Size, Statistical Analysis, Test Format
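To make the data-generation idea in the Harbaugh and Liu (2017) entry concrete, here is a minimal sketch of simulating single-value response style contamination in Likert-type data; the parameter values and generation model are illustrative assumptions, not the study's actual design:

```python
import numpy as np

def simulate_responses(n_persons=500, n_items=10, contamination=0.10,
                       scale_points=5, seed=0):
    """Simulate Likert-type responses where a given proportion of respondents
    answer with one constant category on every item (single-value response style)."""
    rng = np.random.default_rng(seed)
    # Ordinary respondents: item responses vary around each person's latent trait.
    latent = rng.normal(loc=(scale_points + 1) / 2, scale=0.8, size=(n_persons, 1))
    data = np.clip(np.round(latent + rng.normal(scale=1.0, size=(n_persons, n_items))),
                   1, scale_points)
    # Contaminated respondents: replace their rows with a single constant category.
    n_bad = int(contamination * n_persons)
    data[:n_bad, :] = rng.integers(1, scale_points + 1, size=(n_bad, 1))
    return data
```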
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W. – Journal of Educational Measurement, 2017
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
Descriptors: Test Format, Test Construction, Statistical Analysis, Comparative Analysis
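The two statistical targets named in the Debeer, Ali, and van Rijn (2017) entry have standard IRT definitions. Shown below for a 2PL model with items $i = 1, \dots, n$; these are the general formulas, not the paper's specific assembly models:

```latex
% Test characteristic curve: expected number-correct score at ability \theta
\mathrm{TCC}(\theta) = \sum_{i=1}^{n} P_i(\theta),
\qquad
P_i(\theta) = \frac{1}{1 + \exp\{-a_i(\theta - b_i)\}}
% Test information function: sum of the item information functions
\mathrm{TIF}(\theta) = \sum_{i=1}^{n} a_i^{2}\, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr)
```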
Neuert, Cornelia E. – Field Methods, 2017
Previous research has shown that check-all-that-apply (CATA) and forced-choice (FC) question formats do not produce comparable results. The cognitive processes underlying respondents' answers to both types of formats still require clarification. This study contributes to filling this gap by using eye-tracking data. Both formats are compared by…
Descriptors: Measurement Techniques, Test Format, Eye Movements, Cognitive Processes
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Menold, Natalja; Raykov, Tenko – Educational and Psychological Measurement, 2016
This article examines the possible dependency of composite reliability on presentation format of the elements of a multi-item measuring instrument. Using empirical data and a recent method for interval estimation of group differences in reliability, we demonstrate that the reliability of an instrument need not be the same when polarity of the…
Descriptors: Test Reliability, Test Format, Test Items, Differences
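Background for the Menold and Raykov (2016) entry: composite reliability for a set of congeneric items is conventionally expressed in terms of the factor loadings $\lambda_i$ and error variances $\theta_{ii}$. This is the general coefficient, not the paper's interval-estimation method:

```latex
% Composite (omega) reliability of the sum score of k congeneric items
\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}
              {\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \theta_{ii}}
```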
Sinharay, Sandip – Grantee Submission, 2018
Tatsuoka (1984) suggested several extended caution indices and their standardized versions that have been used as person-fit statistics by researchers such as Drasgow, Levine, and McLaughlin (1987), Glas and Meijer (2003), and Molenaar and Hoijtink (1990). However, these indices are only defined for tests with dichotomous items. This paper extends…
Descriptors: Test Format, Goodness of Fit, Item Response Theory, Error Patterns
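Related background for the Sinharay (2018) entry: for dichotomous items, standardized person-fit statistics are built from the log-likelihood of the observed response vector. A commonly cited example is the $l_z$ statistic, shown here for orientation; it is not itself one of Tatsuoka's extended caution indices:

```latex
% Log-likelihood of responses u_i given item response probabilities P_i(\theta)
l_0 = \sum_{i=1}^{n} \bigl[ u_i \ln P_i(\theta) + (1 - u_i)\ln\{1 - P_i(\theta)\} \bigr]
% Standardized person-fit statistic
l_z = \frac{l_0 - \mathrm{E}(l_0)}{\sqrt{\mathrm{Var}(l_0)}},
\qquad
\mathrm{E}(l_0) = \sum_{i=1}^{n}\bigl[P_i \ln P_i + (1 - P_i)\ln(1 - P_i)\bigr],
\quad
\mathrm{Var}(l_0) = \sum_{i=1}^{n} P_i(1 - P_i)\left[\ln\frac{P_i}{1 - P_i}\right]^{2}
```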
Wang, Lu; Steedle, Jeffrey – ACT, Inc., 2020
In recent ACT mode comparability studies, students testing on laptop or desktop computers earned slightly higher scores on average than students who tested on paper, especially on the ACT® reading and English tests (Li et al., 2017). Equating procedures adjust for such "mode effects" to make ACT scores comparable regardless of testing…
Descriptors: Test Format, Reading Tests, Language Tests, English
