ERIC - Search Results

Showing all 15 results

Practical Considerations in Choosing an Anchor Test Form for Equating under the Random Groups Design

Peer reviewed

Cui, Zhongmin; He, Yong – Measurement: Interdisciplinary Research and Perspectives, 2023

Careful considerations are necessary when there is a need to choose an anchor test form from a list of old test forms for equating under the random groups design. The choice of the anchor form potentially affects the accuracy of equated scores on new test forms. Few guidelines, however, can be found in the literature on choosing the anchor form.…

Descriptors: Test Format, Equated Scores, Best Practices, Test Construction

Computerized Multistage Testing: Principles, Designs and Practices with R

Peer reviewed

Yigiter, Mahmut Sami; Dogan, Nuri – Measurement: Interdisciplinary Research and Perspectives, 2023

In recent years, computerized multistage testing (MST), with its versatile benefits, has found wide application in large-scale assessments and has grown in popularity. The fact that forms can be prepared before the exam is administered, as with a linear test, and that they can be adapted according to the test taker's ability…

Descriptors: Programming Languages, Monte Carlo Methods, Computer Assisted Testing, Test Format

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

Does Maximizing Information at the Cut Score Always Maximize Classification Accuracy and Consistency?

Peer reviewed

Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2016

A common suggestion made in the psychometric literature for fixed-length classification tests is that one should design tests so that they have maximum information at the cut score. Designing tests in this way is believed to maximize the classification accuracy and consistency of the assessment. This article uses simulated examples to illustrate…

Descriptors: Cutting Scores, Psychometrics, Test Construction, Classification

On Using Simulations to Inform Decision Making during Instrument Development

Peer reviewed

Morgan, Grant B.; Moore, Courtney A.; Floyd, Harlee S. – Journal of Psychoeducational Assessment, 2018

Although content validity--how well each item of an instrument represents the construct being measured--is foundational in the development of an instrument, statistical validity is also important to the decisions that are made based on the instrument. The primary purpose of this study is to demonstrate how simulation studies can be used to assist…

Descriptors: Simulation, Decision Making, Test Construction, Validity

Exploring Alternative Test Form Linking Designs with Modified Equating Sample Size and Anchor Test Length. Research Report. ETS RR-13-02

Peer reviewed

Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013

The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…

Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation

A Comparison of Equating/Linking Using the Stocking-Lord Method and Concurrent Calibration with Mixed-Format Tests in the Non-Equivalent Groups Common-Item Design under IRT

Tian, Feng – ProQuest LLC, 2011

There has been a steady increase in the use of mixed-format tests, that is, tests consisting of both multiple-choice items and constructed-response items in both classroom and large-scale assessments. This calls for appropriate equating methods for such tests. As Item Response Theory (IRT) has rapidly become mainstream as the theoretical basis for…

Descriptors: Item Response Theory, Comparative Analysis, Equated Scores, Statistical Analysis

Detecting Multidimensionality Due to Curricular Differences.

Peer reviewed

DeMars, Christine E. – Journal of Educational Measurement, 2003

Generated data to simulate multidimensionality resulting from including two or four subtopics on a test. DIMTEST analysis results suggest that including multiple topics, when they are commonly taught together, can lead to conceptual multidimensionality and mathematical multidimensionality. (SLD)

Descriptors: Curriculum, Simulation, Test Construction, Test Format

The Effectiveness of Circular Equating as a Criterion for Evaluating Equating.

Wang, Tianyou; Hanson, Bradley A.; Harris, Deborah J. – 1998

Equating a test form to itself through a chain of equatings, commonly referred to as circular equating, has been widely used as a criterion to evaluate the adequacy of equating. This paper uses both analytical methods and simulation methods to show that this criterion is in general invalid in serving this purpose. For the random groups design done…

Descriptors: Equated Scores, Evaluation Methods, Heuristics, Sampling

Some Considerations in Maintaining Adaptive Test Item Pools.

Stocking, Martha L. – 1988

The construction of parallel editions of conventional tests for purposes of test security while maintaining score comparability has always been a recognized and difficult problem in psychometrics and test construction. The introduction of new modes of test construction, e.g., adaptive testing, changes the nature of the problem, but does not make…

Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Identification

The Historical Development of Fit and Its Assessment in the Computer Adaptive Testing Environment.

Stone, Gregory Ethan – 1994

The quality of fit between the data and the measurement model is fundamental to any discussion of results. Fit has been the subject of inquiry since as early as the 1920s. Most early explorations concentrated on assessing global fit or subset fits on fixed length, traditional paper and pencil tests given as a single unit. The detection of aberrant…

Descriptors: Adaptive Testing, Computer Assisted Testing, Educational Assessment, Educational History

Measuring Intuitive Knowledge in Science: The Development of the What-If Test.

Peer reviewed

Swaak, Janine; de Jong, Ton – Studies in Educational Evaluation, 1996

A way to assess knowledge acquired through simulation-based learning (intuitive knowledge) is presented. A "WHAT-IF" test item format is developed, and two pilot studies involving 74 college students responding to WHAT-IF items are described. The tests did tap improvement in learning, although test validity was only partially supportive.…

Descriptors: College Students, Computer Assisted Testing, Elementary Secondary Education, Higher Education

Status Report on the NBME's Computer-Based Testing.

Peer reviewed

Clyman, Stephen G.; Orr, Nancy A. – Academic Medicine, 1990

The process proposed for the development and use of computer-based testing, including simulation and multiple-choice questions, as part of the National Board of Medical Examiners' certification sequence is outlined. Summary reports of first-phase pilot testing in six medical schools are appended. (MSE)

Descriptors: Computer Assisted Testing, Higher Education, Licensing Examinations (Professions), Medical Education

Effects of Test Length and Advancement Score on Several Criterion-Referenced Test Reliability and Validity Indices. Laboratory of Psychometric and Evaluation Research Report No. 86.

Eignor, Daniel R.; Hambleton, Ronald K. – 1979

The purpose of the investigation was to obtain some relationships among (1) test lengths, (2) shape of domain-score distributions, (3) advancement scores, and (4) several criterion-referenced test score reliability and validity indices. The study was conducted using computer simulation methods. The values of variables under study were set to be…

Descriptors: Comparative Analysis, Computer Assisted Testing, Criterion Referenced Tests, Cutting Scores

Varieties of Performance Assessments.

Finch, Fredrick; Foertsch, Mary – 1993

Performance assessment is reviewed as an emerging form of alternative assessment, focusing on how it has been defined in the research literature, the criteria for evaluating its authenticity, the measurement of process and product, and the link between assessment and instruction. Three important dimensions that must be considered in describing…

Descriptors: Alternative Assessment, Educational Assessment, Elementary Secondary Education, Evaluation Methods
