ERIC - Search Results

Showing all 9 results

Analyzing Complete Generalizability Theory Designs Using Structural Equation Models

Peer reviewed

Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023

We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…

Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement

Validating Automated Essay Scoring: A (Modest) Refinement of the "Gold Standard"

Peer reviewed

Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015

By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…

Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests

Gender DIF in Reading and Mathematics Tests with Mixed Item Formats

Peer reviewed

Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2012

This was a study of differential item functioning (DIF) for grades 4, 7, and 10 reading and mathematics items from state criterion-referenced tests. The tests were composed of multiple-choice and constructed-response items. Gender DIF was investigated using POLYSIBTEST and a Rasch procedure. The Rasch procedure flagged more items for DIF than did…

Descriptors: Test Bias, Gender Differences, Reading Tests, Mathematics Tests

Mathematics Performance of Students with and without Disabilities under Accommodated Conditions Using Resource Guides and Calculators on High Stakes Tests

Peer reviewed

Engelhard, George, Jr.; Fincher, Melissa; Domaleski, Christopher S. – Applied Measurement in Education, 2011

This study examines the effects of two test administration accommodations on the mathematics performance of students within the context of a large-scale statewide assessment. The two test administration accommodations were resource guides and calculators. A stratified random sample of schools was selected to represent the demographic…

Descriptors: Testing Accommodations, Disabilities, High Stakes Tests, Program Effectiveness

Understanding the Factors That Influence Decisions of Panelists in a Standard-Setting Study

Peer reviewed

Ferdous, Abdullah A.; Plake, Barbara S. – Applied Measurement in Education, 2005

This study addressed what standard-setting panelists think about when they make item performance estimates for a barely proficient student. This study extended previous studies by considering the factors that influenced panelists' decisions in an Angoff (1971)-based standard-setting study as a function of their item performance estimates.…

Descriptors: Test Items, Standard Setting (Scoring), Decision Making, Student Evaluation

Use and Effect of Caution Indices in Detecting Aberrant Patterns of Standard-Setting Judgments.

Peer reviewed

Jaeger, Richard M. – Applied Measurement in Education, 1988

The modified caution index's use in identifying judges whose patterns of item judgment appear aberrant when compared with the pattern produced by the entire group (N=158) of judges was studied. Effects on test standards and passing rates of removing test standards of these judges were also assessed. (TJH)

Descriptors: Criterion Referenced Tests, Evaluators, Item Analysis, Mathematics Tests

Achievement Measures of School Effectiveness: Comparison of Model Stability across Years.

Peer reviewed

Crone, Linda J.; And Others – Applied Measurement in Education, 1995

The feasibility of combining criterion-referenced and norm-referenced tests into a school achievement test score to be used for school effectiveness classification was studied with 361 elementary schools across 2 years and across a subsample of 264 schools. Results support combining scores of different tests to measure school effectiveness. (SLD)

Descriptors: Achievement Tests, Classification, Comparative Analysis, Criterion Referenced Tests

The Potential of Criterion-Referenced Tests with Projected Norms.

Peer reviewed

Behuniak, Peter; Tucker, Charlene – Applied Measurement in Education, 1992

Psychometrically linking a state criterion-referenced test (CRT) and a norm-referenced test (NRT) to yield NRT information through the CRT was studied with samples of 1,500 to 3,000 elementary school students per subject and grade level in Connecticut. A CRT/NRT link can create a focused and coherent assessment system. (SLD)

Descriptors: Content Analysis, Criterion Referenced Tests, Educational Assessment, Elementary Education

Customized Tests and Customized Norms.

Peer reviewed

Linn, Robert L.; Hambleton, Ronald K. – Applied Measurement in Education, 1991

Four main approaches to customized testing are described, and their resulting scores' valid uses and interpretations are discussed. Customized testing can yield valid normative and curriculum-specific information, although cautious application is needed to avoid misleading inferences about student achievement. (SLD)

Descriptors: Academic Achievement, Accountability, Criterion Referenced Tests, Curriculum
