Vispoel, Walter P.; Hong, Hyeri; Lee, Hyeryung; Jorgensen, Terrence D. – Applied Measurement in Education, 2023
We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…
Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…
Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2012
This was a study of differential item functioning (DIF) for grades 4, 7, and 10 reading and mathematics items from state criterion-referenced tests. The tests were composed of multiple-choice and constructed-response items. Gender DIF was investigated using POLYSIBTEST and a Rasch procedure. The Rasch procedure flagged more items for DIF than did…
Descriptors: Test Bias, Gender Differences, Reading Tests, Mathematics Tests
Engelhard, George, Jr.; Fincher, Melissa; Domaleski, Christopher S. – Applied Measurement in Education, 2011
This study examines the effects of two test administration accommodations on the mathematics performance of students within the context of a large-scale statewide assessment. The two test administration accommodations were resource guides and calculators. A stratified random sample of schools was selected to represent the demographic…
Descriptors: Testing Accommodations, Disabilities, High Stakes Tests, Program Effectiveness
Ferdous, Abdullah A.; Plake, Barbara S. – Applied Measurement in Education, 2005
This study addressed what standard-setting panelists think about when they make item performance estimates for a barely proficient student. This study extended previous studies by considering the factors that influenced panelists' decisions in an Angoff (1971)-based standard-setting study as a function of their item performance estimates.…
Descriptors: Test Items, Standard Setting (Scoring), Decision Making, Student Evaluation

Jaeger, Richard M. – Applied Measurement in Education, 1988
The use of the modified caution index to identify judges whose patterns of item judgments appear aberrant when compared with the pattern produced by the entire group of judges (N=158) was studied. The effects on test standards and passing rates of removing these judges' recommended standards were also assessed. (TJH)
Descriptors: Criterion Referenced Tests, Evaluators, Item Analysis, Mathematics Tests

Crone, Linda J.; And Others – Applied Measurement in Education, 1995
The feasibility of combining criterion-referenced and norm-referenced tests into a school achievement test score to be used for school effectiveness classification was studied with 361 elementary schools across 2 years and across a subsample of 264 schools. Results support combining scores of different tests to measure school effectiveness. (SLD)
Descriptors: Achievement Tests, Classification, Comparative Analysis, Criterion Referenced Tests

Behuniak, Peter; Tucker, Charlene – Applied Measurement in Education, 1992
Psychometrically linking a state criterion-referenced test (CRT) and a norm-referenced test (NRT) to yield NRT information through the CRT was studied with samples of 1,500 to 3,000 elementary school students per subject and grade level in Connecticut. A CRT/NRT link can create a focused and coherent assessment system. (SLD)
Descriptors: Content Analysis, Criterion Referenced Tests, Educational Assessment, Elementary Education

Linn, Robert L.; Hambleton, Ronald K. – Applied Measurement in Education, 1991
Four main approaches to customized testing are described, and their resulting scores' valid uses and interpretations are discussed. Customized testing can yield valid normative and curriculum-specific information, although cautious application is needed to avoid misleading inferences about student achievement. (SLD)
Descriptors: Academic Achievement, Accountability, Criterion Referenced Tests, Curriculum