ERIC - Search Results

Showing all 13 results

Identifying Severity Standards on the Cognitive Test Anxiety Scale: Cut Score Determination Using Latent Class and Cluster Analysis

Peer reviewed

Thomas, Christopher L.; Cassady, Jerrell C.; Finch, W. Holmes – Journal of Psychoeducational Assessment, 2018

The purpose of the current examination was to preliminarily suggest severity standards for the recently revised Cognitive Test Anxiety Scale-Second Edition (CTAS-2). Participants responded to the CTAS-2, Motivated Strategies for Learning Questionnaire (MSLQ), and FRIEDBEN Test Anxiety Scale. Using both latent class and cluster analyses, we were…

Descriptors: Cognitive Tests, Test Anxiety, Cutting Scores, Multivariate Analysis

Using the Many-Facet Rasch Model to Evaluate Standard-Setting Judgments: Setting Performance Standards for Advanced Placement® Examinations

Kaliski, Pamela; Wind, Stefanie A.; Engelhard, George, Jr.; Morgan, Deanna; Plake, Barbara; Reshetar, Rosemary – College Board, 2012

The Many-Facet Rasch (MFR) Model is traditionally used to evaluate the quality of ratings on constructed response assessments; however, it can also be used to evaluate the quality of judgments from panel-based standard setting procedures. The current study illustrates the use of the MFR Model by examining the quality of ratings obtained from a…

Descriptors: Advanced Placement Programs, Achievement Tests, Item Response Theory, Models

Constructing Benchmarks for Monitoring Purposes: Evidence from South Africa

Peer reviewed

Scherman, Vanessa; Howie, Sarah J.; Bosker, Roel J. – Educational Research and Evaluation, 2011

In information-rich environments, schools are often presented with a myriad of data from which decisions need to be made. The use of the information on a classroom level may be facilitated if performance could be described in terms of levels of proficiency or benchmarks. The aim of this article is to explore benchmarks using data from a monitoring…

Descriptors: Standard Setting, Foreign Countries, Grade 8, Ability

Establishing Upper Limits for Item Ratings for the Angoff Method: Are Resulting Standards More 'Realistic'?

Reid, Jerry B. – 1985

This report investigates an area of uncertainty in using the Angoff method for setting standards, namely whether or not a judge's conceptualizations of borderline group performance are realistic. Ratings are usually made with reference to the performance of this hypothetical group, therefore the Angoff method's success is dependent on this point.…

Descriptors: Certification, Cutting Scores, Difficulty Level, Interrater Reliability

Effects of Mode of Item Presentation on Standard Setting.

Faggen, Jane; And Others – 1995

The objective of this study was to determine the degree to which recommendations for passing scores, calculated on the basis of a traditional standard-setting methodology, might be affected by the mode (paper versus computer-screen prints) in which test items were presented to standard setting panelists. Results were based on the judgments of 31…

Descriptors: Computer Assisted Testing, Cutting Scores, Difficulty Level, Evaluators

The Effect of Anchor Item Characteristics on Equivalent Cutting Scores.

Peer reviewed

Norcini, John; And Others – Applied Measurement in Education, 1994

Whether anchor item sets varying in difficulty and discrimination affect precision of cutting score equivalents generated through judge rescaling as much as equivalents from score equating was studied with 4 groups of experts and 250 and 1,000 examinees. Results indicate the robustness of judge rescaling and its superiority over equating. (SLD)

Descriptors: Cutting Scores, Decision Making, Difficulty Level, Equated Scores

Does a Standard Reflect Minimal Competency of Examinees or Judge Competency?

Peer reviewed

Chang, Lei; And Others – Applied Measurement in Education, 1996

The influence of judges' knowledge on standard setting for competency tests was studied with 17 judges who took an economics teacher certification test while setting competency standards using the Angoff procedure. Judges tended to set higher standards for items they answered correctly and lower standards for items they answered incorrectly. (SLD)

Descriptors: Competence, Difficulty Level, Economics, Judges

Grading Large Classes: An Application of Linear Equating to Percentage-Correct Grading Decisions.

Johanson, George A.; Rich, Charles E. – 1991

Assigning letter grades in a consistent manner to tests in large classes across semesters is problematic if absolute grading standards are used. It may be unreasonable to implement the usual standard-setting approaches recommended for large-scale criterion-referenced testing due to both time constraints and a desire to have criteria that appear…

Descriptors: Class Size, College Students, Criterion Referenced Tests, Difficulty Level

Setting, Evaluating, and Maintaining Certification Standards with the Rasch Model.

Peer reviewed

Grosse, Martin E.; Wright, Benjamin D. – Evaluation and the Health Professions, 1986

Based on the standard setting procedures of the American Board of Preventive Medicine for their Core Test, this article describes how Rasch measurement can facilitate using test content judgments in setting a standard. Rasch measurement can then be used to evaluate and improve the precision of the standard and to hold it constant across time.…

Descriptors: Certification, Criterion Referenced Tests, Difficulty Level, Health Personnel

A Comparison between the Nedelsky and Angoff Standard-Setting Methods.

Chang, Lei – 1996

It was hypothesized that, when compared to the Angoff method (W. H. Angoff, 1971), the Nedelsky method (L. Nedelsky, 1954) for standard setting had lower intrajudge inconsistency, lower cutscores, and lower cutscores especially for items presenting challenges to the judges. These hypotheses were tested and supported in a sample of 22 graduate…

Descriptors: Comparative Analysis, Cutting Scores, Difficulty Level, Distractors (Tests)

Construct Validation of Minimum Competence in Standard Setting. Revised.

DeMauro, Gerald E. – 1995

Studies of the Angoff method of standard setting suggest that judges agree in their estimates of the relative difficulties of test questions for minimally competent examinees and that each judge's estimates correlate well with the observed item difficulties for examinees whose total test scores are near the judge's personal standard (G. E.…

Descriptors: Ability, Competence, Construct Validity, Difficulty Level

Effects of Item Context on Intrajudge Consistency of Expert Judgments via the Nedelsky Standard Setting Method.

Peer reviewed

Plake, Barbara S.; Melican, Gerald J. – Educational and Psychological Measurement, 1989

The impact of overall test length and difficulty on the expert judgments of item performance by the Nedelsky method was studied. Five university-level instructors predicting the performance of minimally competent candidates on a mathematics examination were fairly consistent in their assessments regardless of length or difficulty of the test.…

Descriptors: Difficulty Level, Estimation (Mathematics), Evaluators, Higher Education

A Procedure for Estimating a Criterion-Referenced Standard to Identify Educationally Deprived Children for Title I Services. Final Report.

Ziomek, Robert L.; Wright, Benjamin D. – 1984

Techniques such as the norm-referenced and average score techniques, commonly used in the identification of educationally disadvantaged students, are critiqued. This study applied latent trait theory, specifically the Rasch Model, along with teacher judgments relative to the mastery of instructional/test decisions, to derive a standard setting…

Descriptors: Cutting Scores, Difficulty Level, Educationally Disadvantaged, Intermediate Grades
