Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 9 |
Descriptor
| Error of Measurement | 14 |
| Test Items | 14 |
| Cutting Scores | 11 |
| Standard Setting (Scoring) | 8 |
| Standard Setting | 6 |
| Difficulty Level | 5 |
| Generalizability Theory | 4 |
| Interrater Reliability | 4 |
| Item Response Theory | 4 |
| Sampling | 4 |
| Computation | 3 |
| More ▼ | |
Source
Author
| Arce, Alvaro J. | 1 |
| Babcock, Ben | 1 |
| Bramley, Tom | 1 |
| Cetin, Sevda | 1 |
| Cope, Ronald T. | 1 |
| Griph, Gerald W. | 1 |
| Kannan, Priya | 1 |
| Kara, Hakan | 1 |
| Katz, Irvin R. | 1 |
| Kolstad, Andrew | 1 |
| Oranje, Andreas | 1 |
| More ▼ | |
Publication Type
| Journal Articles | 9 |
| Reports - Research | 8 |
| Reports - Descriptive | 3 |
| Speeches/Meeting Papers | 3 |
| Numerical/Quantitative Data | 2 |
| Reports - Evaluative | 2 |
| Opinion Papers | 1 |
Education Level
| Elementary Secondary Education | 3 |
| Grade 3 | 1 |
| Grade 5 | 1 |
| Grade 7 | 1 |
| Higher Education | 1 |
| Junior High Schools | 1 |
| Middle Schools | 1 |
| Postsecondary Education | 1 |
| Secondary Education | 1 |
Audience
| Researchers | 1 |
Location
| New Mexico | 2 |
| Europe | 1 |
| New Jersey | 1 |
| Thailand | 1 |
Laws, Policies, & Programs
Assessments and Surveys
| National Assessment of… | 1 |
| Test of English as a Foreign… | 1 |
| Test of English for… | 1 |
What Works Clearinghouse Rating
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelist's ability to perform well the Bookmark method, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Kara, Hakan; Cetin, Sevda – International Journal of Assessment Tools in Education, 2020
In this study, the efficiency of various random sampling methods to reduce the number of items rated by judges in an Angoff standard-setting study was examined and the methods were compared with each other. Firstly, the full-length test was formed by combining Placement Test 2012 and 2013 mathematics subsets. After then, simple random sampling…
Descriptors: Cutting Scores, Standard Setting (Scoring), Sampling, Error of Measurement
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy with a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
Oranje, Andreas; Kolstad, Andrew – Journal of Educational and Behavioral Statistics, 2019
The design and psychometric methodology of the National Assessment of Educational Progress (NAEP) is constantly evolving to meet the changing interests and demands stemming from a rapidly shifting educational landscape. NAEP has been built on strong research foundations that include conducting extensive evaluations and comparisons before new…
Descriptors: National Competency Tests, Psychometrics, Statistical Analysis, Computation
Wudthayagorn, Jirada – LEARN Journal: Language Education and Acquisition Research Network, 2018
The purpose of this study was to map the Chulalongkorn University Test of English Proficiency, or the CU-TEP, to the Common European Framework of Reference (CEFR) by employing a standard setting methodology. Thirteen experts judged 120 items of the CU-TEP using the Yes/No Angoff technique. The experts decided whether or not a borderline student at…
Descriptors: Guidelines, Rating Scales, English (Second Language), Language Tests
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012
The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…
Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling
Yin, Ping; Sconing, James – Educational and Psychological Measurement, 2008
Standard-setting methods are widely used to determine cut scores on a test that examinees must meet for a certain performance standard. Because standard setting is a measurement procedure, it is important to evaluate variability of cut scores resulting from the standard-setting process. Generalizability theory is used in this study to estimate…
Descriptors: Generalizability Theory, Standard Setting, Cutting Scores, Test Items
Reckase, Mark D. – Educational Measurement: Issues and Practice, 2006
Schulz (2006) provides a different perspective on standard setting than that provided in Reckase (2006). He also suggests a modification to the bookmark procedure and some alternative models for errors in panelists' judgments than those provided by Reckase. This article provides a response to some of the points made by Schulz and reports some…
Descriptors: Evaluation Methods, Standard Setting, Reader Response, Regression (Statistics)
deGruijter, Dato N. M. – 1980
The setting of standards involves subjective value judgments. The inherent arbitrariness of specific standards has been severely criticized by Glass. His antagonists agree that standard setting is a judgmental task but they have pointed out that arbitrariness in the positive sense of serious judgmental decisions is unavoidable. Further, small…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
Cope, Ronald T. – 1987
This study used generalizability theory and other statistical concepts to assess the application of the Angoff method to setting cutoff scores on two professional certification tests. A panel of ten judges gave pre- and post-feedback Angoff probability ratings of items of two forms of a professional certification test, and another panel of nine…
Descriptors: Certification, Correlation, Cutting Scores, Error of Measurement
van der Linden, Wim J. – 1982
A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The…
Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Interrater Reliability
New Mexico Public Education Department, 2007
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2007 NMSBA. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
Griph, Gerald W. – New Mexico Public Education Department, 2006
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring

Peer reviewed
Direct link
