Publication Date
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 9 |
| Since 2017 (last 10 years) | 38 |
| Since 2007 (last 20 years) | 79 |
Descriptor
Source
Author
Publication Type
Education Level
Location
| Nebraska | 4 |
| Tennessee | 4 |
| Europe | 3 |
| Thailand | 3 |
| Germany | 2 |
| Netherlands | 2 |
| New Jersey | 2 |
| New Mexico | 2 |
| Nigeria | 2 |
| Turkey | 2 |
| United Kingdom | 2 |
| More ▼ | |
Laws, Policies, & Programs
| Comprehensive Education… | 2 |
| No Child Left Behind Act 2001 | 2 |
| Education Consolidation… | 1 |
| Individuals with Disabilities… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Yun-Kyung Kim; Li Cai – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2025
This paper introduces an application of cross-classified item response theory (IRT) modeling to an assessment utilizing the embedded standard setting (ESS) method (Lewis & Cook). The cross-classified IRT model is used to treat both item and person effects as random, where the item effects are regressed on the target performance levels (target…
Descriptors: Standard Setting (Scoring), Item Response Theory, Test Items, Difficulty Level
Daniel Lewis; Melanie Graw; Michael Baker – Journal of Applied Testing Technology, 2024
Embedded Standard Setting (ESS; Lewis & Cook, 2020) transforms standard setting from a standalone workshop to an active part of the assessment development lifecycle. ESS purports to lower costs by eliminating the standard-setting workshop and enhance the validity argument by maintaining a consistent focus on the evidentiary relationship…
Descriptors: Standard Setting (Scoring), Test Items, Test Construction, Food Service
Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…
Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
Skaggs, Gary; Hein, Serge F.; Wilkins, Jesse L. M. – Educational Measurement: Issues and Practice, 2020
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group study of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can…
Descriptors: Standard Setting (Scoring), Cutting Scores, Testing Problems, Profiles
Russell, Michael; Moncaleano, Sebastian – Practical Assessment, Research & Evaluation, 2020
Although both content alignment and standard-setting procedures rely on content-expert panel judgements, only the latter employs discussion among panel members. This study employed a modified form of the Webb methodology to examine content alignment for twelve tests administered as part of the Massachusetts Comprehensive Assessment System (MCAS).…
Descriptors: Test Content, Test Items, Discussion, Test Validity
Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2019
One common phenomenon in Angoff standard setting is that panelists regress their ratings in toward the middle of the probability scale. This study describes two indices based on taking ratios of standard deviations that can be utilized with a scatterplot of item ratings versus expected probabilities of success to identify whether ratings are…
Descriptors: Item Analysis, Standard Setting, Probability, Feedback (Response)
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelist's ability to perform well the Bookmark method, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020
In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously, Engineered Cut Scores. Lewis, 2016), an approach to establish performance…
Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment
Rümeysa Kaya; Bayram Çetin – International Journal of Assessment Tools in Education, 2025
In this study, the cut-off scores obtained from the Angoff, Angoff Y/N, Nedelsky and Ebel standard methods were compared with the 50 T score and the current cut-off score in various aspects. Data were collected from 448 students who took Module B1+ English Exit Exam IV and 14 experts. It was seen that while the Nedelsky method gave the lowest…
Descriptors: Standard Setting, Cutting Scores, Exit Examinations, Academic Achievement
Sivakorn Tangsakul; Kornwipa Poonpon – rEFLections, 2024
Given the significant global influence of the Common European Framework of Reference for Languages: Teaching, Learning, and Assessment (CEFR) on English language education, this study deals with aligning a university's academic reading tests to the CEFR. It aimed at validating the test construct of the academic reading tests in relation to the…
Descriptors: Alignment (Education), Reading Tests, Second Language Learning, Language Proficiency
Kara, Hakan; Cetin, Sevda – International Journal of Assessment Tools in Education, 2020
In this study, the efficiency of various random sampling methods to reduce the number of items rated by judges in an Angoff standard-setting study was examined and the methods were compared with each other. Firstly, the full-length test was formed by combining Placement Test 2012 and 2013 mathematics subsets. After then, simple random sampling…
Descriptors: Cutting Scores, Standard Setting (Scoring), Sampling, Error of Measurement
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy with a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
New Meridian Corporation, 2020
New Meridian Corporation has developed the "Quality Testing Standards and Criteria for Comparability Claims" (QTS). The goal of the QTS is to provide guidance to states that are interested in including content from the New Meridian item bank and intend to make comparability claims with "other assessments" that include New…
Descriptors: Testing, Standards, Comparative Analysis, Guidelines

Peer reviewed
Direct link
