Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Welch, Adam C.; Karpen, Samuel C.; Cross, L. Brian; LeBlanc, Brandie N. – Research & Practice in Assessment, 2017
The aims of this study were to determine faculty's ability to accurately and reliably categorize exam questions using Bloom's Taxonomy, and whether modified versions would improve the accuracy and reliability. Faculty experience and affiliation with a health sciences discipline were also considered. Faculty at one university were asked to categorize 30…
Descriptors: College Faculty, Medical School Faculty, Health Sciences, Test Items
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Kalender, Ilker – International Journal of Higher Education, 2015
Student evaluations of teaching (SET) have been the principal instrument to elicit students' opinions in higher education institutions. Many decisions, including high-stakes ones, are made based on SET scores reported by students. In this respect, reliability of SET scores is of considerable importance. This paper argues that there are…
Descriptors: Higher Education, Reliability, Test Items, Measurement
Crossley, Scott; Clevinger, Amanda; Kim, YouJin – Language Assessment Quarterly, 2014
There has been a growing interest in the use of integrated tasks in the field of second language testing to enhance the authenticity of language tests. However, the role of text integration in test takers' performance has not been widely investigated. The purpose of the current study is to examine the effects of text-based relational (i.e.,…
Descriptors: Language Proficiency, Connected Discourse, Language Tests, English (Second Language)
Green, Kathy – Educational and Psychological Measurement, 1985 (peer reviewed)
Five sets of paired comparison judgments were made concerning test item difficulty, in order to identify the most probable source of intransitivity in the data. The paired comparisons method was useful in providing information about sensitivity to stimulus differences, but less useful for assessing dimensionality of judgment criteria.…
Descriptors: Adults, Difficulty Level, Evaluative Thinking, Higher Education
Cope, Ronald T. – 1987
This study used generalizability theory and other statistical concepts to assess the application of the Angoff method to setting cutoff scores on two professional certification tests. A panel of ten judges gave pre- and post-feedback Angoff probability ratings of items of two forms of a professional certification test, and another panel of nine…
Descriptors: Certification, Correlation, Cutting Scores, Error of Measurement
Feldt, Leonard S.; Kim, Seonghoon – Educational and Psychological Measurement, 2006
Researchers sometimes need a statistical test of the hypothesis that two values of Cronbach's alpha reliability coefficient are equal. The situation may involve scores from two different measures administered to independent random samples or from the same measure administered to random samples from two different populations. Feldt derived a test…
Descriptors: Individual Testing, Test Items, Sample Size, Scores
Rothman, M. L.; And Others – 1982
A practical application of generalizability theory, demonstrating how the variance components contribute to understanding and interpreting the data collected to evaluate a program, is described. The evaluation concerned 120 learning modules developed for the Dental Auxiliary Education Project. The goals of the project were to design, implement,…
Descriptors: Correlation, Data Collection, Dental Schools, Educational Research
Strong, Gregory – Thought Currents in English Literature, 1995
This paper traces developments in educational psychology and measurement that led to the Test of English as a Foreign Language (TOEFL) and the Test of English for International Communication (TOEIC) and the application of educational measurement terms such as validity and reliability to testing. Use of a table of specifications for planning…
Descriptors: Cloze Procedure, Difficulty Level, English (Second Language), Foreign Countries
Stansfield, Charles W., Ed. – 1986
This collection of essays on measurement theory and language testing includes: "Computerized Adaptive Testing: Implications for Language Test Developers" (Peter Tung); "The Promise and Threat of Computerized Adaptive Assessment of Reading Comprehension" (Michael Canale); "Computerized Rasch Analysis of Item Bias in ESL…
Descriptors: Chinese, Cloze Procedure, Computer Assisted Testing, Computer Software

