ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	4
Since 2007 (last 20 years)	6

Descriptor

Interrater Reliability	12
Statistical Analysis	12
Test Items	12
Difficulty Level	5
Correlation	4
Higher Education	4
Scores	4
English (Second Language)	3
Item Analysis	3
Language Tests	3
Reliability	3
Test Validity	3
Testing	3
Accuracy	2
Cloze Procedure	2
Comparative Analysis	2
Computer Assisted Testing	2
Error of Measurement	2
Essay Tests	2
Evaluative Thinking	2
Foreign Countries	2
Judges	2
Language Proficiency	2
Multivariate Analysis	2
Psychometrics	2

Source

Educational and Psychological…	2
Applied Measurement in…	1
Cambridge Assessment	1
ETS Research Report Series	1
International Journal of…	1
Language Assessment Quarterly	1
Research & Practice in…	1
Thought Currents in English…	1

Publication Type

Reports - Research	9
Journal Articles	8
Speeches/Meeting Papers	2
Books	1
Collected Works - General	1
Reports - Descriptive	1
Reports - Evaluative	1

Education Level

Higher Education	3
Postsecondary Education	1

Audience

Practitioners	1
Researchers	1
Teachers	1

Location

Japan	1
Tennessee	1
Turkey	1

Assessments and Surveys

Test of English as a Foreign…	2
ACT Assessment	1
SAT (College Admission Test)	1
Test of English for…	1

Showing all 12 results

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

Statistically Comparing the Performance of Multiple Automated Raters across Multiple Items

Peer reviewed

Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017

Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…

Descriptors: Automation, Scoring, Comparative Analysis, Test Items

A Multidisciplinary Assessment of Faculty Accuracy and Reliability with Bloom's Taxonomy

Peer reviewed

Welch, Adam C.; Karpen, Samuel C.; Cross, L. Brian; LeBlanc, Brandie N. – Research & Practice in Assessment, 2017

The aims of this study were to determine faculty's ability to accurately and reliably categorize exam questions using Bloom's Taxonomy, and whether modified versions would improve the accuracy and reliability. Faculty experience and affiliation with a health sciences discipline were also considered. Faculty at one university were asked to categorize 30…

Descriptors: College Faculty, Medical School Faculty, Health Sciences, Test Items

Development and Validation of the Written Communication Assessment of the "HEIghten"® Outcomes Assessment Suite. Research Report. ETS RR-17-53

Peer reviewed

Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017

Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…

Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment

Peer reviewed

Kalender, Ilker – International Journal of Higher Education, 2015

Student evaluations of teaching (SET) have been the principal instrument for eliciting students' opinions in higher education institutions. Many decisions, including high-stakes ones, are based on SET scores reported by students. The reliability of SET scores is therefore of considerable importance. This paper argues that there are…

Descriptors: Higher Education, Reliability, Test Items, Measurement

The Role of Lexical Properties and Cohesive Devices in Text Integration and Their Effect on Human Ratings of Speaking Proficiency

Peer reviewed

Crossley, Scott; Clevinger, Amanda; Kim, YouJin – Language Assessment Quarterly, 2014

There has been a growing interest in the use of integrated tasks in the field of second language testing to enhance the authenticity of language tests. However, the role of text integration in test takers' performance has not been widely investigated. The purpose of the current study is to examine the effects of text-based relational (i.e.,…

Descriptors: Language Proficiency, Connected Discourse, Language Tests, English (Second Language)

The Paired Comparison Method in Educational Research.

Peer reviewed

Green, Kathy – Educational and Psychological Measurement, 1985

Five sets of paired comparison judgments were made concerning test item difficulty in order to identify the most probable source of intransitivity in the data. The paired comparison method was useful in providing information about sensitivity to stimulus differences, but less useful for assessing the dimensionality of judgment criteria.…

Descriptors: Adults, Difficulty Level, Evaluative Thinking, Higher Education

A Generalizability Study of the Angoff Method Applied to Setting Cutoff Scores of Professional Certification Tests.

Cope, Ronald T. – 1987

This study used generalizability theory and other statistical concepts to assess the application of the Angoff method to setting cutoff scores on two professional certification tests. A panel of ten judges gave pre- and post-feedback Angoff probability ratings of items of two forms of a professional certification test, and another panel of nine…

Descriptors: Certification, Correlation, Cutting Scores, Error of Measurement

Testing the Difference between Two Alpha Coefficients with Small Samples of Subjects and Raters

Peer reviewed

Feldt, Leonard S.; Kim, Seonghoon – Educational and Psychological Measurement, 2006

Researchers sometimes need a statistical test of the hypothesis that two values of Cronbach's alpha reliability coefficient are equal. The situation may involve scores from two different measures administered to independent random samples or from the same measure administered to random samples from two different populations. Feldt derived a test…

Descriptors: Individual Testing, Test Items, Sample Size, Scores

Generalizability Theory in Program Evaluation.

Rothman, M. L.; And Others – 1982

A practical application of generalizability theory, demonstrating how the variance components contribute to understanding and interpreting the data collected to evaluate a program, is described. The evaluation concerned 120 learning modules developed for the Dental Auxiliary Education Project. The goals of the project were to design, implement,…

Descriptors: Correlation, Data Collection, Dental Schools, Educational Research

A Survey of Issues and Item Writing in Language Testing.

Strong, Gregory – Thought Currents in English Literature, 1995

This paper traces developments in educational psychology and measurement that led to the Test of English as a Foreign Language (TOEFL) and the Test of English for International Communication (TOEIC), and the application of educational measurement terms such as validity and reliability to testing. Use of a table of specifications for planning…

Descriptors: Cloze Procedure, Difficulty Level, English (Second Language), Foreign Countries

Technology and Language Testing. A Collection of Papers from the Annual Colloquium on Language Testing Research (7th, Princeton, New Jersey, April 6-9, 1985).

Stansfield, Charles W., Ed. – 1986

This collection of essays on measurement theory and language testing includes: "Computerized Adaptive Testing: Implications for Language Test Developers" (Peter Tung); "The Promise and Threat of Computerized Adaptive Assessment of Reading Comprehension" (Michael Canale); "Computerized Rasch Analysis of Item Bias in ESL…

Descriptors: Chinese, Cloze Procedure, Computer Assisted Testing, Computer Software

Author

Benton, Tom	1
Boyer, Michelle	1
Clevinger, Amanda	1
Cope, Ronald T.	1
Cross, L. Brian	1
Crossley, Scott	1
Feldt, Leonard S.	1
Green, Kathy	1
Hughes, Sarah	1
Kalender, Ilker	1
Karpen, Samuel C.	1
Kieftenbeld, Vincent	1
Kim, Seonghoon	1
Kim, YouJin	1
LeBlanc, Brandie N.	1
Leech, Tony	1
Liu, Ou Lydia	1
Rios, Joseph A.	1
Rothman, M. L.	1
Sparks, Jesse R.	1
Stansfield, Charles W., Ed.	1
Strong, Gregory	1
Welch, Adam C.	1
Zhang, Mo	1