ERIC - Search Results

Showing all 14 results

Establishing a Physics Concept Inventory Using Computer Marked Free-Response Questions

Peer reviewed

Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023

The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…

Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability

Low Inter-Rater Reliability of a High Stakes Performance Assessment of Teacher Candidates

Peer reviewed

Lyness, Scott A.; Peterson, Kent; Yates, Kenneth – Education Sciences, 2021

The Performance Assessment for California Teachers (PACT) is a high stakes summative assessment that was designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was 0.17 (poor strength of…

Descriptors: High Stakes Tests, Performance Based Assessment, Teacher Effectiveness, Academic Language

Engagement, Alignment, and Rigor as Vital Signs of High-Quality Instruction: A Classroom Visit Protocol for Instructional Improvement and Research

Peer reviewed

Early, Diane M.; Rogge, Ronald D.; Deci, Edward L. – High School Journal, 2014

This paper investigates engagement (E), alignment (A), and rigor (R) as vital signs of high-quality teacher instruction as measured by the EAR Classroom Visit Protocol, designed by the Institute for Research and Reform in Education (IRRE). Findings indicated that both school leaders and outside raters could learn to score the protocol with…

Descriptors: Educational Quality, Learner Engagement, Alignment (Education), Difficulty Level

Quality of Questions on Common Tests at Issue

Sawchuk, Stephen – Education Digest: Essential Readings Condensed for Quick Review, 2010

Most experts in the testing community have presumed that the $350 million promised by the U.S. Department of Education to support common assessments would promote those that made greater use of open-ended items capable of measuring higher-order critical-thinking skills. But as measurement experts consider the multitude of possibilities for an…

Descriptors: Educational Quality, Test Items, Comparative Analysis, Multiple Choice Tests

Magnitude of Task-Sampling Variability in Performance Assessment: A Meta-Analysis

Peer reviewed

Huang, Chiungjung – Educational and Psychological Measurement, 2009

This study examined the percentage of task-sampling variability in performance assessment via a meta-analysis. In total, 50 studies containing 130 independent data sets were analyzed. Overall results indicate that the percentage of variance for (a) differential difficulty of task was roughly 12% and (b) examinee's differential performance of the…

Descriptors: Test Bias, Research Design, Performance Based Assessment, Performance Tests

Measuring the Impact of Judge Severity on Examination Scores.

Peer reviewed

Lunz, Mary E.; And Others – Applied Measurement in Education, 1990

An extension of the Rasch model is used to obtain objective measurements for examinations graded by judges. The model calibrates elements of each facet of the examination on a common log-linear scale. Real examination data illustrate the way correcting for judge severity improves fairness of examinee measures. (SLD)

Descriptors: Certification, Difficulty Level, Interrater Reliability, Judges

Generalizability, Validity, and Examinee Perceptions of a Computer-Delivered Formulating-Hypotheses Test. GRE Board Professional Report No. 90-02aP.

Bennett, Randy Elliot; Rock, Donald A. – 1993

Formulating-Hypotheses (F-H) items present a situation and ask the examinee to generate as many explanations for it as possible. This study examined the generalizability, validity, and examinee perceptions of a computer-delivered version of the task. Eight F-H questions were administered to 192 graduate students. Half of the items restricted…

Descriptors: Computer Assisted Testing, Difficulty Level, Generalizability Theory, Graduate Students

Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.

O'Neill, Thomas R.; Lunz, Mary E. – 1996

To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…

Descriptors: Ability, Benchmarking, Comparative Analysis, Difficulty Level

Defining Minimal Competence.

Peer reviewed

Mills, Craig N.; And Others – Educational Measurement: Issues and Practice, 1991

An approach to defining minimal competence for judges to use in standard setting is presented. Panelists in standard setting must receive training to ensure that differences in rating result from differences in perceptions of item difficulty, not from differences of opinion about the definition of minimal competence. (SLD)

Descriptors: Cutting Scores, Decision Making, Definitions, Difficulty Level

Training Judges to Generate Standard-Setting Data.

Peer reviewed

Reid, Jerry B. – Educational Measurement: Issues and Practice, 1991

Training judges to generate item ratings in standard setting once the reference group has been defined is discussed. It is proposed that sensitivity to the factors that determine difficulty can be improved through training. Three criteria for determining when training is sufficient are offered. (SLD)

Descriptors: Computer Assisted Instruction, Difficulty Level, Evaluators, Interrater Reliability

Grading Reading Passages According to the ACTFL/ETS/ILR Reading Proficiency Standard: Can It Be Learned?

Lange, Dale L.; Lowe, Pardee, Jr. – 1987

A study investigated the use of reading proficiency scales developed by the American Council on the Teaching of Foreign Languages (ACTFL), Educational Testing Service (ETS), and Interagency Language Roundtable (ILR) for meaningful rank-ordering and assigning levels of second language competence to reading passages. In a proficiency test writing…

Descriptors: College Entrance Examinations, Difficulty Level, Higher Education, Interrater Reliability

Essay Topic Difficulty in Relation to Scoring Models.

Dovell, Patricia; Buhr, Dianne C. – 1986

This study examined the difficulty level of essay topics used in the large-scale assessment of writing in relation to five different scoring models, and sought to determine what effects the scoring models would have on passing rates. In model one, examinee's score is the direct result of a score assigned by the reader or the sum of scores assigned…

Descriptors: College Students, Difficulty Level, Essay Tests, Essays

Effects of Item Context on Intrajudge Consistency of Expert Judgments via the Nedelsky Standard Setting Method.

Peer reviewed

Plake, Barbara S.; Melican, Gerald J. – Educational and Psychological Measurement, 1989

The impact of overall test length and difficulty on expert judgments of item performance by the Nedelsky method was studied. Five university-level instructors predicting the performance of minimally competent candidates on a mathematics examination were fairly consistent in their assessments regardless of length or difficulty of the test.…

Descriptors: Difficulty Level, Estimation (Mathematics), Evaluators, Higher Education

The Relationship between Modified Angoff Knowledge Estimation Judgments and Item Difficulty Values for Seven NTE Specialty Area Tests.

Wheeler, Patricia – 1991

The appropriateness of the Angoff method (W. H. Angoff, 1971) for setting standards on tests was studied. Evaluators (judges) from California school districts and teacher training institutions reviewed 15 NTE (National Teacher Examinations) Program Specialty Area Tests published by the Educational Testing Service for their appropriateness in…

Descriptors: Art Education, Biology, Difficulty Level, Elementary Secondary Education
