ERIC - Search Results

Publication Date

In 2026	0
Since 2025	1
Since 2022 (last 5 years)	3
Since 2017 (last 10 years)	5
Since 2007 (last 20 years)	6

Descriptor

Test Reliability	15
Standard Setting (Scoring)	9
Test Validity	7
Cutting Scores	6
Standard Setting	6
Test Construction	6
Criterion Referenced Tests	5
Foreign Countries	5
Test Items	5
Error of Measurement	4
Factor Analysis	3
Higher Education	3
Interrater Reliability	3
Language Tests	3
Rating Scales	3
Reading Tests	3
Construct Validity	2
Difficulty Level	2
Goodness of Fit	2
High Stakes Tests	2
Inferences	2
Item Response Theory	2
Language Proficiency	2
Latent Trait Theory	2
Mastery Tests	2

Source

American Annals of the Deaf	1
Cypriot Journal of…	1
English Language Teaching	1
Evaluation and the Health…	1
International Journal of…	1
Journal of Educational…	1
Journal of Psychoeducational…	1
Journal of Research in…	1
Language Testing	1
Structural Equation Modeling:…	1

Publication Type

Reports - Research	15
Journal Articles	10
Speeches/Meeting Papers	4
Opinion Papers	1

Education Level

Early Childhood Education	1
Elementary Education	1
Elementary Secondary Education	1
Grade 2	1
Grade 3	1
Grade 4	1
Grade 5	1
Higher Education	1
Intermediate Grades	1
Middle Schools	1
Postsecondary Education	1
Primary Education	1
Secondary Education	1

Location

Turkey	2
Europe	1
France	1
Nigeria	1
Thailand	1

Assessments and Surveys

Alabama High School…	1
Praxis Series	1

Showing all 15 results

Direct Discrepancy Dynamic Fit Index Cutoffs for Arbitrary Covariance Structure Models

Peer reviewed

Daniel McNeish; Melissa G. Wolf – Structural Equation Modeling: A Multidisciplinary Journal, 2024

Despite the popularity of traditional fit index cutoffs like RMSEA ≤ 0.06 and CFI ≥ 0.95, several studies have noted issues with overgeneralizing traditional cutoffs. Computational methods have been proposed to avoid overgeneralization by deriving cutoffs specifically tailored to the characteristics…

Descriptors: Structural Equation Models, Cutting Scores, Generalizability Theory, Error of Measurement

The Riddle Knowledge Inference Test (R-Kit)

Peer reviewed

Nicolas Rochat; Laurent Lima; Pascal Bressoux – Journal of Psychoeducational Assessment, 2025

Inference is considered an important factor in comprehension models and has been described as a causal factor in predicting comprehension. To date, specific tests for inference are rare and often rely on specific thematic texts. This reliance on thematic inference may raise some concerns as inference is related to prior text-specific knowledge.…

Descriptors: Inferences, Reading Comprehension, Reading Tests, Test Reliability

Setting Standards for a Diagnostic Test of Aviation English for Student Pilots

Peer reviewed

Maria Treadaway; John Read – Language Testing, 2024

Standard-setting is an essential component of test development, supporting the meaningfulness and appropriate interpretation of test scores. However, in the high-stakes testing environment of aviation, standard-setting studies are underexplored. To address this gap, we document two stages in the standard-setting procedures for the Overseas Flight…

Descriptors: Standard Setting, Diagnostic Tests, High Stakes Tests, English for Special Purposes

The Development of STEP, the CEFR-Based English Proficiency Test

Peer reviewed

Sridhanyarat, Kietnawin; Pathong, Supakarn; Suranakkharin, Todsapon; Ammaralikit, Amornrat – English Language Teaching, 2021

This study aimed at developing the Silpakorn Test of English Proficiency (STEP), in alignment with the Common European Framework of Reference for Languages (CEFR), and in accordance with the theoretical framework established by Alderson et al. (2006). Four major steps were involved in the test construction. First, English language lecturers who…

Descriptors: Language Tests, Language Proficiency, Second Language Learning, Second Language Instruction

Scale of Professional Ethics for Individuals Working in the Field of Special Education: Validity and Reliability Study

Peer reviewed

Akcamete, Gonul; Kayhan, Nilay; Yildirim, A. Emel Sardohan – Cypriot Journal of Educational Sciences, 2017

Professional ethics includes the principles set forth by professional associations and accepted as correct by discussions over time, and which has become the sine qua non of a profession today. Professional ethics are established to increase the quality of professional practices and ensure correct and honest conduct. Not having professional…

Descriptors: Ethics, Special Education, Special Education Teachers, Professional Associations

Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

Peer reviewed

Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi – International Journal of Evaluation and Research in Education, 2016

High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items were…

Descriptors: Item Response Theory, Test Items, Difficulty Level, Statistical Analysis

Errors of Measurement and Standard Setting in Mastery Testing.

Kane, Michael; Wilson, Jennifer – 1982

This paper evaluates the magnitude of the total error in estimates of the difference between an examinee's domain score and the cutoff score. An observed score based on a random sample of items from the domain, and an estimated cutoff score derived from a judgmental standard setting procedure are assumed. The work of Brennan and Lockwood (1980) is…

Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Mastery Tests

Standards and Criteria: A Response to Glass' Criticism of the Nedelsky Technique.

Peer reviewed

Gross, Leon J. – Journal of Educational Measurement, 1982

Addressing Glass' argument (EJ 198 842) that a lack of interrater reliability is an inherent deficiency in the Nedelsky technique, poor rater training and the need for a group decision procedure are presented as standard setting problems. (CM)

Descriptors: Academic Standards, Criterion Referenced Tests, Cutting Scores, Evaluation Criteria

Setting, Evaluating, and Maintaining Certification Standards with the Rasch Model.

Peer reviewed

Grosse, Martin E.; Wright, Benjamin D. – Evaluation and the Health Professions, 1986

Based on the standard setting procedures of the American Board of Preventive Medicine for their Core Test, this article describes how Rasch measurement can facilitate using test content judgments in setting a standard. Rasch measurement can then be used to evaluate and improve the precision of the standard and to hold it constant across time.…

Descriptors: Certification, Criterion Referenced Tests, Difficulty Level, Health Personnel

Who Will Watch the Watchers? Setting Standards for Classroom Observers.

Livingston, Samuel A.; Sims-Gunzenhauser, Alice – 1995

A study was conducted to provide information for setting two separate standards, the accuracy score and the documentation score, for the Praxis III: Classroom Performance Assessment (Praxis III). Praxis III is intended for making instructional and licensing decisions about beginning teachers. This standard-setting study was a person-judgment…

Descriptors: Beginning Teachers, Classroom Observation Techniques, Documentation, Elementary Secondary Education

Assessing Inconsistencies in Standard Setting with the Angoff or Nedelsky Technique.

van der Linden, Wim J. – 1982

A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The…

Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Interrater Reliability

Criterion-Referenced Tests in Science: An Investigation of Reliability, Validity, and Standards-Setting.

Peer reviewed

Lang, Harry G. – Journal of Research in Science Teaching, 1982

Reliability, validity, and standards-setting procedure for a criterion-referenced test (Test of Metric Skills) were examined for use in science curricula. Results indicate a number of factors influencing test reliability/validity and that science teachers need to be aware of these factors to enhance accuracy of their judgments. (Author/JN)

Descriptors: College Science, Criterion Referenced Tests, Higher Education, Science Education

Standard Setting Study of the UT Austin Test for Credit in Japanese: Fall 1991 through Spring 1993. Research Bulletin 93-2.

Fitzpatrick, Steven J.; And Others – 1994

In 1991 the Measurement and Evaluation Center of the University of Texas at Austin was asked to develop a test for credit by examination in four lower division courses in Japanese. The test (in Japanese) was constructed from locally developed items provided by instructors of Japanese. The developed test consisted of 80 items distributed among…

Descriptors: College Students, Cutting Scores, Equivalency Tests, Higher Education

Sources of Variability in the Angoff Standard-Setting Process.

Halpin, Glennelle; McLean, James E. – 1991

Although the standard-setting method of W. H. Angoff (1971) has broad-based support in the research literature, inconsistencies in the resulting standards do occur. Sources of these inconsistencies are examined in a study of judges, competencies (items), rounds (replications), and the interactions among them. A modified Angoff approach was used to…

Descriptors: Analysis of Variance, Error of Measurement, Evaluators, High Schools

The Turkish Standardization of the Meadow-Kendall Social-Emotional Assessment Inventory for Deaf and Hearing-Impaired Students

Peer reviewed

Polat, Filiz – American Annals of the Deaf, 2006

The article presents results of standardization of the Meadow-Kendall Social-Emotional Assessment Inventory for Deaf and Hearing-Impaired Students (Meadow, 1983), school-age version, for use in Turkey. The SEAI is a 59-item measure for assessing socioemotional adjustment of school-age deaf and hearing-impaired students. A sample of 1,097 deaf…

Descriptors: Turkish, Deafness, Foreign Countries, Emotional Adjustment

Author

Akcamete, Gonul	1
Ammaralikit, Amornrat	1
Bello, Samira Abdullahi	1
Bichi, Ado Abdu	1
Daniel McNeish	1
Fitzpatrick, Steven J.	1
Gross, Leon J.	1
Grosse, Martin E.	1
Hafiz, Hadiza	1
Halpin, Glennelle	1
John Read	1
Kane, Michael	1
Kayhan, Nilay	1
Lang, Harry G.	1
Laurent Lima	1
Livingston, Samuel A.	1
Maria Treadaway	1
McLean, James E.	1
Melissa G. Wolf	1
Nicolas Rochat	1
Pascal Bressoux	1
Pathong, Supakarn	1
Polat, Filiz	1
Sims-Gunzenhauser, Alice	1