ERIC - Search Results

Publication Date

In 2026	0
Since 2025	3
Since 2022 (last 5 years)	6
Since 2017 (last 10 years)	13
Since 2007 (last 20 years)	27

Descriptor

Scores	38
Test Reliability	38
Tests	38
Test Validity	20
Correlation	14
Foreign Countries	12
Evaluation Methods	10
Factor Analysis	9
Item Response Theory	9
Test Construction	9
Item Analysis	7
Psychometrics	7
Statistical Analysis	7
Comparative Analysis	6
Predictor Variables	6
Scoring	6
Testing	6
Academic Achievement	5
Construct Validity	5
Student Evaluation	5
Test Items	5
College Students	4
Criterion Referenced Tests	4
Measurement	4
Models	4

Publication Type

Reports - Research	23
Journal Articles	22
Dissertations/Theses -…	2
Guides - Non-Classroom	2
Reports - Descriptive	2
Reports - Evaluative	2
Collected Works - Proceedings	1
Guides - General	1
Reference Materials -…	1
Speeches/Meeting Papers	1

Education Level

Higher Education	13
Postsecondary Education	11
Elementary Education	2
Secondary Education	2
Elementary Secondary Education	1
High Schools	1
Kindergarten	1

Audience

Administrators	1
Practitioners	1

Location

United Kingdom	3
United Kingdom (England)	3
Australia	2
Germany	2
Netherlands	2
Taiwan	2
Turkey	2
Asia	1
Brazil	1
Colombia	1
Connecticut	1
Denmark	1
Egypt	1
Estonia	1
Florida	1
Greece	1
Hawaii	1
Indonesia	1
Ireland	1
Israel	1
Italy	1
Japan	1
Jordan	1
Kazakhstan	1
Norway	1

Laws, Policies, & Programs

Elementary and Secondary…

Assessments and Surveys

Early Childhood Longitudinal…	1
Law School Admission Test	1

Showing 1 to 15 of 38 results

The Sensitivity of Value-Added Estimates to Test Scoring Decisions. EdWorkingPaper No. 25-1226

Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025

Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…

Descriptors: Value Added Models, Tests, Testing, Scoring

A Note on the Use of Categorical Subscores

Peer reviewed

Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025

Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…

Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction

Exploring Psychometric Properties and Determinants of PLAAFP Quality Scores

Christopher M. Claude – ProQuest LLC, 2024

This dissertation comprises three complementary studies that aim to advance the understanding and practice of Individualized Education Programs (IEP) and Present Levels of Academic Achievement and Functional Performance (PLAAFP) development in special education. In the first study, we systematically reviewed empirical research measuring IEP…

Descriptors: Individualized Education Programs, Academic Achievement, Special Education, Measurement

Conditional Precision of Measurement for Test Scores: Are Conditional Standard Errors Sufficient?

Peer reviewed

Nicewander, W. Alan – Educational and Psychological Measurement, 2019

This inquiry is focused on three indicators of the precision of measurement--conditional on fixed values of θ, the latent variable of item response theory (IRT). The indicators that are compared are (1) the traditional, conditional standard errors, s(eX|θ) = CSEM; (2) the IRT-based conditional standard errors, s_irt(eX|θ) = C…

Descriptors: Measurement, Accuracy, Scores, Error of Measurement

Valid and Reliable Assessments. CSAI Update

Center on Standards and Assessments Implementation, 2018

Reliability is a measure of consistency. It is the degree to which student results are the same when they take the same test on different occasions, when different scorers score the same item or task, and when different but equivalent tests are taken at the same time or at different times. Reliability is about making sure that different test forms…

Descriptors: Test Reliability, Test Validity, Student Evaluation, Test Bias

Benthik Android Physics Comic Effectiveness for Vector Representation and Critical Thinking Students' Improvement

Peer reviewed

Maghfiroh, Anissa; Kuswanto, Heru – International Journal of Instruction, 2022

This research aims to reveal the effectiveness of the use of Kofie GeBoL media in improving (1) vector representation ability and (2) critical thinking ability in physics instruction. It is a descriptive quantitative study with a quasi-experimental design. It was conducted in two stages: empirical tryout and implementation of Kofie GeBoL to see…

Descriptors: Physics, Instructional Effectiveness, Critical Thinking, Thinking Skills

Impact of Both Local Item Dependencies and Cut-Point Locations on Examinee Classifications

Peer reviewed

Rubright, Jonathan D. – Educational Measurement: Issues and Practice, 2018

Performance assessments, scenario-based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this…

Descriptors: Performance Based Assessment, Item Response Theory, Models, Test Reliability

The Coding Stages Assessment: Development and Validation of an Instrument for Assessing Young Children's Proficiency in the ScratchJr Programming Language

Peer reviewed

de Ruiter, Laura E.; Bers, Marina U. – Computer Science Education, 2022

Background and Context: Despite the increasing implementation of coding in early curricula, there are few valid and reliable assessments of coding abilities for young children. This impedes studying learning outcomes and the development and evaluation of curricula. Objective: Developing and validating a new instrument for assessing young…

Descriptors: Programming Languages, Computer Software, Coding, Computer Science Education

Further Study of the Choice of Anchor Tests in Equating

Peer reviewed

Trierweiler, Tammy J.; Lewis, Charles; Smith, Robert L. – Journal of Educational Measurement, 2016

In this study, we describe what factors influence the observed score correlation between an (external) anchor test and a total test. We show that the anchor to full-test observed score correlation is based on two components: the true score correlation between the anchor and total test, and the reliability of the anchor test. Findings using an…

Descriptors: Scores, Correlation, Tests, Test Reliability

Multidimensional Balance in Youth with Visual Impairments

Pennell, Adam – ProQuest LLC, 2019

This dissertation consists of three studies which examined multidimensional balance in youth (≤ 21 years; Individuals with Disabilities Education Act, 2004) with visual impairments (VIs) using the Brief-Balance Evaluation Systems Test (Brief-BESTest). These studies have the potential to inform (adapted) physical education curricula and…

Descriptors: Psychomotor Skills, Youth, Visual Impairments, Human Posture

Adaptation and Validation of a Test of Ethical Sensitivity in Teaching

Peer reviewed

Maxwell, Bruce; Boon, Helen; Tanchuk, Nicolas; Rauwerda, Bryan – Journal of Moral Education, 2021

This article documents the adaptation, piloting and validation of a measure of teachers' ethical sensitivity. To create the test, we modified a measure from dentistry drawing on literature in teacher professional ethics and drew on the expertise of professional ethics scholars and practitioners. Based on the results of Rasch analysis combined with…

Descriptors: Ethics, Moral Values, Scores, Teacher Education Programs

Validity and Reliability of the Arabic Token Test for Children

Peer reviewed

Alkhamra, Rana A.; Al-Jazi, Aya B. – International Journal of Language & Communication Disorders, 2016

Background: The Token Test for Children (2nd edition) (TTFC) is a measure for assessing receptive language. In this study we describe the translation process, validity and reliability of the Arabic Token Test for Children (A-TTFC). Aims: The aim of this study is to translate, validate and establish the reliability of the Arabic Token Test for…

Descriptors: Receptive Language, Tests, Children, Test Validity

Validation of a Short Form of an Indecision Test: The Vocational Assessment Test

Peer reviewed

Picard, France; Frenette, Éric; Guay, Frédéric; Labrosse, Julie – International Journal for Educational and Vocational Guidance, 2015

The purpose of this research was to validate the scores of a short form of a new instrument, "l'Épreuve de décision vocationnelle, forme scolaire" (EDV-9S; vocational assessment test), which measures six indecision-related problems (lack of self-knowledge, lack of readiness, lack of method in decision making, lack of information,…

Descriptors: Test Validity, Career Choice, Test Reliability, Item Response Theory

Evidence for the Reliability of Coursework

Benton, Tom – Cambridge Assessment, 2016

The reliability of an assessment is defined as the extent to which candidates' results would remain stable if the entire assessment exercise was repeated. Whilst numerous studies have evaluated the reliability of written examinations, relatively little has been done to quantify the reliability of internal teacher assessment within schools. This is…

Descriptors: Test Reliability, Foreign Countries, History Instruction, English Literature

Source

Journal of Educational…	3
Educational and Psychological…	2
ProQuest LLC	2
Advances in Health Sciences…	1
Annenberg Institute for…	1
British Journal of Guidance &…	1
Cambridge Assessment	1
Caribbean Journal of Education	1
Center on Standards and…	1
Computer Science Education	1
Educational Measurement:…	1
Electronic Journal of…	1
International Association for…	1
International Journal for…	1
International Journal of…	1
International Journal of…	1
Journal of Applied Research…	1
Journal of Educational…	1
Journal of Further and Higher…	1
Journal of Moral Education	1
Journal of Teaching in…	1
National Center for Education…	1
Practical Assessment,…	1
Psychology Learning and…	1
Statistics Education Research…	1

Author

Abad, Francisco J.	1
Admiraal, Wilfried	1
Al-Jazi, Aya B.	1
Alkhamra, Rana A.	1
Amery D. Wu	1
Benjamin W. Domingue	1
Benton, Tom	1
Bergquist, Constance	1
Bers, Marina U.	1
Blaker, Lisa	1
Boon, Helen	1
Cakmak, Sedanur	1
Chan, James Y.	1
Chance, Beth	1
Chiu, Yu-Ting	1
Christopher M. Claude	1
Cohen, Allan S., Comp.	1
Dodd, Karen	1
Evans, Franklin R.	1
Fleenor, John W.	1
Frenette, Éric	1
Garfield, Joan	1
Graham, Darol L.	1
Guay, Frédéric	1
Harrington, Michael	1