Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Christopher M. Claude – ProQuest LLC, 2024
This dissertation comprises three complementary studies that aim to advance the understanding and practice of Individualized Education Programs (IEP) and Present Levels of Academic Achievement and Functional Performance (PLAAFP) development in special education. In the first study, we systematically reviewed empirical research measuring IEP…
Descriptors: Individualized Education Programs, Academic Achievement, Special Education, Measurement
Nicewander, W. Alan – Educational and Psychological Measurement, 2019
This inquiry is focused on three indicators of the precision of measurement--conditional on fixed values of θ, the latent variable of item response theory (IRT). The indicators that are compared are (1) The traditional, conditional standard errors, σ(eX|θ) = CSEM; (2) the IRT-based conditional standard errors, σ[subscript irt](eX|θ)=C[subscript…
Descriptors: Measurement, Accuracy, Scores, Error of Measurement
Center on Standards and Assessments Implementation, 2018
Reliability is a measure of consistency. It is the degree to which student results are the same when they take the same test on different occasions, when different scorers score the same item or task, and when different but equivalent tests are taken at the same time or at different times. Reliability is about making sure that different test forms…
Descriptors: Test Reliability, Test Validity, Student Evaluation, Test Bias
Maghfiroh, Anissa; Kuswanto, Heru – International Journal of Instruction, 2022
This research aims to reveal the effectiveness of the use of Kofie GeBoL media in improving (1) vector representation ability and (2) critical thinking ability in physics instruction. It is a descriptive quantitative study with a quasi-experimental design. It was conducted in two stages: empirical tryout and implementation of Kofie GeBoL to see…
Descriptors: Physics, Instructional Effectiveness, Critical Thinking, Thinking Skills
Rubright, Jonathan D. – Educational Measurement: Issues and Practice, 2018
Performance assessments, scenario-based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this…
Descriptors: Performance Based Assessment, Item Response Theory, Models, Test Reliability
de Ruiter, Laura E.; Bers, Marina U. – Computer Science Education, 2022
Background and Context: Despite the increasing implementation of coding in early curricula, there are few valid and reliable assessments of coding abilities for young children. This impedes studying learning outcomes and the development and evaluation of curricula. Objective: Developing and validating a new instrument for assessing young…
Descriptors: Programming Languages, Computer Software, Coding, Computer Science Education
Trierweiler, Tammy J.; Lewis, Charles; Smith, Robert L. – Journal of Educational Measurement, 2016
In this study, we describe what factors influence the observed score correlation between an (external) anchor test and a total test. We show that the anchor to full-test observed score correlation is based on two components: the true score correlation between the anchor and total test, and the reliability of the anchor test. Findings using an…
Descriptors: Scores, Correlation, Tests, Test Reliability
Pennell, Adam – ProQuest LLC, 2019
This dissertation consists of three studies which examined multidimensional balance in youth (≤ 21 years; Individuals with Disabilities Education Act, 2004) with visual impairments (VIs) using the Brief-Balance Evaluation Systems Test (Brief-BESTest). These studies have the potential to inform (adapted) physical education curricula and…
Descriptors: Psychomotor Skills, Youth, Visual Impairments, Human Posture
Maxwell, Bruce; Boon, Helen; Tanchuk, Nicolas; Rauwerda, Bryan – Journal of Moral Education, 2021
This article documents the adaptation, piloting and validation of a measure of teachers' ethical sensitivity. To create the test, we modified a measure from dentistry drawing on literature in teacher professional ethics and drew on the expertise of professional ethics scholars and practitioners. Based on the results of Rasch analysis combined with…
Descriptors: Ethics, Moral Values, Scores, Teacher Education Programs
Alkhamra, Rana A.; Al-Jazi, Aya B. – International Journal of Language & Communication Disorders, 2016
Background: The Token Test for Children (2nd edition) (TTFC) is a measure for assessing receptive language. In this study we describe the translation process, validity and reliability of the Arabic Token Test for Children (A-TTFC). Aims: The aim of this study is to translate, validate and establish the reliability of the Arabic Token Test for…
Descriptors: Receptive Language, Tests, Children, Test Validity
Picard, France; Frenette, Éric; Guay, Frédéric; Labrosse, Julie – International Journal for Educational and Vocational Guidance, 2015
The purpose of this research was to validate the scores of a short form of a new instrument, "l'Épreuve de décision vocationnelle, forme scolaire" (EDV-9S; vocational assessment test), which measures six indecision-related problems (lack of self-knowledge, lack of readiness, lack of method in decision making, lack of information,…
Descriptors: Test Validity, Career Choice, Test Reliability, Item Response Theory
Benton, Tom – Cambridge Assessment, 2016
The reliability of an assessment is defined as the extent to which candidates' results would remain stable if the entire assessment exercise was repeated. Whilst numerous studies have evaluated the reliability of written examinations, relatively little has been done to quantify the reliability of internal teacher assessment within schools. This is…
Descriptors: Test Reliability, Foreign Countries, History Instruction, English Literature

